Uber is sharing anonymized data with Boston policymakers.

The data will provide new insights to help manage urban growth, relieve traffic congestion, expand public transportation, and reduce greenhouse gas emissions.

This is an interesting dataset for an urban planner. And Uber did well to aggregate trips to the ZCTA (ZIP Code Tabulation Area) level instead of giving individual addresses.[1] NYC made a mess of things in June 2014 when trying to do something similar.

But using Uber’s data to actually make decisions is ludicrous. Urban planning, like most policymaking, is about how to best distribute scarce resources. It is political. Look at Uber’s example uses:

[Image: Uber’s transportation policy]

Uber’s data only records where Uber riders go. If Boston actually uses that data to decide which potholes to fill first, it is not going to fill many potholes in poor neighborhoods. If Boston uses it to decide where to add metro stops, it will not add them in poor neighborhoods.

The first questions any data analyst should ask with a new dataset are:

  1. How was this data collected?
  2. What are its blind spots?

And the blind spots here are both large and systematic. Note, however, Uber’s vacuous language above: this data will provide “insights”. They are too smart to say explicitly “this is the only data source you should use for your city planning.” But without a similarly rich dataset on the whole city, the data will only provide “insights” about how to help rich folks.
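One rough way to look for this kind of blind spot is to compare each area’s share of trips with its share of residents. The sketch below is only an illustration: the `Area` class, the function names, the 0.5 threshold, and every number in it are made up for the example, and none of it comes from Uber’s dataset or from census figures. It also assumes total population and total trips are nonzero.

```python
from dataclasses import dataclass


@dataclass
class Area:
    zcta: str        # placeholder label, not a real ZCTA code
    population: int  # residents (hypothetical figure)
    trips: int       # recorded trips (hypothetical figure)


def coverage_ratios(areas):
    """Map each area to (share of all trips) / (share of all residents).

    A ratio near 1 means the area shows up in the trip data roughly in
    proportion to its population; a ratio near 0 means it barely shows up.
    """
    total_pop = sum(a.population for a in areas)
    total_trips = sum(a.trips for a in areas)
    return {
        a.zcta: (a.trips / total_trips) / (a.population / total_pop)
        for a in areas
    }


def blind_spots(areas, threshold=0.5):
    """Return the labels of areas whose coverage ratio is below the threshold."""
    ratios = coverage_ratios(areas)
    return sorted(label for label, ratio in ratios.items() if ratio < threshold)


# Toy numbers, invented purely to show the calculation.
areas = [
    Area("A", population=20_000, trips=9_000),
    Area("B", population=20_000, trips=900),
    Area("C", population=10_000, trips=100),
]

print(coverage_ratios(areas))
print(blind_spots(areas))
```

In this toy example, areas B and C hold most of the residents but almost none of the trips, so any decision driven by trip counts alone would mostly reflect area A. The point is not the specific threshold but the habit of checking a dataset against a baseline that covers everyone.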

And, by the way, be wary of people peddling insights. If an insight held up to rigorous analysis, we would just call it a conclusion.

What do you think about Uber’s influence on urban policy?

  1. Though some interesting data is lost here. For example, is the rider’s destination on a busy avenue (to a store/restaurant) or a residential street (personal visit)?