About the project

Uber’s API — which allows users to view fare estimates, pickup wait times and more for a given location — opened to developers in August 2014.

As college students who use Uber frequently to travel between College Park and Washington, D.C., we were already familiar with the ride-sharing service’s surge multiplier. A $30 trip could cost more than $100 if surge pricing was in effect.

Like other users, we felt in the dark about why prices surged and how often they did, even with an explanation from the company.

We turned to the API for some answers.

We focused our research on the nation’s capital, where taking an Uber X at a regular fare is often cheaper than taking a taxi, based on a 5-mile, 10-minute ride.

We selected coordinates from four locations in Northwest Washington, D.C. (Adams Morgan, George Washington University, K Street and U Street) and one in Southwest Washington, D.C. (Navy Yard). These locations were diverse enough to reveal a variety of interesting results.

Methodology

We began by gathering data from the Uber API every 15 seconds for all four types of Uber vehicles (Uber X, XL, Black and SUV) at five different locations. For our second-by-second graph, we chose data from a Friday to get a more in-depth understanding of how quickly the surge multiplier changed throughout a day.

To collect data from these locations, we first had to find their latitude and longitude coordinates. After building a MySQL database to store the data, we called the Uber API with each start location to retrieve the estimated wait time and the surge multiplier at that moment.
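The collection step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `fetch_estimates` is a hypothetical callable standing in for the Uber API request, the coordinates are approximate placeholders, the `estimates` table layout is assumed, and Python's built-in `sqlite3` module is used in place of MySQL so the sketch runs on its own.

```python
import sqlite3
import time
from datetime import datetime, timezone

# Placeholder coordinates (latitude, longitude); NOT the points used in the project.
LOCATIONS = {
    "Adams Morgan": (38.92, -77.04),
    "Navy Yard": (38.88, -77.00),
}


def create_table(conn):
    # Assumed schema: one row per location, product and polling time.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS estimates (
               collected_at TEXT,
               location TEXT,
               product TEXT,
               surge_multiplier REAL,
               wait_seconds INTEGER)"""
    )


def collect_once(conn, fetch_estimates, locations, now=None):
    """Poll every location once and store one row per product returned."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    rows = []
    for name, (lat, lon) in locations.items():
        # fetch_estimates is hypothetical: it should return a list of dicts
        # with "product", "surge_multiplier" and "wait_seconds" keys.
        for est in fetch_estimates(lat, lon):
            rows.append(
                (stamp, name, est["product"],
                 est["surge_multiplier"], est["wait_seconds"])
            )
    conn.executemany("INSERT INTO estimates VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def run(conn, fetch_estimates, locations, interval=15, rounds=None):
    """Call collect_once every `interval` seconds; rounds=None runs until stopped."""
    done = 0
    while rounds is None or done < rounds:
        started = time.monotonic()
        collect_once(conn, fetch_estimates, locations)
        done += 1
        # Sleep only for the remainder of the interval so polls stay evenly spaced.
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```

Passing the fetch function in as an argument keeps the request details (endpoint, credentials, response parsing) separate from the scheduling and storage logic.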

We collected a week’s worth of data, then ran a MySQL query that returned every data point for that week, across all products, at 15-second intervals. The query returned more than 2 million rows for us to analyze. We then ran a separate query for each product and location and exported the results from MySQL to Microsoft Excel.
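A per-product, per-location extraction like the one described above might look like the sketch below. It assumes a hypothetical table named `estimates` with `collected_at`, `location`, `product` and `surge_multiplier` columns, and it uses Python's `sqlite3` and `csv` modules as stand-ins for MySQL and the Excel export; none of these names come from the project itself.

```python
import csv
import sqlite3


def export_series(conn, product, location, out):
    """Write the time-ordered surge series for one product and location as CSV.

    `conn` is an open database connection and `out` is any writable text
    file object. Returns the number of data rows written.
    """
    cursor = conn.execute(
        "SELECT collected_at, surge_multiplier FROM estimates "
        "WHERE product = ? AND location = ? "
        "ORDER BY collected_at",
        (product, location),
    )
    writer = csv.writer(out)
    writer.writerow(["collected_at", "surge_multiplier"])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count
```

Calling `export_series` once for each product and location pair yields one file per graph, and each file can be opened directly in a spreadsheet.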

In Microsoft Excel, we concatenated the data so that each point paired its timestamp with its surge multiplier. At this point, we realized that some of the data points were not separated by 15 seconds. An anomaly in the data collection caused sporadic gaps of 30, 45, 90, 300 and even 14,700 seconds. We processed the data so that the affected graphs show a visible gap wherever data is missing, rather than a line drawn across the missing interval.
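One way to surface such gaps programmatically is sketched below. The project did this work in Excel, so the code is only an illustration of the idea: the function name and the choice to insert an empty (`None`) point are assumptions. An empty point is useful because charting libraries such as Highcharts break a line series at null values by default.

```python
from datetime import datetime, timedelta

EXPECTED = timedelta(seconds=15)  # polling interval stated in the methodology


def mark_gaps(points, expected=EXPECTED):
    """Insert a (timestamp, None) marker wherever data is missing.

    `points` is a list of (datetime, surge_multiplier) tuples sorted by time.
    When two consecutive points are more than `expected` apart, one empty
    point is added just after the earlier one so a chart can show a break.
    """
    out = []
    previous = None
    for stamp, surge in points:
        if previous is not None and stamp - previous > expected:
            out.append((previous + expected, None))
        out.append((stamp, surge))
        previous = stamp
    return out
```

A series with no missing intervals passes through unchanged, so the same function can be applied to every product and location.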

The 141 graphs were created using Highcharts, and the webpage was designed using Bootstrap.

About us

Jenny Hottle and Christine Rice are senior journalism majors in the Philip Merrill College of Journalism at the University of Maryland and data reporters in the college's Capital News Service bureau in College Park.