As a talented data scientist at the New York City Department of Transportation (DOT), you treat reducing traffic congestion as a top priority. Chronic traffic delays not only frustrate commuters but also hurt the city's economy and air quality. You have an opportunity to use data to optimize traffic flow and improve New York City's overall transportation system. The government has pledged resources to act on your solutions.
You have access to a comprehensive dataset of taxi trip data, including pickup and dropoff locations, timestamps, and various trip attributes. Your task is to build a predictive model that accurately estimates the duration of taxi trips based on these factors. By analyzing historical patterns and incorporating real-time traffic conditions, you can provide valuable insights to the DOT for optimizing traffic signal timings, identifying congestion hotspots, and implementing targeted traffic management strategies. Your work will directly contribute to a more efficient, sustainable, and livable urban environment for millions of New Yorkers.
Goal: Build a model that predicts the total ride duration of taxi trips in New York City.
The dataset contains information about taxi trips in New York City, including pickup and dropoff locations, timestamps, and passenger count. It contains real trip data that was sampled and cleaned. Based on individual trip attributes, your model should predict the duration of each trip in the test set.
id: a unique identifier for each trip
vendor_id: a code indicating the provider associated with the trip record
pickup_datetime: date and time when the meter was engaged
dropoff_datetime: date and time when the meter was disengaged
passenger_count: the number of passengers in the vehicle (driver-entered value)
pickup_longitude: the longitude where the meter was engaged
pickup_latitude: the latitude where the meter was engaged
dropoff_longitude: the longitude where the meter was disengaged
dropoff_latitude: the latitude where the meter was disengaged
store_and_fwd_flag: indicates whether the trip record was held in vehicle memory before being sent to the vendor because the vehicle did not have a connection to the server (Y = store and forward; N = not a store and forward trip)
trip_duration: duration of the trip in seconds (target variable)
Data Source: Kaggle NYC Taxi Trip Duration
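The raw fields listed above are rarely used as model inputs directly. The sketch below shows one possible way to derive features from them: pickup hour and weekday from pickup_datetime, a great-circle (haversine) distance from the coordinates, and a 0/1 version of store_and_fwd_flag. The helper names (haversine_km, add_features), the derived column names, and the two sample rows are assumptions made for illustration; they are not part of the dataset or the brief.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_features(df):
    """Return a copy of df with hypothetical derived columns added."""
    out = df.copy()
    pickup = pd.to_datetime(out["pickup_datetime"])
    out["pickup_hour"] = pickup.dt.hour
    out["pickup_weekday"] = pickup.dt.dayofweek  # Monday = 0
    out["distance_km"] = haversine_km(
        out["pickup_latitude"], out["pickup_longitude"],
        out["dropoff_latitude"], out["dropoff_longitude"],
    )
    out["store_and_fwd"] = (out["store_and_fwd_flag"] == "Y").astype(int)
    return out


# Two made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame({
    "pickup_datetime": ["2016-03-14 17:24:55", "2016-06-12 00:43:35"],
    "pickup_longitude": [-73.982, -73.980],
    "pickup_latitude": [40.768, 40.739],
    "dropoff_longitude": [-73.965, -73.999],
    "dropoff_latitude": [40.766, 40.731],
    "store_and_fwd_flag": ["N", "Y"],
})

features = add_features(sample)
print(features[["pickup_hour", "pickup_weekday", "distance_km", "store_and_fwd"]])
```

dropoff_datetime is deliberately left out of the features: together with pickup_datetime it determines trip_duration, so using it as an input would leak the target.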
Your task is to build a regression model to predict the duration of taxi trips in New York City.
Python Libraries: Pandas, NumPy, scikit-learn, Matplotlib/Seaborn.
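As a rough starting point for the regression task, the sketch below fits a scikit-learn random forest and compares it with a predict-the-mean baseline. It runs on synthetic stand-in data so that it is self-contained; the toy relationship between distance, rush hour, and duration is an assumption for illustration, not a finding from the taxi dataset. Fitting on log1p of the duration is one common choice for right-skewed targets, not a requirement of the brief, and the choice of model is likewise only an example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in features (NOT real taxi data).
distance_km = rng.uniform(0.5, 15.0, n)
pickup_hour = rng.integers(0, 24, n)
passenger_count = rng.integers(1, 6, n)

# Assumed toy relationship: longer trips take longer, rush hours are slower.
rush = np.isin(pickup_hour, [7, 8, 9, 16, 17, 18, 19])
trip_duration = 120 + 150 * distance_km * (1 + 0.5 * rush) + rng.normal(0, 60, n)
trip_duration = np.clip(trip_duration, 60, None)  # duration in seconds

X = np.column_stack([distance_km, pickup_hour, passenger_count])
X_train, X_test, y_train, y_test = train_test_split(
    X, trip_duration, test_size=0.2, random_state=0
)

# Fit on log1p(duration), then map predictions back to seconds with expm1.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, np.log1p(y_train))
pred = np.expm1(model.predict(X_test))

# Compare against a naive baseline that always predicts the training mean.
model_mae = mean_absolute_error(y_test, pred)
baseline_mae = mean_absolute_error(y_test, np.full_like(y_test, y_train.mean()))
print(f"model MAE: {model_mae:.1f} s, baseline MAE: {baseline_mae:.1f} s")
```

With the real data, X would instead be built from columns of the training file, and the held-out split gives an estimate of how the model might perform on the test set.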