U.S. Flight Delay Analysis (2019-2022)
This project analyzes U.S. flight delay patterns from 2019 to 2022, before and after the pandemic. It aims to predict delays from key factors such as weather, airline, and airport performance, using Python, Pandas, Matplotlib, and Tableau.
Data:
Bureau of Transportation Statistics
Techniques Applied:
Sourcing Open Data, Exploratory Data Analysis, Hypothesis Formation, Linear Regression, Supervised Machine Learning, Unsupervised Machine Learning, Data Visualization and Reporting via Strategic Dashboard
Overview:
Sourcing open flight data for 2019-2022 enables a comprehensive exploratory data analysis of the pandemic's impact on U.S. flights. Linear regression is used to identify trends and relationships, while k-means clustering helps uncover patterns and anomalies in the data. A strategic dashboard presents the results to stakeholders as clear, actionable insights.
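To make the two modeling steps concrete, here is a minimal sketch in Python with NumPy. It is not the project's actual code: fit_linear_trend and kmeans are hypothetical helper names, the regression is an ordinary least-squares line fitted with numpy.polyfit, and the clustering is a plain Lloyd's-algorithm k-means written out by hand (scikit-learn's KMeans would be the usual choice in practice). The toy inputs are made up for illustration and are not BTS data.

```python
import numpy as np


def fit_linear_trend(x, y):
    """Ordinary least-squares line y ~ slope * x + intercept (hypothetical helper)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate point assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy usage: a perfectly linear trend and two well-separated groups of points
slope, intercept = fit_linear_trend(np.array([0.0, 1.0, 2.0, 3.0]),
                                    np.array([1.0, 3.0, 5.0, 7.0]))
pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans(pts, k=2)
```

With real features (for example per-airport average delay and flight volume), the columns should be standardized first, since k-means uses Euclidean distance and would otherwise be dominated by the largest-scale feature. Points far from their assigned centroid are natural candidates for the "anomalies" mentioned above.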
1. Data Collection and Preprocessing
The dataset includes over 33 million flights with details such as scheduled departure/arrival times, delay durations, weather conditions, and airline codes. After integrity checks, the data was cleaned and prepared for exploratory analysis.
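A minimal pandas sketch of the integrity checks and cleaning step follows. It assumes BTS On-Time Performance style column names (FL_DATE, OP_UNIQUE_CARRIER, ORIGIN, DEST, ARR_DELAY, CANCELLED); clean_flights is a hypothetical helper, and its rules (drop exact duplicates, require a parseable date and route, keep cancelled flights but drop completed ones with no recorded arrival delay) are illustrative choices rather than the project's documented pipeline.

```python
import pandas as pd

# Assumed BTS-style column names; adjust to match the actual export.
REQUIRED = ["FL_DATE", "OP_UNIQUE_CARRIER", "ORIGIN", "DEST", "ARR_DELAY", "CANCELLED"]


def clean_flights(df: pd.DataFrame) -> pd.DataFrame:
    """Basic integrity checks and cleaning before EDA (hypothetical helper)."""
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = df.drop_duplicates().copy()
    # Unparseable dates become NaT and are dropped with other incomplete rows
    out["FL_DATE"] = pd.to_datetime(out["FL_DATE"], errors="coerce")
    out = out.dropna(subset=["FL_DATE", "OP_UNIQUE_CARRIER", "ORIGIN", "DEST"])
    out["CANCELLED"] = out["CANCELLED"].fillna(0).astype(bool)
    # Keep cancelled flights (no arrival delay expected); drop non-cancelled
    # rows that have no arrival delay recorded
    out = out[out["CANCELLED"] | out["ARR_DELAY"].notna()]
    out["YEAR"] = out["FL_DATE"].dt.year
    return out.reset_index(drop=True)


# Toy usage with made-up rows: one exact duplicate, one cancelled flight,
# and one completed flight missing its arrival delay
raw = pd.DataFrame({
    "FL_DATE": ["2019-01-01", "2019-01-01", "2020-03-15", "2021-07-04"],
    "OP_UNIQUE_CARRIER": ["AA", "AA", "DL", "UA"],
    "ORIGIN": ["ORD", "ORD", "ATL", "DEN"],
    "DEST": ["ATL", "ATL", "ORD", "SFO"],
    "ARR_DELAY": [12.0, 12.0, None, None],
    "CANCELLED": [0.0, 0.0, 1.0, 0.0],
})
clean = clean_flights(raw)
```

Raising on missing columns up front makes a schema change in a new BTS download fail loudly instead of silently producing empty aggregates later.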
2. Exploratory Data Analysis: Key Insights
Carrier-Related Delays and Cancellations
As shown in the Flight Number Group Delay table, carrier-related delays have increased steadily since 2020, particularly in shorter-distance flight groups. The pandemic exacerbated operational inefficiencies and staffing shortages, making carrier-related disruptions the most significant cause of delays and cancellations. The % of Delay Types by Year chart shows that carrier-related delays peaked in 2022, surpassing other causes such as late aircraft and weather.
Impact of Pandemic on Flight Volume
The Yearly Flight Volume line chart shows that, by 2022, U.S. domestic flight volume had still not returned to pre-pandemic (2019) levels. This underscores the lasting economic and operational burden the pandemic placed on the airline industry.
Airports with Highest Delays
Large hub airports such as Chicago O'Hare and Atlanta continue to experience higher delays due to congestion. The Distance of Flight Number Groups visualization, which breaks delays down by flight distance, shows that delays are common on both short- and long-haul flights, indicating that operational challenges exist regardless of a flight's length.
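The three EDA views above can be approximated with a few pandas group-bys. This sketch assumes a cleaned DataFrame with a YEAR column, an ARR_DELAY column, a DISTANCE_GROUP column, and BTS-style cause-of-delay columns measured in minutes (CARRIER_DELAY, WEATHER_DELAY, NAS_DELAY, SECURITY_DELAY, LATE_AIRCRAFT_DELAY). The helper names are hypothetical, and measuring "% of delay types" as a share of delay minutes is an assumption; the original chart may count delayed flights instead.

```python
import pandas as pd

# Assumed BTS cause-of-delay columns, in minutes attributed to each cause.
CAUSES = ["CARRIER_DELAY", "WEATHER_DELAY", "NAS_DELAY",
          "SECURITY_DELAY", "LATE_AIRCRAFT_DELAY"]


def delay_type_share_by_year(df):
    """Percent of total attributed delay minutes per cause, per year.

    pandas' sum skips missing values, so flights with blank cause columns
    simply contribute nothing.
    """
    minutes = df.groupby("YEAR")[CAUSES].sum()
    return minutes.div(minutes.sum(axis=1), axis=0) * 100


def volume_vs_2019(df):
    """Flights per year as a percentage of the 2019 (pre-pandemic) baseline."""
    counts = df.groupby("YEAR").size()
    return counts / counts.loc[2019] * 100


def mean_arrival_delay(df, by):
    """Mean arrival delay in minutes, largest first (e.g. by="ORIGIN" or "DISTANCE_GROUP")."""
    return df.groupby(by)["ARR_DELAY"].mean().sort_values(ascending=False)


# Toy usage with made-up flights (not BTS data)
flights = pd.DataFrame({
    "YEAR": [2019, 2019, 2022],
    "ORIGIN": ["ORD", "ATL", "ORD"],
    "DISTANCE_GROUP": [1, 5, 1],
    "ARR_DELAY": [30.0, 10.0, 50.0],
    "CARRIER_DELAY": [20.0, 0.0, 40.0],
    "WEATHER_DELAY": [10.0, 0.0, 0.0],
    "NAS_DELAY": [0.0, 10.0, 10.0],
    "SECURITY_DELAY": [0.0, 0.0, 0.0],
    "LATE_AIRCRAFT_DELAY": [0.0, 0.0, 0.0],
})
share = delay_type_share_by_year(flights)
volume = volume_vs_2019(flights)
by_airport = mean_arrival_delay(flights, "ORIGIN")
by_distance = mean_arrival_delay(flights, "DISTANCE_GROUP")
```

Each helper returns a small, tidy table that can be exported directly to Tableau or plotted with Matplotlib, which keeps the heavy aggregation over tens of millions of rows in pandas.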
3. Key Findings and Recommendations
Improving Carrier Efficiency:
Airlines should prioritize operational optimization, focusing on crew scheduling, staffing improvements, and preventing bottlenecks during peak travel periods.
Implement Preventative Maintenance:
Enhanced maintenance protocols should be introduced, particularly for older aircraft fleets, to avoid mechanical failures and minimize disruptions.
Dynamic Resource Allocation:
Resources should be adjusted dynamically during peak travel periods so that delay-prone airports are adequately staffed and equipped to handle surges in passenger traffic.
Expand Customer Rebooking Options:
Offering automated rebooking for passengers affected by delays or cancellations can improve the customer experience and alleviate congestion.
Proactive Passenger Communication:
Real-time delay notifications and enhanced digital tools should be used to keep passengers informed during disruptions, reducing frustration and improving satisfaction.
Collaborate with Airports:
Stronger partnerships between airlines and airports can improve management of runway allocation, traffic flow, and ground operations to reduce delays during peak hours.
This analysis demonstrates the growing impact of carrier-related delays and the broader influence of pandemic recovery on U.S. flight operations, offering actionable steps to improve efficiency, reduce disruptions, and enhance the passenger experience.