Flight Data Analysis

Summary

  • This project used Java, JSch, Apache Storm, HTML, and ZooKeeper.

  • Performed distributed streaming and analysis of flight data to measure traffic at US airports, using the Apache Storm framework with the ZooKeeper coordination service for cluster management.

  • As an example dataset, the project uses data from the Open Air Traffic Data for Research website (the OpenSky Network, at https://opensky-network.org/) and counts the number of flights departing from or arriving at each major US airport per airline.

  • In my observation, the fastest run used a single thread and took 12,146 ms. Execution time increased as I added threads, but the change was not significant.

  • Limitations: There were no real-time updates on the topology's execution in the Storm application. Scalability is also limited: the current implementation is designed to run on a local Storm cluster and processes a single input file, so it may not handle large volumes of data or scale up to distributed clusters.

  • Future improvements: Increase the parallelism of the topology. Implement efficient data structures for sorting and storing data.
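
The per-airport, per-airline count described in the summary can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the project's actual Storm code: the FlightCounter class, the Flight type and its fields, and the sample records are all hypothetical, and the sorted maps (TreeMap) are only one possible choice for keeping results ordered.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

public class FlightCounter {

    /** Hypothetical flight record: an airline code plus departure and arrival airport codes. */
    static final class Flight {
        final String airline;
        final String departure;
        final String arrival;

        Flight(String airline, String departure, String arrival) {
            this.airline = airline;
            this.departure = departure;
            this.arrival = arrival;
        }
    }

    /**
     * For each tracked airport, counts flights per airline that depart from
     * or arrive at that airport. Both map levels are TreeMaps, so airports
     * and airlines come out in sorted order.
     */
    static Map<String, Map<String, Integer>> countByAirportAndAirline(
            List<Flight> flights, Set<String> trackedAirports) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Flight f : flights) {
            increment(counts, trackedAirports, f.departure, f.airline);
            // Do not count a flight twice when it departs from and arrives at the same airport.
            if (!Objects.equals(f.arrival, f.departure)) {
                increment(counts, trackedAirports, f.arrival, f.airline);
            }
        }
        return counts;
    }

    /** Adds one to the (airport, airline) counter if the airport is known and tracked. */
    static void increment(Map<String, Map<String, Integer>> counts,
                          Set<String> trackedAirports, String airport, String airline) {
        if (airport == null || !trackedAirports.contains(airport)) {
            return;
        }
        counts.computeIfAbsent(airport, k -> new TreeMap<>())
              .merge(airline, 1, Integer::sum);
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<Flight> flights = List.of(
                new Flight("DAL", "KATL", "KJFK"),
                new Flight("DAL", "KORD", "KATL"),
                new Flight("AAL", "KJFK", "EGLL"));
        Set<String> tracked = Set.of("KATL", "KJFK", "KORD");

        // prints {KATL={DAL=2}, KJFK={AAL=1, DAL=1}, KORD={DAL=1}}
        System.out.println(countByAirportAndAirline(flights, tracked));
    }
}
```

In a Storm topology, logic of this kind would typically live in a bolt that updates the counters as tuples arrive. The sketch keeps everything in one method so the counting rule itself is easy to check.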