Summary
This project used Java, JSch, Apache Storm, HTML, and ZooKeeper.
It performs distributed streaming and analysis of flight data to measure traffic at US airports, using the Storm framework together with the ZooKeeper coordination service for cluster management.
As an example dataset, the project uses the Open Air Traffic Data for Research website (https://opensky-network.org/) and counts the number of flights departing from or arriving at each major US airport per airline.
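The counting step described above can be sketched in plain Java, independent of Storm. This is only an illustration of the tallying logic, not the project's actual bolt code: the class name FlightCounter, its methods, and the airport and airline codes in main are all hypothetical, and the sample flights are made up rather than taken from the dataset.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not from the project): tallies flights per airport and airline.
class FlightCounter {
    // airport code -> (airline code -> number of flights).
    // TreeMap keeps keys sorted, which makes the final report easy to read.
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Count one flight for both its departure and its arrival airport.
    void recordFlight(String origin, String destination, String airline) {
        increment(origin, airline);
        increment(destination, airline);
    }

    // Add 1 to the counter for the given airport and airline, creating it if needed.
    private void increment(String airport, String airline) {
        counts.computeIfAbsent(airport, k -> new TreeMap<>())
              .merge(airline, 1, Integer::sum);
    }

    // Number of flights seen so far for this airport and airline (0 if none).
    int count(String airport, String airline) {
        Map<String, Integer> perAirline = counts.get(airport);
        if (perAirline == null) {
            return 0;
        }
        return perAirline.getOrDefault(airline, 0);
    }

    public static void main(String[] args) {
        FlightCounter counter = new FlightCounter();
        // Made-up sample flights, for illustration only.
        counter.recordFlight("KJFK", "KATL", "DAL");
        counter.recordFlight("KATL", "KJFK", "DAL");
        counter.recordFlight("KJFK", "KLAX", "AAL");
        System.out.println("KJFK / DAL: " + counter.count("KJFK", "DAL"));
        System.out.println("KLAX / AAL: " + counter.count("KLAX", "AAL"));
    }
}
```

In a Storm topology, logic of this kind would typically live inside a bolt that receives one tuple per flight. The nested map is a simple choice for a single worker, and the counts would need to be merged if the bolt ran with more than one task.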
In my observation, the fastest run used a single thread and took 12146 ms. Execution time increased as I added threads, but the change was not significant.
Limitations: The Storm application provided no real-time updates on the execution of the topology. Scalability: The current implementation is designed to run on a local Storm cluster and processes a single input file, so it may not handle large volumes of data or scale up to distributed clusters.
Future improvements: Increase the parallelism of the topology. Implement efficient data structures for sorting and storing data.