
Real-time Data Processing and Insights from Seoul Bus Data
2023-04-03
Kafka, Druid, Superset, MySQL, React
KEA, Team
GitHub Link
https://github.com/KEA-ACCELER/kafka-druid-superset
📢 Presentation Video
📖 Presentation Material
⭐️ Project Overview
This project built a system that collects, processes, analyzes, and visualizes real-time Seoul bus boarding/alighting data and bus stop data using Apache Kafka, Kafka Streams, Apache Druid, and Apache Superset. The system was built in the following steps:

- First, Kafka was set up as a message bus to collect and deliver real-time bus boarding, alighting, and bus stop data. Kafka is known for its high throughput and scalability, and it can integrate with a wide variety of data sources.
- Next, Kafka Streams was used to process the streaming boarding, alighting, and bus stop data. Kafka Streams is a client library for building stream-processing applications on top of Kafka topics, which makes it straightforward to implement complex business logic. For example, it can compute and publish real-time statistics such as passenger counts, boarding ratios per bus stop, and bus operating status.
- Then, Druid was used as a real-time analytical database for the boarding, alighting, and bus stop data. Druid is an open-source, high-performance database designed specifically for real-time analytics, able to query and aggregate large volumes of data quickly. It can ingest and index data from Kafka in real time and provides a range of aggregation and filtering capabilities.
- Finally, Superset was used to build a BI platform that visualizes the boarding, alighting, and bus stop data in charts and dashboards. Superset is an open-source BI platform that connects to Druid for easy visualization, offers a user-friendly interface with diverse chart types, and supports dashboards that refresh automatically as new data arrives.
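To make the Kafka Streams step more concrete, here is a minimal, framework-free Python sketch of the kind of windowed aggregation described above. It does not use the Kafka Streams API; it only simulates what a windowed aggregation (`groupByKey()`, `windowedBy(...)`, `aggregate(...)` in the Kafka Streams DSL) would compute, folding in one event at a time and keeping per-window totals in an in-memory dict that stands in for a state store. The `BusEvent` fields, the 1-minute tumbling window, and the definition of "boarding ratio" as a stop's share of all boardings in a window are assumptions for illustration, not the project's actual schema or logic.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_MS = 60_000  # assumed 1-minute tumbling window


@dataclass(frozen=True)
class BusEvent:
    """Hypothetical event shape; the project's real topic schema is not documented here."""
    stop_id: str       # bus stop identifier (hypothetical field)
    timestamp_ms: int  # event time in epoch milliseconds
    boarded: int       # passengers who boarded
    alighted: int      # passengers who alighted


def window_start(ts_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Align a timestamp to the start of its tumbling window."""
    return ts_ms - (ts_ms % size_ms)


class WindowedStopCounter:
    """Incrementally maintains per-(window, stop) boarding/alighting totals."""

    def __init__(self, window_ms: int = WINDOW_MS):
        self.window_ms = window_ms
        # (window_start, stop_id) -> [boarded, alighted]; stands in for a state store
        self.state = defaultdict(lambda: [0, 0])

    def process(self, event: BusEvent):
        """Fold one event into its window and return the updated aggregate."""
        key = (window_start(event.timestamp_ms, self.window_ms), event.stop_id)
        counts = self.state[key]
        counts[0] += event.boarded
        counts[1] += event.alighted
        return key, (counts[0], counts[1])

    def boarding_ratios(self, win: int) -> dict:
        """Each stop's share of all boardings in window `win` (assumed definition)."""
        boarded = {stop: c[0] for (w, stop), c in self.state.items() if w == win}
        total = sum(boarded.values())
        return {stop: (b / total if total else 0.0) for stop, b in boarded.items()}


if __name__ == "__main__":
    counter = WindowedStopCounter()
    events = [
        BusEvent("stop-A", 0, 10, 2),
        BusEvent("stop-B", 30_000, 30, 5),
        BusEvent("stop-A", 45_000, 10, 1),
        BusEvent("stop-A", 70_000, 4, 4),  # falls into the next window
    ]
    for e in events:
        print(counter.process(e))
    print(counter.boarding_ratios(0))  # {'stop-A': 0.4, 'stop-B': 0.6}
```

In a real Kafka Streams application, each updated aggregate returned by `process` would typically be emitted to an output topic, from which Druid could ingest it; in this sketch the aggregates are simply returned to the caller.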

👬 Team Composition
Team Members: 5
🔨 Responsibilities
- Developing the front-end page with React
- Designing the model architecture
- Deriving insights from the data
- Writing SQL queries to extract those insights
- Analyzing the insights
⚒️ Technologies and Libraries Used
💡 Reflections
- Through this system, we gained real-time insights into bus boarding, alighting, and bus stop data that could support more efficient traffic management and service improvements. The project gave us in-depth knowledge of, and hands-on experience with, real-time data analysis and visualization. We plan to continue enhancing and developing the system in the future.
- Receiving large volumes of data in real time, visualizing it by topic, and drawing insights from it was an intriguing and new experience. It sparked my interest in applying this technology to even larger datasets and services.