Blackjack Analytics Pipeline
Cloud blackjack analytics pipeline
Year
2025
Type of Project
Big data processing and BI/ML integration.
My Role
Data Engineer & Analytics Developer

Case Study
Objective
Build a production-style, cloud-based pipeline that ingests large-scale blackjack hand data, performs cleaning and feature engineering with PySpark, runs analytical queries, and produces datasets for machine learning models and interactive dashboards. Enable downstream teams or tools to derive insights such as win/bust rates, player performance, and outcome prediction from 900,000+ blackjack hands.
https://github.com/ujjwalredd/Blackjack-Analytics-Pipeline
Process
Ingested raw blackjack hands dataset from Amazon S3 into a PySpark environment running on an EC2 instance, configured with IAM roles for secure S3 access.
Performed data cleaning, dropped unnecessary columns, and engineered features such as player and dealer total card counts, then wrote processed outputs back to structured S3 folders in CSV format.
Ran Spark SQL and PySpark aggregations to compute game statistics (win/bust rates, blackjack frequency, player performance, game patterns) and exported results for BI tools.
Trained predictive models in SageMaker Canvas on the processed data and built a Power BI dashboard using the curated CSV outputs from S3.
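The cleaning, feature-engineering, and aggregation steps above can be sketched in miniature. This is a plain-Python illustration of the same logic the PySpark job applies at scale to 900,000+ hands read from S3; the record fields (`player_cards`, `dealer_cards`, `outcome`) and outcome labels are assumptions for the sake of the example, not the project's actual schema.

```python
from collections import Counter

# Tiny stand-in for the raw hands dataset ingested from S3.
# Field names here are hypothetical; the real pipeline reads
# CSV columns into a Spark DataFrame.
raw_hands = [
    {"player_cards": [10, 10],   "dealer_cards": [9, 8],   "outcome": "win"},
    {"player_cards": [10, 6, 9], "dealer_cards": [10, 7],  "outcome": "bust"},
    {"player_cards": [11, 10],   "dealer_cards": [10, 9],  "outcome": "blackjack"},
    {"player_cards": [8, 7],     "dealer_cards": [10, 10], "outcome": "loss"},
]

def engineer_features(hand):
    """Mirror the Spark feature-engineering step: add card counts
    and hand totals for both player and dealer."""
    return {
        **hand,
        "player_card_count": len(hand["player_cards"]),
        "dealer_card_count": len(hand["dealer_cards"]),
        "player_total": sum(hand["player_cards"]),
        "dealer_total": sum(hand["dealer_cards"]),
    }

hands = [engineer_features(h) for h in raw_hands]

# Aggregate game statistics, analogous to the Spark SQL queries
# that compute win/bust rates and blackjack frequency.
n = len(hands)
outcomes = Counter(h["outcome"] for h in hands)
win_rate = (outcomes["win"] + outcomes["blackjack"]) / n
bust_rate = outcomes["bust"] / n
blackjack_freq = outcomes["blackjack"] / n

print(f"win rate: {win_rate:.2f}, bust rate: {bust_rate:.2f}, "
      f"blackjack frequency: {blackjack_freq:.2f}")
```

In the production pipeline these transformations run as PySpark DataFrame operations and Spark SQL, with the engineered columns written back to structured S3 folders as CSV for Power BI and SageMaker Canvas to consume.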
Outcome
Generated detailed game insights including player vs. dealer win rates, bust rates, blackjack frequency, and the most common final hand totals (with a final total of 20 the most frequent outcome).
Quantified strategy effects, such as a declining win rate as players draw more cards, and identified top-performing players by total winnings (e.g., Player6 with the highest earnings).
Delivered reusable, BI-friendly processed datasets and visual dashboards summarizing game statistics, patterns, and ML model outcomes.
Standout Features
End-to-end AWS integration
Scalable big-data processing
Actionable game insights & prediction