Predicting Fantasy Premier League

What’s the Plan?

I want to build a flagship personal project where I can experiment with new features, data sources and techniques.

This project will:

  • Generate content worth sharing
  • Showcase my technical capabilities and thought process
  • Re-familiarise me with Databricks after two years in Fabric
  • Provide inspiration for public talks and demos
  • (Hopefully) improve my FPL rank!

What is Fantasy Premier League?

Fantasy Premier League (FPL) is a game where millions of people try to pick the best-performing football team each week. Points are awarded for goals, assists and clean sheets, and deducted for cards, goals conceded and missed penalties.
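
To make the scoring idea concrete, here is a minimal sketch of how a player's weekly points could be totalled from match events. The point values and event names below are illustrative assumptions for this example, not the official FPL table (which also varies by position), so treat it as a toy model only.

```python
# Illustrative weights only (assumed for this sketch); the official FPL
# rules define the real values and vary some of them by position.
POINTS = {
    "goal": 5,
    "assist": 3,
    "clean_sheet": 1,
    "yellow_card": -1,
    "red_card": -3,
    "penalty_miss": -2,
}


def score(events: dict[str, int]) -> int:
    """Sum the points for a player's events in one gameweek.

    `events` maps an event name from POINTS to how many times it happened.
    """
    return sum(POINTS[name] * count for name, count in events.items())


# Hypothetical gameweek: one goal, one assist, one yellow card.
week = {"goal": 1, "assist": 1, "yellow_card": 1}
total = score(week)
print(total)
```

Keeping the weights in a single dictionary means a rule change only needs a one-line edit, and the same function works for any player once the position-specific values are filled in.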

It’s a perfect playground for data engineering and machine learning – tons of stats, constant updates, and a clear outcome to optimise for: points.

I’ll be building a pipeline that ingests FPL data, models it, and ultimately uses machine learning to predict the best team to pick. The project will be modular, built mostly in PySpark on Databricks Free Edition, with data stored in Delta Lake. I’ll host everything on GitHub, with the aim of eventually porting it to a paid Databricks instance and/or Fabric to experiment with further features.
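
As a rough sketch of what "modular" could mean here, the three stages (ingest, model, predict) can each be a small function that hands its output to the next. This example uses plain Python rather than PySpark so it runs anywhere; the function names, the made-up sample rows, and the naive "average of past points" predictor are all assumptions for illustration, not the actual project design.

```python
from statistics import mean


def ingest() -> list[dict]:
    """Hypothetical stand-in for pulling raw FPL data; values are made up."""
    return [
        {"player": "Player A", "gameweek": 1, "points": 6},
        {"player": "Player A", "gameweek": 2, "points": 2},
        {"player": "Player B", "gameweek": 1, "points": 9},
        {"player": "Player B", "gameweek": 2, "points": 1},
    ]


def model(rows: list[dict]) -> dict[str, list[int]]:
    """Reshape raw rows into a per-player history of points."""
    history: dict[str, list[int]] = {}
    for row in rows:
        history.setdefault(row["player"], []).append(row["points"])
    return history


def predict(history: dict[str, list[int]]) -> dict[str, float]:
    """Naive baseline: predict each player's mean past points."""
    return {player: mean(points) for player, points in history.items()}


def run_pipeline() -> dict[str, float]:
    """Chain the stages; each one can be swapped out independently."""
    return predict(model(ingest()))


predictions = run_pipeline()
best_pick = max(predictions, key=predictions.get)
print(predictions, best_pick)
```

Because each stage only depends on the shape of the previous stage's output, the baseline predictor could later be replaced with a trained model without touching ingestion.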

Project Structure