The Plan is Simple
I am a bang average FPL player. Below are my scores and global ranks for previous seasons, and my current team (at the time of writing, two of my players have just been reported injured!).
I’m not terrible, usually finishing top half of most of the leagues I am in with friends, and I don’t forget about it or give up halfway through the season like a lot of people do.
A lot of it is luck – who you back at the start of the season to make a big impact or whether a player gains/loses form or gets injured can seriously affect your plans.
The volume of stats available is massive, with actuals such as goals and assists, but also the ‘expected goals’ (xG) and ‘expected assists’ (xA) calculated using the models created by the boffins at Opta. These can show us whether players are playing well even if the rewards aren’t necessarily there. Given one player on 2 goals from 4 xG and another on 4 goals from 2 xG, I would rather trust the first to keep earning points over the long term: the gap between goals and xG suggests the first is underperforming the chances they are getting, while the second is overperforming theirs.
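To make the goals versus xG comparison concrete, here is a minimal sketch in Python. The `PlayerForm` class, the `xg_delta` helper and the two players are made up for illustration; they are not part of any FPL or Opta API.

```python
from dataclasses import dataclass


@dataclass
class PlayerForm:
    """Hypothetical container for one player's output over a run of games."""
    name: str
    goals: int
    xg: float  # expected goals over the same period


def xg_delta(player: PlayerForm) -> float:
    """Goals minus xG: negative = underperforming the chances, positive = overperforming."""
    return player.goals - player.xg


# Made-up players mirroring the 2 goals / 4 xG vs 4 goals / 2 xG example.
unlucky = PlayerForm("Player A", goals=2, xg=4.0)
hot_streak = PlayerForm("Player B", goals=4, xg=2.0)

for p in (unlucky, hot_streak):
    print(f"{p.name}: goals={p.goals}, xG={p.xg}, delta={xg_delta(p):+.1f}")
```

Player A comes out at -2.0 and Player B at +2.0. On the reasoning above, a negative delta backed by a healthy xG is the profile I would lean towards, though a single number like this says nothing about minutes, fixtures or penalties.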
I want to use data to improve my FPL rank and climb the league leaderboards. It might be too late for me this year to storm the global top 10k, but by 26/27 I’ll be ready!
If nothing else, it’s a fun project to build and keep learning and improving my skills with.
You can go back to the case study here.


Technology
I’ve outlined above the ‘Why?’ of the project – the most important part in every project. Another key driver for doing this is to learn and enhance my skills across the full data lifecycle, and to use techniques and tools that I haven’t been able to in a while.
I have worked in industry using Microsoft Fabric for the past 2 years, and it has been amazing to be part of projects while the platform is growing and improving daily. But there have been incredible advancements in Databricks and ML since I last worked with them properly, and I want to sharpen back up with these before I take on new challenges and contracts.
The fundamentals of data architecture, engineering, machine learning and visualisation haven’t changed. There are just some new tools, UIs, frameworks and techniques that I would like to explore. As with all projects, I’m sure things will change as I uncover some gremlins in the data or discover a way of doing something that I hadn’t thought about. This is just a rough plan to outline my thought process before I dive into the doing.
I also want to build the code/pipelines in a way that lets me replicate the whole project solely in Databricks or solely in Fabric in the future, as well as demonstrate a hybrid approach like the one I plan here. I prefer the orchestration in Fabric/ADF, and Power BI is a superior offering to the visualisation tools Databricks currently has. That might be because I am more familiar with them, and other tools may come along that I want to shift to, so I don’t want to lock this project into one vendor or way of doing something.
The key things I want to include:
- Databricks – Free Edition for now (I am starting a business and need to look after the pennies!)
- Machine Learning – Training a model using features and then using it to predict points given the next round of fixtures
- Building modular functions to be used in Python wheels or User Defined Functions
- Shortcutting/mirroring from Databricks to Fabric
- Implementing a robust CI/CD process with automated deployments between environments
- Using Tabular Editor to build Power BI reports and projects
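As a rough illustration of the modular functions and model features in the list above, here is one small, self-contained feature function. It is only a sketch: the name `rolling_form`, the 3-gameweek window and the points history are my own assumptions, not a finished design.

```python
from typing import List, Optional, Sequence


def rolling_form(points: Sequence[int], window: int = 3) -> List[Optional[float]]:
    """Average points over the previous `window` gameweeks, excluding the current one.

    Returns None where there is not yet enough history, so the feature never
    includes the score it is meant to help predict.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    features: List[Optional[float]] = []
    for i in range(len(points)):
        if i < window:
            features.append(None)  # not enough earlier gameweeks yet
        else:
            previous = points[i - window:i]  # strictly earlier gameweeks only
            features.append(sum(previous) / window)
    return features


# Made-up points history for one player across five gameweeks.
history = [2, 6, 1, 9, 2]
print(rolling_form(history, window=3))
```

Keeping the function free of Spark or Fabric imports is deliberate: plain Python like this could be packaged in a wheel or wrapped in a User Defined Function on either platform.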
Here is how I am currently envisaging the flow of data. I haven’t made many ‘architecture’ diagrams, so I would love some advice on how to improve, what I am missing, etc.! Message me on LinkedIn with some pointers. (BTW, Excalidraw is pretty cool.)

Getting the Data
I have to give a couple of legends a shout-out here.
The FPL API is really easy to use, and has loads of data in it. The problems are 1) there is no clear, up-to-date documentation and 2) the API only contains the current season’s data.
So I was buzzing to find Frenzel Timothy’s blog explaining all the data for each of the different endpoints, and Vaastav Anand’s GitHub repo containing gameweek stats since the 2016/17 season to get me started!
I can copy the CSVs of historic data to populate up to my starting point (GW 7 25/26), and from then on use the API to grab the data after each gameweek.
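For the API half of that plan, a minimal sketch of the first call might look like the code below. It assumes the widely used `bootstrap-static` endpoint and that its response has an `elements` list of players with `web_name` and `total_points` fields. As there is no official documentation, treat the URL and field names as assumptions to check against a live response. The helper names `fetch_bootstrap` and `top_scorers` are my own.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint: commonly used but not officially documented, so it may change.
BOOTSTRAP_URL = "https://fantasy.premierleague.com/api/bootstrap-static/"


def fetch_bootstrap(url: str = BOOTSTRAP_URL, timeout: float = 10.0) -> dict:
    """Download the current-season summary payload and parse it into a dict."""
    request = Request(url, headers={"User-Agent": "fpl-project-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def top_scorers(payload: dict, n: int = 3) -> list:
    """Return (web_name, total_points) pairs for the n highest-scoring players."""
    players = payload.get("elements", [])  # assumed key holding the player list
    ranked = sorted(players, key=lambda p: p.get("total_points", 0), reverse=True)
    return [(p.get("web_name"), p.get("total_points", 0)) for p in ranked[:n]]


# Tiny made-up payload shaped like the assumed response, so this runs offline.
sample = {"elements": [
    {"web_name": "Player A", "total_points": 40},
    {"web_name": "Player B", "total_points": 55},
    {"web_name": "Player C", "total_points": 12},
]}
print(top_scorers(sample, n=2))
```

Against the live API the call would be `top_scorers(fetch_bootstrap())`. Splitting the download from the parsing means the parsing can be tested without a network connection, which should help when this moves into a scheduled pipeline.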
I think we are set up to start building!
BW
