5) Building the Gold Layer

I originally titled this Dimensional Modelling, but I didn’t want the purists knocking on my door. This isn’t a full guide or a Kimball vs Inmon debate. The goal here is to connect the dots so our data is structured for reporting.

This post is about building the gold layer in our medallion architecture lakehouse. We’ll take the cleaned, curated tables from the silver and feature layers and organise them into a star schema that’s ready for analytics and visualisation.

The gold layer holds business-ready, aggregated data in dimensional models. In this case, we’re not designing measures or metrics inside the tables – we’ll keep those in Power BI as part of the semantic layer.

You can go back to the case study here.

Dimensional Modelling

Modelling is the step that turns all the heavy lifting in data engineering and ETL into something the business can actually use. Without it, you’ve just got a pile of tables. Dimensional modelling organises that complexity into datasets that make sense for reporting, machine learning and visualisation.

At the heart of this are fact tables and dimension tables. Facts hold the measurable events – gameweek stats, FPL points – while dimensions provide the context, like player or team. The power comes when dimensions are shared across multiple fact tables, giving you a consistent way to slice and analyse data across different processes.

For larger organisations, this often starts with an enterprise bus matrix. It’s basically a blueprint of business processes and the dimensions they share. Building this isn’t just a technical exercise – you need business stakeholders involved so the model reflects how the organisation actually works. (More on this on another blog).

Below is the bus matrix for this MVP project so far. It can expand if we want to add a new dimension or fact using the current silver data, but this will be easy to do instead of starting from scratch each time.

FPL Bus Matrix 1024x183

Dimension Tables

Dimension tables give context to your facts, and in our model they don’t all update at the same pace. We run three cadences to keep them up-to-date: static, seasonal, and weekly.

  • Static dimensions rarely change. For us, Date and Position won’t change once we have created them.
  • Seasonal dimensions shift with each new season. Here that’s Team and Season.
  • Weekly dimensions are the most dynamic. Players can transfer, fixtures move for TV slots, and gameweeks can shuffle. So Player, Fixtures, and Gameweek all need regular updates after every gameweek.

This approach keeps the model accurate without hammering the pipeline unnecessarily. We’ll dive into orchestration in the next post, but for now, the key takeaway is that dimensions aren’t one-size-fits-all – their update frequency depends on the business reality they represent.

Fact Tables

Fact tables capture the measurable events in your model, and in our case, they all update weekly. Here’s what we’ve got:

  • Player Gameweek Stats
    Every stat for every player, broken down by gameweek. This is the granular view of performance.
  • Player FPL Points
    For each player and fixture, how many Fantasy Premier League points they scored. This is the key metric for FPL analysis.
  • Team Gameweek Stats
    A roll-up of player stats to team level per fixture. Useful for team-level insights and comparisons.
  • Fixtures
    This one’s a bit unusual. It acts as both a dimension and a fact. It holds time, score, and fixture details, but in the star schema we don’t want to join dimensions to other dimensions. We may want to do analysis on fixtures as a fact (e.g. time between games), so we keep a fact version for measures and a dimension version for descriptive context (team names instead of just keys).
  • FPL Point Prediction Performance
    Predicted points versus actual points per player per fixture. This is where we measure the accuracy of our models.

These tables give us the backbone for analytics and predictions. They’re all about capturing the events – matches, player performance, and outcomes – in a way that can be aggregated, sliced, and compared.

Now we have all the modular pieces built from ingestion to gold layer, the next step is to orchestrate these in a pipeline. See you next time!

BW

Next Time – Orchestration

Leave a Comment

Your email address will not be published. Required fields are marked *