An Introduction to Data Modeling in Power BI
A tried and tested starting step to understanding data modeling in Power BI is to realize that a typical Excel spreadsheet is a data model in itself. In simpler terminologies, it is a visual representation of individual data points, commonly shared categories, grouped together.
Power BI allows the users to work at the granularity or level of the data points themselves, making the job easier. If an excel spreadsheet offers a snapshot or an image to a user, Power BI enables them to access and analyze the pixels of that image as well, the pixels being synonymous to the data points and the image being the data model.
Think about each of the individual attributes involved in a typical sales transaction. A client buys a certain amount of a specific product at a given price from a particular point of sale on a specified date at a said time. Together, the combination of each of these parameters represents a unique transaction. Yet each of these parameters has its own attributes. The client would have a particular ID number, name, location, and profession. The product would have a certain SKU, product group, and cost to manufacture. The point of sale would have attributes like store ID, city, and region. A single sale transaction would involve all these attributes, in addition to the quantitative measures like price point, quantity, date, etc.
A few key terms and concepts:
- Attribute: A descriptive data label that resides within a dimension table
- Dimension Table: A grouping of related attributes (e.g. customers, products). A dimension table is also called Reference Table at times.
- Measure: Quantitative, numeric data that can be aggregated for analysis (e.g., population, orders)
- Fact Table: A grouping of measures (e.g., Sales, Exam Results)
- Database Normalization: The process of structuring a relational database with the aim to reduce data redundancy and improve data integrity.
- Key Column: A common column present in both a dimension table and a fact table which is used to link these tables and establish a relationship.
- One-to-Many Relationship: Occurs when each record in Table A may have many linked records in Table B, but each record in Table B may have only one corresponding record in Table A.
- Star Schema: A data model featuring a singular large fact table storing transactional or measured data, and one or more smaller dimensional tables storing attributes of the data.
In place of an unwieldy spreadsheet to account for each of these distinct and independent parameters, Power BI makes use of normalized relational data modeling. Instead of visualizing our hypothetical transaction as dataset of its own, Power BI represents it as a combination of parameters from different datasets. By assigning unique identifying keys or IDs to each parameter and grouping together within a set — Product, Customer, Store — Power BI enables the users to analyze the spectrum of data with a clearer and broader perspective.
A dataset stored in a typical Excel spreadsheet is bound to be filled with redundancies which would require immense cleaning and processing time and manual effort to group, filter and visualize. Power BI extends the functionality to structure and store the data in a format that allows for robust models to be developed, refreshed, and reused for multiple different reports, simultaneously. To conclude, Power BI allows the users to slice, dice and combine data points enabling them to achieve the deepest possible insights into the dataset.
To find out more about introducing Power BI to your organization, or for support in taking the next step in your Power BI journey, please reach out to Alaistair Jones.