
What Does a Good Data Platform Look Like?

Blog by Robbie Shaw

When we talk about solving data inefficiency and setting up for long-term success, the central theme is this: not all data platforms are equal. We see many organizations investing in a platform that isn’t meeting their needs. The signs are usually clear:

  • Every new data or AI project requires recreating data flows from scratch.
  • Bringing in a new source involves substantial red tape and long IT requests.
  • People spending time on manual tasks to keep the platform running.
  • Business users wrangling data in Excel rather than analyzing it.
  • Unnecessary lag between source data being available and the business receiving it.

If any of that sounds familiar, the issue is usually the way the platform has been designed.

Asking the right questions

Before building or reworking a data platform, there are a few things you need to think through.

First, consider the wider potential of the platform, not just the current requirement. You may only need sales and finance data today, but what about supply chain, HR and manufacturing data tomorrow? What about new use cases for the same data?

Second, think about cost and performance now and in the future. You need to manage costs today but also design in a way that can scale and deliver information quickly as demand grows.

Third, consider how AI fits in. That means your tables and field names need to make sense beyond your immediate team. If an AI model is going to query your data, will it understand what each field represents? Can your platform handle unstructured data as well as structured rows and columns?
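To make the naming point concrete, here is a small illustrative contrast (the field names are invented for this example, not taken from any real schema): an AI model, like a new team member, can only interpret fields whose names carry their meaning.

```python
# Illustrative only: the same record with names only the original team
# understands, versus names an AI model or new colleague can interpret.

cryptic = {
    "c_nm": "Acme",   # customer? company? cost center?
    "amt2": 199.0,    # which amount, in what currency?
    "flg": 1,         # flag for what?
}

descriptive = {
    "customer_name": "Acme",
    "net_amount_usd": 199.0,
    "is_active_customer": True,
}
```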

You also need clarity around business engagement and ownership. Who holds the knowledge? How is that knowledge shared with the teams implementing the logic? And how easy is it for someone new to join and understand how the platform works?

Simplifying logic

With that foundation in mind, what does a good data platform look like in practice?

One key principle is concentrating transformation logic. In a high-level architecture, you ingest data from source, curate and transform it, and then serve it up to reporting tools. In poorly designed platforms, logic is spread across all these layers. Some transformations happen during ingestion, some in the data lake, some in the semantic layer. That makes maintenance difficult. It requires multiple skill sets and a deep understanding of every layer just to fix a single issue.

We believe the right way is to concentrate transformation logic into a single, well-defined layer wherever possible. This reduces the number of tools involved, lowers the learning curve for the team, and makes support and further development far easier. If there is a data quality issue, you know where to look. If you are onboarding new data, you know where the logic belongs.
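As a rough sketch of the principle (layer and function names here are illustrative assumptions, not a real implementation), ingestion and serving stay thin while every business rule lives in one transformation layer:

```python
# A minimal sketch of concentrating transformation logic in one layer.

def ingest(raw_rows):
    """Ingestion layer: land the data as-is, with no business logic."""
    return list(raw_rows)

def transform(rows):
    """The single transformation layer: all business rules and data
    quality logic live here, so every fix has one obvious home."""
    return [
        {
            "customer": r["cust"].title(),
            "revenue": round(r["rev"], 2),
        }
        for r in rows
        if r["rev"] > 0  # quality rule kept alongside the other logic
    ]

def serve(rows):
    """Serving layer: hand curated rows to reporting, again logic-free."""
    return rows

curated = serve(transform(ingest([{"cust": "acme corp", "rev": 1234.567}])))
```

If a report shows a wrong figure, only `transform` needs inspecting; the other layers never reshape the data.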

We have seen the impact of this approach across industries, from insurance to aerospace manufacturing. Simplicity in design directly improves maintainability and long-term success.

A metadata-driven pipeline

A second principle is building a metadata-driven data platform. Rather than creating a new pipeline for every new source, we set up as few pipelines as possible, ideally one highly parameterized pipeline. The specifics of each data source, such as format, location, transformation rules and data quality checks, are stored in metadata tables.

When you onboard a new source, you add a new row of metadata. The same pipeline reads those parameters and processes the data accordingly. We implemented this approach for an aerospace manufacturer using Microsoft Fabric and for an insurance company using Databricks. The tooling differed. The framework did not.

The result is a scalable and flexible platform. Data quality checks, logging, notifications and transformations are all controlled through metadata. It becomes far easier to bring in new data, handle new quality issues and extend the platform without duplicating effort.
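The pattern can be sketched in a few lines of Python. The metadata fields, transformation names and check logic below are illustrative assumptions about what such a framework might hold, not the implementation used on the Fabric or Databricks projects:

```python
# A minimal sketch of a metadata-driven pipeline: one parameterized
# pipeline whose behavior is controlled entirely by metadata rows.

import csv
import io

# Metadata "table": one row per source. Onboarding a new source means
# appending a row here, not writing a new pipeline.
SOURCE_METADATA = [
    {
        "source_name": "sales",
        "format": "csv",
        "location": "landing/sales.csv",
        "required_columns": ["order_id", "amount"],  # data quality check
        "transformations": ["strip_whitespace"],
    },
]

# Reusable transformations, referenced by name from the metadata.
TRANSFORMATIONS = {
    "strip_whitespace": lambda row: {k: v.strip() for k, v in row.items()},
}

def run_pipeline(metadata, read_source):
    """The single pipeline: reads each source's parameters and processes
    it accordingly. `read_source` abstracts where the data lives."""
    results = {}
    for meta in metadata:
        raw = read_source(meta["location"])
        rows = list(csv.DictReader(io.StringIO(raw)))
        # Quality checks are declared in metadata, not hard-coded.
        for col in meta["required_columns"]:
            if rows and col not in rows[0]:
                raise ValueError(f"{meta['source_name']}: missing {col}")
        # Apply the transformations listed for this source, in order.
        for name in meta["transformations"]:
            rows = [TRANSFORMATIONS[name](r) for r in rows]
        results[meta["source_name"]] = rows
    return results
```

In a real platform the metadata would sit in tables rather than a Python list, and `read_source` would be a Fabric or Databricks connector, but the shape is the same: new sources and new checks arrive as data, and the pipeline code itself rarely changes.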

What good looks like

A good data platform is designed for future success, not just today’s requirement. It concentrates logic. It uses metadata to drive scale and flexibility. It is easy to maintain and extend. And it is built through close collaboration between business and IT. Getting these foundations right means the analysis you need today is easier to get and that you’re set up for success in everything that follows, including AI.

If you would like to learn more about how we design efficient data platforms for our customers, you can watch our recent webcast on this topic here.

Find out more

Contact Robbie Shaw. Robbie is a Data & AI Consultant at Thorogood based in Singapore.