Originally posted: 2020-11-07.

The phenomenal success of data-driven businesses like Google and Amazon has led to increased recognition of the value of data, and a corresponding desire for new investment.

As a result, seemingly every big organisation has a new data platform that’s always just around the corner - a platform which will finally impose order on the organisation’s data and allow it to realise its data-driven ambitions. This vision of a step change in data capability is closely linked to a ‘command and control’ style of data leadership whereby solutions are designed and agreed by a small group of experts.

These platforms - if they arrive at all - never seem to deliver the step change that is promised. Why is this such a common problem, and how can organisations realise more value from their data without falling into the ‘big bang platform’ trap?

A root cause of later problems is the tension between an appealing narrative and a deliverable plan. The politics of organisations tend to push towards an ambitious vision and a compelling narrative - and since senior decision makers rarely have deep experience of delivering data platforms, they can struggle to offer effective scrutiny.

The narrative usually centres around fixing the data mess once and for all by creating a new, beautifully curated data platform that contains a single version of truth for all the organisation’s data. I think this is a compelling story because it appeals to a basic human instinct for order - capitalising on the same feeling we get when browsing the Ikea catalogue.

Unfortunately, this vision is usually undeliverable.

The ultimate goal is not to deliver a platform but to enable the organisation to derive more value from its data. Too much weight is often placed on the role of the platform in unlocking value, and this distracts from detailed scrutiny of how value will be realised. For example, if operational efficiencies are expected, how much are these worth, and how does the platform help? Could these benefits be delivered with current infrastructure? It’s too often assumed that a better data platform will inevitably result in business value without enough focus on specific business problems and quantifiable benefits.

An example that illustrates the problem is the idea of the single version of the truth, which is often a key selling point of a new platform, and a big part of the ‘cleaning up the mess’ narrative. The promise is that this will reduce complexity and duplication of work and eliminate inconsistency. The narrative is compelling because it’s usually easy to find examples of different parts of the organisation using mutually inconsistent data.

Whilst opportunities for consolidation usually do exist, the benefits of the single version of truth are usually oversold, and many of the blockers are not due to the lack of a new platform.

There are legitimate reasons for holding different measures of the same thing. A high-profile example is the challenge of defining the number of deaths from COVID: different definitions may be appropriate depending on how the statistics will be used.

These competing user needs create tensions between the vision and the delivery of value. The vision promises to deliver simplicity, but this simplicity can only come at the cost of reducing the value of data to some customers.

The vision of the ‘big bang’ platform implies some type of top-down design - a style of data leadership in which a small group of experts decide what’s best. This could include decisions about what data to store, how to organise the data, how things should be measured and the tools available to users.

There may be wide consultation, but the assumption is that a few common solutions and patterns are used by everyone. This simplification is an important part of the vision.

This fails to account for the complexity of real-world data. Deep expertise about the organisation’s data generally sits relatively low down the hierarchy, with many details not written down. It can take many months of working with a data source to understand its thorniest challenges.

This information is not something that can be distilled into a few interviews; and more generally the overwhelming complexity of big organisations’ data makes it extremely difficult to corral into a single new architecture.

Faced with the high complexity of existing systems, it’s also too easy to assume that this is the result of a lack of expertise or tooling. Whilst this may be part of the story, the reality is a lot more nuanced, involving issues like underlying data quality, user needs, staff skills, or organisational culture that a new platform cannot usually solve.

As a result, whilst a top-down design may superficially accommodate each dataset, it is unlikely to meet real-world user needs. Information simply cannot flow fast enough between planners and implementers to make it work - and attempts to gather enough information can result in suffocating governance requirements.

If centrally designed platforms and a command-and-control style of data leadership don’t work, what does?

In a nutshell, the recipe for success is to have small delivery teams working on specific problems with well-defined business value. Data leaders should explicitly recognise that the organisation’s data problems are too complex for a small group of people to understand and solve. Teams should therefore be empowered by a flexible range of tools, and given the freedom to find innovative solutions themselves. This provides a sustainable model for the delivery of real business value and continuous improvement.

There is still a huge role for data leaders in this delivery model, but the focus is on creating the right conditions for success rather than on designing solutions. The key elements are as follows:

  • **Understanding value and prioritising work.** This means managing the portfolio of work to make sure it’s focussed on areas that have tangible business value and have a realistic prospect of success. It also means being explicit about the tradeoffs between delivery speed and perfection and defending these choices to stakeholders.

  • **Empowering teams to solve problems.** A data platform is important, but its role should be to give flexibility to users and remove barriers to data flow, not to provide cookie-cutter solutions to all problems. The platform should empower users to try a range of different tools and approaches to help them find the best fit for their problem. This is a critical driver of continuous innovation in a world where data tooling is rapidly evolving.

  • **Coordinating teams and driving adoption of good ideas.** Whilst individual teams should be given considerable flexibility, there is an important role for data leadership in coordinating teams and promoting information flow between them. This is a delicate balance. It involves encouraging ground rules and principles to emerge, and promoting (and occasionally enforcing) best practice, but falls short of designing and imposing solutions. Solutions themselves will generally originate in delivery teams; the role of data leaders is to recognise quality and encourage adoption amongst other teams.

    The purpose of ground rules is to encourage transparency and re-use without significantly harming flexibility. For example, one ground rule may be that there should be no data without metadata, and that metadata should be held in a consistent, open-source, machine readable format. This enables coordination between teams whilst imposing very little constraint on how individual teams solve their problems.

    Another principle may be transparency and reproducibility, with a ground rule that all code should be stored in the same version control system. This allows teams to easily see what other teams are doing.

    If this approach is successful, it should be very rare to need to force teams to use particular tools or implement principles. This should only happen after significant reflection on why adoption has not occurred. Enforcement should only be used where there is ample proof of a tool or rule working in similar contexts, and a clear explanation for why the existing approach is harmful.
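
To make the ‘no data without metadata’ ground rule above a little more concrete, here is a minimal sketch of how a team might check it automatically. Everything specific in it is an assumption made for illustration rather than a recommendation: the sidecar naming convention (`name.csv.metadata.json`), the choice of JSON as the machine readable format, the `REQUIRED_KEYS` set and the `find_metadata_problems` function are all hypothetical.

```python
import json
from pathlib import Path

# Assumed minimal schema for illustration; a real organisation would agree its own.
REQUIRED_KEYS = {"description", "owner", "columns"}


def find_metadata_problems(data_dir, pattern="*.csv"):
    """Return a list of problems; an empty list means every data file has metadata.

    Assumes a hypothetical convention where `name.csv` is described by a
    sidecar file called `name.csv.metadata.json` in the same folder.
    """
    problems = []
    for data_file in sorted(Path(data_dir).glob(pattern)):
        meta_file = data_file.with_name(data_file.name + ".metadata.json")
        if not meta_file.exists():
            problems.append(f"{data_file.name}: no metadata file")
            continue
        try:
            metadata = json.loads(meta_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"{data_file.name}: metadata is not valid JSON")
            continue
        if not isinstance(metadata, dict):
            problems.append(f"{data_file.name}: metadata is not a JSON object")
            continue
        # Report any required keys that the metadata does not define.
        missing = REQUIRED_KEYS - metadata.keys()
        if missing:
            problems.append(f"{data_file.name}: metadata missing {sorted(missing)}")
    return problems
```

A check like this says nothing about how a team produces or uses its data, which is the point of a ground rule: it could run against any team’s output folder and only reports whether the shared convention has been followed.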

Having experienced both styles of leadership, I have been able to observe the difference in outcomes. I am lucky to currently work on a delivery team where we are empowered to find our own solutions, and what I see is a huge amount of innovation, leading to a rapid, sustained improvement in data capability, and highly motivated staff who love their work.