IT’S A PEOPLE THING: CREATING A BETTER DATA CULTURE ACROSS GOVERNMENT
DAMA recently hosted a webinar examining the Government’s plans to improve data quality across all departments. The compelling discussion, hosted by our Committee Member Nicola Askham, featured a presentation by James Tucker.
James is Head of the Government Data Quality Hub at the Office for National Statistics. He updated attendees on the Whitehall-led drive to overhaul data quality across the public sector. We also discovered how the role of data professionals can be improved by implementing this future-facing strategy.
You’ll rarely read about seamless government data management projects in the mainstream media. There’s a tendency for journalists to focus on big budgets, difficult deadlines and glitches in delivery which are deemed more interesting for their readers than success stories.
It’s easy to see why this might be deflating for data management professionals working across UK Government departments. However, things should soon start to change for the better thanks to a number of data quality initiatives that are in play.
In 2020, ministers unveiled a new National Data Strategy. This initiative has included the publication of the Data Quality Framework. James says the framework will be used to “get the right data to the right place, at the right time” (to paraphrase the National Audit Office). It will provide a “guiding light for data quality” across departments. These are numerous: 25 ministerial departments, overseeing a total of more than 400 public bodies and agencies, with thousands of employees managing data.
While the framework focuses on providing better data for policymakers, it will only succeed with an equal recognition that the people behind the data count, too. That means launching a concerted effort to win their hearts and minds. It’s a case of overhauling outmoded approaches to create “a culture of data quality”.
James leads the Data Quality Hub. Part of the Office for National Statistics, it was established midway through 2021. DAMA has been heavily involved in its launch strategy.
Within the government’s overall data quality drive, building a positive and productive culture is a key part of the Hub’s remit. This will help put the day-to-day experience of data professionals front and centre, rather than departments focusing solely on the data.
It won’t be easy. Departments - and teams within them - are disparate. James says it was difficult to find a succinct definition of culture. So, the Hub decided to explain how culture relates to data quality, and how existing approaches within government impact it, on its own terms.
Data quality management issues identified include: lack of leadership prioritisation; an enduring tolerance to sub-par data quality and a “sticking plaster” mentality; inconsistent approaches and limited knowledge sharing; new data sources bringing unfamiliar challenges; and, perhaps most challenging of all, an inability to determine whether data is fit for purpose.
A new data quality culture will close these gaps, James believes, if it wraps in the following commitments across all departments:
• Shared goals and strategies
• Ongoing data quality assessment
• Creation of best practice internal communities
• Linking departmental approaches to solve emerging challenges
• Closing the gap between data management at source, and data analysis/use
James illustrates the final point by citing an interesting pilot scheme. Data managers working on the emergency services frontline have been trained to understand how analysts might use the information they input, which should in time lead to data-driven policy improvements.
All of these initiatives paint an exciting picture of the future of data management for government services. But as we all know, strategy is one thing; implementation is quite another.
The good news is the Data Quality Framework sets out actionable principles for change. James states these are being adopted to tackle data quality challenges head-on and propel lasting cultural change across government. This will also mark a step-change in how data professionals experience their job.
1. A leadership commitment to high-quality data
2. Data quality activity that has users’ needs at the centre
3. Holistic approach to quality assessment, from data collection to use
4. Clear and effective communication of data’s quality in relation to its purpose
5. Better anticipation and understanding of policy/regulatory/technology changes that affect data quality
Ultimately, the principles will underpin a better and more widespread understanding in government that people matter if data quality is to reach new heights. James asserts, however, that the Hub has the right processes in place to guide people on the journey and make a difference.
Steps being taken include: widespread sharing of best practice; training, tools and guidance to build capability; ongoing advice and support, with self-serve products on the way; and a mechanism for data professionals to challenge briefs and help set the direction of their department’s data quality programme.
James says the final point must be prioritised at all levels of seniority, so boosting data literacy is key. However, he adds this will also require everyone who works with data to accept a greater level of accountability.
Conclusion: Quality in, quality out
Sharing the graphic below James concluded by expressing the hope that all data users, across departments, will gravitate to the top-right quadrant:
When that happens, data management for Whitehall departments will truly be a nurturing environment, shifting away from inertia and anxiety towards a culture of learning, and a focus on the future. Innovation and quality will surely follow - and that can only be good news for millions of people who use government services across the UK.
James is keen to learn from external practitioners. To ask him further questions, or to share your own experiences of culture change around data management and data quality at your organisation, please email DQHub@ons.gov.uk
At Data Orchard, we work with organisations - primarily not-for-profits - looking to improve their use of data to make better decisions and achieve greater impact. One of our key offers is assessing data maturity. You can read a bit more about the history of our work with data maturity in our blog, but suffice to say it’s been an ever-deepening fascination since 2010, when ‘big data’ and ‘open data’ really began making waves.
In 2015, after trying and failing to find a framework out ‘in the wild’ that would help give structure to how we think and talk about data with not-for-profits, we took matters into our own hands. In 2017, in partnership with DataKind UK we published our own. To our knowledge, it was the first data maturity framework produced specifically with the not-for-profit sector in mind.
Since then, our theory of data maturity has continued to evolve; we’ve launched our own Data Maturity Assessment Tool to help organisations figure out where they are; and we increasingly work to raise the levels of data maturity in the not-for-profit sector - both as trading consultants and champions of ‘data4good’ in our social cause.
As such, we are often asked to speak about data maturity, data maturity assessments, and what good can come of all this talk about data. In this blog, I wanted to explain a bit about our Data Maturity Assessment Tool and highlight some of the slightly surprising effects that assessing data maturity seems to have on organisations.
Invariably, when we’re asked to talk about data maturity, we find ourselves coming back to the basics… What is data maturity and, while we’re at it… what is data?
We define data maturity as:
‘An organisation’s journey towards improvement and increased capability in using data’
When we talk about data, we have a very broad and holistic definition. We mean all the types of information an organisation collects, stores, analyses, and uses. This could be recorded in many formats: numbers, words, images, video, maps. This means it’s everywhere in an organisation - in every department, in every service and team, and in every job role.
Because data is so pervasive, complex and - yes - messy, it’s really helpful to have a simple framework to give structure to how we think about it.
Our Data Maturity Framework sets out seven key themes and five stages of maturity, from ‘Unaware’, through to ‘Mastering’. The themes are areas that we’ve identified as being crucial when it comes to advancing data maturity. Some are more obvious and practical - like ‘data’ and the ‘tools’ you need to collect, store, and present it. Some are about purpose in how you use it and how you analyse it. And most importantly, three are about people. ‘Leadership’, ‘culture’ and ‘skills’ are essential to making data work for an organisation.
Figure 1 Data Orchard's Data Maturity Framework
To quote Maya Angelou "You can't really know where you're going until you know where you've been". Knowing where you are starting from with data maturity is really important.
After developing the Data Maturity Framework, we immediately found it was great in theory to know what constituted bad, good and great levels of data maturity. But, if it wasn’t easy for organisations (and especially busy leaders with little time or interest in data) to use it to assess where they actually are right now, then it wasn’t helping them plot their path to greatness.
Asking the right questions to assess what stage an organisation is at, under each of the key themes, is the art of the Data Maturity Assessment.
We launched our online tool - the free Data Maturity Assessment Tool - in 2019, and since then have created a paid, multi-user, whole-organisation assessment version. There are also options for other agencies and partners to use our tool with their own clients or members.
The tool is, essentially, a simple online questionnaire that produces a clear, easily digestible report on where an organisation is, based on the answers given. It allows organisations to benchmark themselves against peers, plan their next steps and - depending on engagement across the organisation - identify differing viewpoints within it.
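To make the idea of a questionnaire-based report concrete, here is a minimal sketch of how answers from several respondents might be summarised per theme. It is not Data Orchard's scoring method: the 1 to 5 scale, the respondent roles, the `summarise` function and the `disagreement_threshold` value are all assumptions made for illustration, and only the theme names are borrowed from the framework described above.

```python
from statistics import mean, pstdev

# Hypothetical answers: each respondent scores each theme from 1 (lowest
# stage) to 5 (highest stage). The scale and the roles are assumptions.
responses = {
    "ceo":     {"leadership": 4, "culture": 4, "skills": 3, "data": 3, "tools": 2},
    "analyst": {"leadership": 2, "culture": 3, "skills": 3, "data": 2, "tools": 2},
    "manager": {"leadership": 3, "culture": 2, "skills": 3, "data": 4, "tools": 2},
}


def summarise(responses, disagreement_threshold=0.8):
    """Average each theme across respondents and flag themes where views differ."""
    themes = next(iter(responses.values())).keys()
    report = {}
    for theme in themes:
        scores = [answers[theme] for answers in responses.values()]
        spread = pstdev(scores)  # population standard deviation of the scores
        report[theme] = {
            "average": round(mean(scores), 2),
            "spread": round(spread, 2),
            "differing_views": spread >= disagreement_threshold,
        }
    return report


report = summarise(responses)
for theme, result in report.items():
    print(theme, result)
```

In this made-up example the respondents agree on 'skills' and 'tools' but not on 'leadership', which is the kind of differing viewpoint a whole-organisation assessment can bring to the surface.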
I won’t go into too much detail about the tool itself - you can read about it on our website, and you can even use the quick, free, 5-minute taster version to try it out for yourself (though even the full version only takes 20 minutes to complete).
What I do think is interesting to highlight, though, are some of the less obvious benefits that organisations get from going through a data maturity assessment.
Let’s be honest, not everyone loves data the way that we, as data professionals, do. In fact, in our experience, most people actively dislike talking about data. Perhaps more worryingly, our State of the Sector 2020 report found that, in 63% of not-for-profit organisations, the leadership don’t see the value of data.
But, simply by going through the process of completing a data maturity assessment, we find that people from across an organisation (not least leaders) are:
● Encouraged to learn about data, by having to think and talk about it in new ways
● Inspired to come up with new ideas and motivated to get better at data.
I’m not sure if we even appreciated the power of this when we designed the assessment tool, but going through a data maturity assessment is, in many cases, a huge learning opportunity. It increases understanding about data maturity itself, and enables shared thinking and a common language about the challenges an organisation faces.
Yes, it also allows an organisation to measure where they are… But this opportunity to learn, coupled with its effect as a catalyst for further discussion and action, is possibly the biggest benefit we see for many organisations. We’ve lost count of the number of people that have reported back to us that they are suddenly having more positive conversations about data, feeling enthused about making plans, and that they have a better understanding of data and data maturity, simply through taking the assessment.
As ‘data people’, we think anything that gets people nearly as excited as us about talking about data has to be a good thing, and that is borne out in the… erm… data! You can read our 2022 impact report here. One organisation we saw become newly enthusiastic about data after conducting a data maturity assessment is a cancer support charity that completed repeat assessments over the course of three years.
The graph below shows the difference in their data maturity from their baseline assessment in 2018 to their 2021 assessment three years later.
Figure 2 Change in organisation data maturity scores from first assessment in 2018 to fourth assessment in 2021
Clearly, this organisation’s journey didn’t just consist of taking yearly data maturity assessments. Their journey has been supported by various interventions. In the first year, Data Orchard provided training and advice; over the next two years, there was investment in new tools and training, and in improving data quality, analysis and reporting. However, that baseline assessment was a key part of engaging and inspiring people in the organisation to take action, and convincing leaders to invest. Repeat assessments have certainly helped sustain the momentum over time, as well as helping them prioritise and plan their next steps… and they’re looking for analytical skills on their leadership team too!
If you’re interested in watching my DAMA UK webinar on Advancing Organisation’s Data Maturity, it’s available here.
If you’re interested in finding out more about Data Orchard’s work in data maturity, or think you or your clients could benefit from a Data Maturity Assessment, we’d love for you to get in touch.
We also have a small but growing group of data professionals in our Slack group, aimed at connecting a community of data people who can share knowledge and experiences. Why not join?
Aaron Bradshaw, Data Governance & Enablement Specialist, Alation.
This is a follow-on blog from the DAMA UK session (Data Ethics as Business Opportunity) held at 12:00 GMT on 1st July 2022, hosted on BrightTalk.
Light is both a wave and a particle. Data ethics is both an imperative and an opportunity. New regulations covering data privacy and other ethical concerns require that enterprises govern internal data processes according to these new laws. And, while change at large organisations is tough, data leaders would be wise to reframe such transformations as business opportunities rather than burdens.
I recently led an online session, Data Monetisation and Governance, looking at the evolution of data governance, defining data ethics (from the Turing Institute), and touching on the balancing act between using data to monetise (by increasing revenue, decreasing spend, or mitigating risk) and meeting ethical obligations. In other words, ethics and governance aren’t just about mitigating risk; with the right approach, they can boost profits, productivity, and ROI.
The session began with an audience poll. I asked attendees:
● How often do you think about data ethics?
● What does data ethics mean to you?
Poll of attendees revealing that data ethics is low on their priority list
Interestingly, even among a pool of data professionals, the vast majority think about data ethics just a few times a month. Put the same question to a wider audience of non-data practitioners and the response changes to rarely/never.
Why is data ethics overlooked? When it works well, it doesn’t make headlines. I raised the Cambridge Analytica Scandal and pointed out how it is often only when these stories hit the news that people question the ethics behind how companies are using data.
In the Cambridge Analytica case, the company went from a data strategy focused on monetisation by increased revenue to company closure due to the reputational damage from the negative media and public response. Clearly, using private Facebook data collected in a nefarious manner to sway political elections is not ethical. In the court of public opinion, Cambridge Analytica had violated a clear ethical boundary.
Should individuals have autonomy over their personal digital data? The growing consensus is that individuals should have a say in how their private data is collected and used. People are demanding they have the choice to opt out of personal data sharing. Multiple regulations across the globe (GDPR, CCPA, CPRA, POPIA, HIPAA, PIPEDA, LGPD) are rising to this demand. Such laws are pushing the rights of the individual, ultimately trying to give everyone their own decision-making ability around how their private data is collected and used.
This presents a challenge to data practitioners, and an opportunity. Meeting regulations such as GDPR takes a huge amount of effort. A narrow focus has meant that a lot of organisations haven’t taken a step back and frankly assessed the collateral created to transform the organisation. Yet this collateral is valuable for more than just meeting GDPR demands! Organisations who do assess can gain additional benefits from the work that’s already done.
Indeed, ethical data practices can actually support data monetisation strategies. In the session, I walked through the matrix below, sharing examples of how organisations have used ethical standards to monetise or where monetisation has driven ethical innovation.
What’s your data strategy? Offensive strategies position data as product, while defensive look to data as insight. All strategy types offer opportunities for the business.
Under GDPR Article 30, the Record of Processing Activity (ROPA) document needs to be created and maintained. This effectively collates every process in an organisation that uses personal data, the type of personal data, what the process does, etc.
In an organisation with 500 processes, there is overhead to maintain the ROPA. Each process has a cost and a value (for example, the FTE cost divided by the time spent annually, plus infrastructure cost, against attributed revenue from the process). Once the ROPA has been created, organisations can review all operations from a bird’s-eye view, identify costly processes that are no longer effective, and decommission them.
● Decreased spend: One business saved £100,000 on average per process decommissioned and enjoyed less ROPA maintenance.
● Reduced risk: Less processing of personal data means a lower chance of a breach, and so less exposure to GDPR fines of up to 4% of global turnover.
● Increased efficiency: FTE resources can be reallocated to rewarding tasks that add value.
This was a great example of how ethical standards mandated in a personal data privacy regulation can be used to create value for a business.
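The cost-against-value review described above can be sketched in a few lines. This is an illustration under stated assumptions, not the method used by the business in the example: the `Process` fields, the process names and all figures are hypothetical, and the phrase "FTE cost divided by the time spent annually" is read here as apportioning the annual FTE cost by the share of the year spent on the process.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """One ROPA entry, reduced to the figures needed for a cost/value review."""
    name: str
    fte_annual_cost: float      # yearly cost of the staff involved (assumed input)
    time_share: float           # fraction of their year spent on this process, 0..1
    infrastructure_cost: float  # yearly infrastructure cost
    attributed_revenue: float   # yearly revenue attributed to the process

    @property
    def annual_cost(self) -> float:
        # Assumption: staff cost is apportioned by the share of time spent.
        return self.fte_annual_cost * self.time_share + self.infrastructure_cost

    @property
    def net_value(self) -> float:
        return self.attributed_revenue - self.annual_cost


def decommission_candidates(processes: list[Process]) -> list[Process]:
    """Return processes that cost more than they bring in, worst first."""
    loss_making = [p for p in processes if p.net_value < 0]
    return sorted(loss_making, key=lambda p: p.net_value)


# Hypothetical ROPA extract; names and figures are invented for illustration.
ropa = [
    Process("newsletter_profiling", 60_000, 0.5, 20_000, 10_000),
    Process("loyalty_scoring", 80_000, 0.25, 5_000, 90_000),
    Process("legacy_lead_import", 50_000, 0.25, 15_000, 0),
]

candidates = decommission_candidates(ropa)
for p in candidates:
    print(f"{p.name}: annual cost {p.annual_cost:,.0f}, net value {p.net_value:,.0f}")
```

A negative net value only flags a process for review; whether it can actually be decommissioned remains a business decision.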
Financial services businesses must meet extensive regulatory requirements, which demand full governance including: Data ownership, definitions, agreed-upon data quality rules and results, and lineage (BCBS239, CCAR, etc.).
Data governance exposes inefficiency. Many processes executed in silos for decades require numerous manual steps. However, these manual steps weren’t transparent until active data governance required it. Delivery of governance around these processes can unveil massive, inefficient processes with multiple manual (and often redundant) steps.
In my time working as part of the data teams in multiple financial services organisations, I’ve seen companies revisit processes due to data governance. With governance-as-guide, those organisations can simplify onerous processes, reducing 25+ stages with 10+ manual steps to just 15 stages with 5 manual steps, for example.
● Reduced cost: Finding and eliminating wasteful steps and processes saves time and money.
● Reduced risk: Streamlined processes reduce the chances of data being misused or untraceable.
● Digital transformation: Moving away from end-user computing (with the associated benefits of disaster recovery, access control, and automated data quality), as well as faster processing times and improved operational efficiency.
The 2007/2008 financial crisis unveiled the monstrous risk of mis-reporting data. In its wake, many data leaders have made ethical standards core to their operations. This focus has led to simplified legacy processes, reduced total steps, and minimised manual effort (all of which contribute to lower costs and improved efficiencies!).
At a credit card company, there was an initiative to work with mobile phone networks to share data.
Imagine the scenario: You’re going on holiday in a foreign country. You land and disembark the plane. In the terminal you switch your phone on and, within a few minutes, you get a message from your credit card provider. It knows you’re overseas, offers you the chance to disable overseas spending, or extends a personalized 0% foreign exchange fee just for you.
For some people, these are received as great benefits that make their lives easier. However, for others, this can lead to impulse spending and they may not want to receive these. Further still, some people will not like the thought of these data exchanges occurring between companies they patronize.
Nicola Askham raised the concept of a “Daily Mail Test”. If the media could make an embarrassing headline out of your data usage, then it’s probably outside most people's ethical limits.
Why do we collect data? What is our duty to the individuals whose data we’ve captured? What does it mean to use this data ethically?
Such questions capture the complexities of data ethics today and reveal why some argue that data philosophers will be the new data scientists. Ultimately, every individual will have differing thoughts on what appropriate data usage means. Regulations will eventually empower people to exercise control over how organizations manage their personal data.
For data leaders facing such laws, communication around these topics is vital. The important thing is that, as a collective of data professionals, we need to promote and increase the conversation around these data uses, whether that is increased discussion at Data Councils to gain a consensus of acceptable uses within an organisation or raising awareness across the various users looking to gain insight and value from data.
See how ethics can support ROI. Watch the full presentation here.
KNOW, TRUST, USE: BUILDING BUSINESS SUCCESS ON THE THREE PILLARS OF DATA MANAGEMENT - Abel Aboh
For a DAMA webinar hosted by our Committee Member Nicola Askham, Bank of England Data Management Lead Abel Aboh issued a clarion call to data management professionals to demystify what we do, so the wider business can grasp the value of data.
Abel bases his approach to simplifying the complex concept of data management on three pillars. This article takes a closer look at how to build these foundations within your organisation.
Do your business leaders really understand what you do?
If the answer is a wistful “no”, it might comfort you to know that you’re not alone. According to research, 46% of all CDOs claimed their bosses’ understanding of their role is misinformed. Exactly half also said the value they add to their business is generally overlooked.
It seems logical that by extension the same will be true for many data management professionals, not just CDOs.
But instead of blaming a lack of C-suite - or even IT department - interest in our skills, perhaps it’s time to consider what we can do to move the needle, to unlock the business value of data management, and to explain why data underpins success.
Demystifying data starts with simplification
If you’re familiar with the DAMA Wheel - hopefully most of you are! - you’ll know there are multiple ways we segment our skills. Let’s face it, our expertise covers so many disciplines.
Yet therein lies the problem. Do we really expect non-data people in our organisations to spend time grasping the intricacies of our roles?
That doesn’t seem wise. At the same time, however, fostering a better understanding of data management across your organisation means your data strategy is more likely to succeed - from buy-in at all levels to demonstrating ROI that counts.
So let’s ditch the jargon and simplify what we say to people about data management.
It’s also important to note that we can be guilty of putting too much emphasis on trying to explain what data is. Abel’s advice is to avoid wasting time defining data. We must focus instead on the context of how we’ll use data to achieve stated business goals.
Defining the context of data management within a business is the best way to show how it can add value. Abel believes the case for an entire organisation to know the power of data can be built on three pillars.
The three pillars of data management
Helping your colleagues understand how data connects across the structure of your organisation is key.
These are the three pillars that can help you build insight and buy-in.
KNOW - The first pillar is a case of enabling the business to recognise the data it holds and how it is structured. What is the source of that data? How is it organised within the company? This also means describing how the data is managed - and by whom - and the processes that you follow to ensure it is a valuable asset.
Once your organisation understands the source, quality, lineage and structure of the data that is available it becomes much easier for leaders to:
• trust the data when making simpler, better and faster decisions
• drive effective and efficient business processes
• automate and scale personalised customer experiences
TRUST - The second pillar is vital to help your colleagues understand how and why they should trust the data the business has access to. Anyone with a limited understanding of data management might form the impression that what we do is highly restrained by regulation. This is true, of course, but it also means we’ve devised cutting-edge yet compliant ways to use data to the advantage of our organisations - in that it’s fully fit for purpose.
Building trust is a means to engage people with data management as a whole. Trust makes success more likely, helping decision-makers take steps without feeling the need to compromise out of fear or guilt.
Providing - and proving - data quality is a large part of this aspect. This really matters: one poll suggests 85% of companies have made strategic errors by using poor data.
USE - The third pillar, no less important than the other two, is all about unlocking the business value of data. Abel says this is where the rubber hits the road. If you have successfully constructed the first two pillars your organisation can attain a level of data literacy, maturity and strategy that makes the third pillar stand on its own.
As data management professionals, we support and deliver activity that uses data to maximise positive outcomes for our organisation, achieving strategic success against objectives across the business that is founded on data-driven, informed decision-making.
Data’s real value is in its use and reuse. Leave it to languish untouched in a data warehouse and it essentially becomes a cost - as well as a potential risk.
Use the “Six Cs” in your own data storytelling
You’ve all heard of data storytelling. In this final section we share Abel’s approach for telling the story of data management at your organisation. It’s called the Six Cs:
Communicate - Recognising most organisations have been through upheaval in the past two years, there’s a new need to reach out to colleagues to explain what we do. That means taking every opportunity to do so!
Connect - Data management is a team sport; we can’t do this on our own. Join the dots between adjacent skills at your organisation to blend business and technical knowhow.
Collaborate - Extending the point above, this means figuring out how your organisation can join forces with external experts to make the most of your data management strategy.
Change - Data management can be at the forefront of change management within your business, changing thoughts and behaviours to drive better outcomes.
Coach - We must get in the trenches, engaging and training people on the aspects of data management that matter in their daily role and the wider business strategy.
Create - Delivery of data management strategy is only limited by our imaginations. What else could you and your colleagues be doing to help ensure data makes a difference?
Conclusion: keep it simple
In summary, cut out the data jargon and keep it simple. Use the three pillars to communicate the fundamentals of your role, and explain why data has a big bearing on business success. Finally, call on the Six Cs to spread the word, build trust and showcase the business value of data, whether that’s:
• Boosting operational efficiency
• Making better decisions
• Improving business processes
• Other critical aspects of your organisation
Try the three pillars approach as a framework - even feel free to adapt it for the specific needs of your business. Let us know how you get on.
Rethinking data principles - Phil Jones, Enterprise Data Governance Manager, Marks & Spencer
Image courtesy of Kings Church International, www.Unsplash.com
Many years ago, I bumped into an Enterprise Data Architect, Gordon, in between sessions at an MDM conference in London. Over coffee and cookies, Gordon shared his frustrations about how data was managed in his organisation. I asked him to talk through some of his data architecture principles so that I could get a feel for whether they made sense … which he did, and they were impressive and well thought-through. I may have even borrowed a few for my own use. “So, what’s the problem?”, I asked. “No-one takes any notice of them”, he replied dejectedly. “They might as well not exist”.
Implementing principles based on the 4E approach
It has been said that “a principle is not a principle until it costs you something”: unless principles are acted upon, and enforceable, they are toothless. For the policing of the public health regulations to help reduce the spread of the coronavirus (Covid-19), the Metropolitan Police came up with an effective mechanism to do this. Their 4Es model was recently explained by Cressida Dick:
“Throughout the pandemic we have adopted the approach of the 4Es: invented in London and adopted nationally. We engaged with people at the time; we explained the restrictions; we encouraged people to adhere to them and, as a last resort … but only as a last resort … we moved to enforcement”
We are trialling the implementation of data governance principles and policies based on this 4Es approach: to engage with our data producers and consumers, explain the data governance principles and why they are required, and encourage them to adopt and abide by them: to direct their actions and behaviours on how they manage and use data. In those instances where people do not follow the principles, we have means in place to enforce them via the formalised roles and decision-making groups as defined in our Data Governance Operating Model.
How to make the principles more engaging
An example of a core data governance principle might be to fix data quality issues at source and not where the issue manifested itself. This might be a clear statement for a data governance geek, but potentially less so to others: they are entitled to ask “why?”. Scott Taylor evangelises the need to create a compelling narrative for your stakeholders: to bring your data story to life, make it more impactful, and ensure that the core message is memorable.
Plumbing analogous to the “Fix at Source” Data Governance Principle
So, to replay the “fix at source” principle using Scott’s approach, we can try out a plumbing analogy: if a drain is overflowing it is best to start off by understanding the plumbing system and then apply this knowledge to isolate the problem (e.g., a dripping tap) and fix the root cause (fit a new washer) rather than to mistakenly fix the consequence of the overflow (dig a bigger drain).
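In code terms, the "fix at source" principle amounts to checking data where it is captured, instead of patching it in each downstream report. The sketch below is a hypothetical illustration only: the field names, the set of agreed country codes, and the `validate_at_source` and `capture` functions are assumptions, not part of any system described in this article.

```python
# Assumed reference list of agreed country codes, for illustration only.
AGREED_COUNTRY_CODES = {"GB", "IE", "FR"}


def validate_at_source(record: dict) -> list[str]:
    """Return the data quality problems found in a record at the point of entry."""
    problems = []
    if not record.get("customer_id"):
        problems.append("customer_id is missing")
    if record.get("country") not in AGREED_COUNTRY_CODES:
        problems.append(f"country {record.get('country')!r} is not an agreed code")
    return problems


def capture(record: dict, store: list[dict]) -> list[str]:
    """Accept a record into the store only if it passes the source checks."""
    problems = validate_at_source(record)
    if not problems:
        store.append(record)
    return problems  # a non-empty list goes back to the data producer to fix


store: list[dict] = []
ok = capture({"customer_id": "C001", "country": "GB"}, store)
rejected = capture({"customer_id": "", "country": "UK"}, store)
print(ok, rejected, len(store))
```

The rejected record never reaches the store, so the producer fits the new washer and downstream consumers never need the bigger drain.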
A Food Supply Chain analogous to Information Lifecycle Management
I favour the POSMAD information lifecycle that I first came across in Danette McGilvray’s book: the activities to Plan for, Obtain, Store & Share, Maintain, Apply, and Dispose of data. Working in a major UK retailer, any analogy about Food is likely to resonate with my commercial colleagues. So, when I talk to colleagues about the need to manage data across its lifecycle, I refer to the established practices that we already have in place in terms of how we manage our Food products.
Phase | POSMAD applied to Data | POSMAD applied to Food
Plan | Prepare for the data resource: standards and definitions; data architectures; data structures, etc. | Planning for the sourcing of new raw ingredients: food standards, sourcing standards, supplier standards, etc. Plus, considerations of how the raw ingredient will be used in the preparation of meal products
Obtain | Acquire the data resource | Acquire the raw product from the farmer
Store and Share | Data are stored and made available for use, and shared through such means as networks or data warehouses | Appropriate storage for the raw ingredients throughout the supply chain, all the way through to how it is stored in our stores and after its purchase
Maintain | Ensure that the resource continues to work properly | Maintain the steady flow of raw materials through the supply chain to support the continued production of the product, and the availability and freshness of the product in our stores for our customers
Apply | Use the data to accomplish your goals | Use the raw ingredients in the production of great products for our customers
Dispose | Discard the data (archive or delete) when it is no longer of use or required | Best before dates, and our store procedures to check that products past their shelf life are removed from display and disposed of
Extending this analogy further allows us to position data as an asset to our organisation in the same way that our raw products are assets, and to emphasise that we are already governing things in our organisation. Data Governance might be complex, but it is not a new concept.
The Highway Code analogous to Governance Principles
We have used the UK Highway Code as an analogy to our Data Governance principles, policies, and standards. The Highway Code provides the “governance and rules of the road” to ensure that all road users – pedestrians, cyclists, and motorists – have a shared and consistent way of using the same shared resource – roads and pavements – without colliding with one another. Playing out these themes with data in mind: the equivalent Data Governance principles and policies are the “governance of data” to ensure that all data roles – definers, producers, and consumers – have a shared and consistent way of using the same shared resource – data – without causing data “collisions”.
Keeping the principles alive
The Highway Code also helps to position the fact that Principles and Policies are a living document. You might be aware that the Highway Code is being updated on 29th January: the main changes are around the thinking that “those who do the greatest harm have a higher level of responsibility”. We need to ensure that we periodically check that our governance principles and policies are keeping track of … or are ahead of … legislation and how our customers and the wider society view our use of their data. As a closing thought do you have a data governance principle in place that is the equivalent to the medical profession’s Hippocratic Oath: “First, do no harm”? How do you govern the ethical use of data?
 Executing Data Quality Projects, Danette McGilvray
 Bill Bernbach, American advertising executive and founder of DDB
 Cressida Dick, Met Police Chief
 Telling your Data Story, Scott Taylor
 Executing Data Quality Projects, Danette McGilvray
Trust in data is about more than ethics – Deborah Yates
There was a time when almost every one of us would react to a new set of terms and conditions – be they from a software supplier or a service provider – by scrolling through to the end and clicking ‘agree’. We wanted to get at the goodies and didn’t think much about where our data would end up or how our actions would be tracked.
But in the post-Cambridge Analytica period, we are a little more circumspect. Many still opt for the scroll and click approach for brands that we trust, but even then we are far more likely to go in afterwards and edit our data preferences. When it comes to apps, surveys and sign-ups from brands we don’t know, we may be willing to forgo any alleged benefits to retain our data.
In other words, the issue of trust in data has become a mainstream issue, even though many of those making such decisions may not realise that trust in data, and trust in the organisations collecting and using data, is an issue they are interacting with. They may cite ‘brand trust’ or ‘security’, but these are illustrations of trust in data and why it is now important for all businesses. Organisations should take it as read that those who interact with them have the right to ask how and why data is collected, how it is used and who has access to it. After all, these are the people who will deem your company trustworthy, or not.
We cannot take trust as a given: it is very much part of a relationship, something that customers or business partners give to a brand or business once it has earned it – and the definition of ‘earning it’ is nuanced and context dependent. Trust may be earned through experience or reputation, but either of those can be a gateway to a loss of trust as well.
Of course, demonstrating ethical values plays a large role in building trust. This can both paint a picture of how an organisation operates and speak to the values of those who interact with it. There is a reason that many organisations talk about their approach to staff welfare, the environment, animal testing or their position on fair wages for suppliers.
These issues may speak to the value base of customers, but it shows something wider. It establishes a brand as considered, thoughtful and trustworthy. It imparts a moral compass and hopefully reflects values across the board.
Ethical considerations in the way an organisation collects, uses and shares data are increasingly on the agenda - both from a social and economic perspective. The rise of data ethics - defined as a distinct and recognised form of ethics that considers the impact of data and data practices on people, society and the environment - as a recognised discipline is a testament to this.
However, demonstrating ethical collection and use of data is just one element of trustworthy data stewardship. Gaining trust requires organisations to go above and beyond good data governance practices. They will need to demonstrate trustworthiness in privacy and security, ethics and transparency, engagement and accountability, as well as equity and fairness. Addressing each of these areas can help to increase confidence in data, as well as trust in the businesses or organisations that handle it. In doing so, those addressing each area shift from the theoretical to the practical. After all, it is easy for organisations to make claims about any element of their ethics or data practices; it is quite another thing to visibly demonstrate that these ethics are integrated and embedded into everyday business. Claiming ethical practices will certainly win attention in the short term, but failing to deliver on those claims can actually be more damaging to an organisation than failing to set such guidelines.
The Open Data Institute has long been working in the field of trustworthy data and data practices, with a team of experts who can help organisations to assess, build and demonstrate how they create and then embed ethical data practices that can be acted upon and upheld.
Please do get in touch if you would like to learn more about what the ODI can do to help your organisation demonstrate trustworthiness with data, or improve confidence in collection, use and sharing of data. We can work with you to develop your approach, or work with you to build on existing practices you may already have in place. We can also provide training for you or your staff.
Deborah Yates, Programme Lead, Data Assurance, the Open Data Institute.
Dr Jenny Andrew, Head of Data, Chartered Society of Physiotherapy
Image courtesy of Joshua Sortino, Unsplash
The UK government’s Central Digital and Data Office (CDDO) has just launched an Algorithmic Transparency Standard for the public sector. The idea is to promote accountability in the algorithmic decision-making that is a growing feature of our public services and civic life. It has largely been received with enthusiasm, and when I agreed to write about it, I thought it would be in a similar vein. This is not the piece I planned…
I should say that I trained as a scientist, and I’m conditioned to view accountability as essential to continuous improvement, and to methodological rigour. And I’ve been an active trade unionist all my working life: I’ve done my time in industrial relations, at the sharp end of holding decision-makers to account. We need more accountability, and structures for citizen-engagement in all our institutions, all our services, and in all businesses. However, the material point, and therefore the object of accountability, has to be the decision and its real-world impacts, not an algorithm or any other tool that feeds it.
To see why this framing matters, look no further than the 2020 A-level results – one of the cases that precipitated the development of the standard. When students chanted “F**k the algorithm”, they gave the exams regulator and the Department for Education a scapegoat for a succession of bad decisions. As a result, the infamous algorithm was dropped, and rightly so, but there’s been relatively little scrutiny of the circumstances that surrounded it.
As it happens, I watched the A-level results fiasco unfold, like a slow-motion car-crash, with commentary from a retired senior examiner and experienced moderator: my mother. “They’ll use the predicted grades in the end,” she said, on the day in March that the exams were cancelled, “They won’t be able to adjust the distribution so far beyond its range.”
Looking back now, my mum’s prediction was a neat illustration of the competencies that weave together into good data design:
It is rare to find those capabilities tied up in a single person. When we design data-centric processes and tools, therefore, we assemble teams and focus groups, and structure projects with touchpoints that ensure coordination. Data management professionals understand data as a whole-lifecycle concern, so we build relevant expertise into every stage of it.
Bad things happen when we arbitrarily decouple data acquisition from storage and use, or when we pass responsibilities from data manager to software developer to data interpreter like a relay baton. Often the risks in algorithmic decision-making can be traced to those handovers: cliff-edges in domain knowledge and disconnects from the holistic view.
The Algorithmic Transparency Standard, as it stands, reinforces a rather narrow, tech-centric perspective. Here’s how I think it can be recast into a more joined-up approach:
Even the title is a problem. The current hype has rendered ‘algorithm’ a loaded term, often conflated with AI and machine learning. Although the guidance suggests a wider scope for the standard, I doubt, for example, that an exam moderator working on a spreadsheet would consider using it. (If you’ve seen the mischief that can be done with a spreadsheet, there’s no way you would exempt those decisions from due scrutiny!) The standard itself should be rebalanced to give more weight to the people and process elements that contextualise the analytical technology, and more detail to the data that feeds it.
Analytical technology should be viewed in its place within the data lifecycle. We know that its outputs are only as good as the weakest link in the data supply chain. Taking a whole data lifecycle perspective right from the design phase of the analysis helps to forecast and avert choices that may be embarrassing to recount with hindsight. Furthermore, as any research data manager can attest, designing for accountability makes capture of the essential metadata a lot less troublesome than trying to reconstruct it later.
Accountability in public sector decision-making cannot be the preserve of ‘tech-native’ people. We need meaningful participation from across domains and interest groups. Not every citizen will follow all the details of the data and technologies that are used in the public sector. We can, however, target networks and civil sector organisations whose advocates may play a more active role. In the trade union movement, for example, we are developing data literacy among our reps and officers, to complement their expertise in employment and industrial relations at the negotiating table, on behalf of fellow workers.
To establish any data protocol in a way that sticks takes a combination of authority, useability and motivation. As a profession, data management can enhance all three in the transparency standard. We are custodians of the organisational structures and processes that will need to support and integrate it. Our experience tells us where contributors will struggle with information gathering, and our workflows are the key to making it easier for them. And our data-centric perspective holds the link between technology and its real-world purpose, keeping it relevant to stakeholders and measuring it by its impacts.
Real-world impact is what matters here. Spare us all from yet more data gathering without material purpose! I wonder how the Algorithmic Transparency Standard will perform outside the ‘laboratory conditions’ of its creation. Will we look back in time to see that it made a real-world difference to the decisions affecting society and public services? Probably not with its current, limited viewpoint. Not without expert, structural support.
This isn’t the enthusiastic post I planned to write, not because I want the standard to fail, but because I really want it to succeed. I think it needs a critical friend more than it needs another cheerleader, and our profession is uniquely suited to that brief.
So, I’m thinking about how we can enhance what’s good in the Algorithmic Transparency Standard, how I materialise the principle of accountability in my own professional practice, and how I can support my colleagues and trade union community to adopt it into theirs. I would love to hear other DAMA UK members’ ideas on the subject. And I would love public sector bodies, the CDDO included, to talk to us about how they can build this standard into constructive and sustainable citizen-engagement about the services they provide.
The Cynefin Framework applied to Data Governance
Phil Jones, Enterprise Data Governance Manager, Marks & Spencer
When I first got involved in data governance I found the breadth, depth, diversity, and complexity of the subject matter somewhat overwhelming, particularly on where to start, how to get it done, and how to introduce change that would stick.
My background is in business process architecture with some project management experience, so I set about applying these trusted techniques: they had worked for me in the past, after all, so why not now? I figured that the underlying causes of the data quality issues that I needed to fix were related to failures of understanding and of process adherence. My plan was to do detailed analysis to find a “one best way”, and then implement change supported by KPI measurement. I saw some progress, but it became apparent that my preferred approaches to problem solving were not sufficient, particularly when I started to encounter the need for behavioural and cultural change.
It was about this time that I came across the Cynefin framework. Cynefin [ku-nev-in] was developed by Dave Snowden in 1999 and has been used to help decision-makers understand their challenges and to make decisions in context. It has been used in a huge variety of scenarios: strategy, police work, international development, public policy, military, counterterrorism, product and software development, and education. This blog is my attempt to apply Cynefin to Data Governance.
What is the Cynefin Framework?
Cynefin provides a framework by which decision makers can figure out how best to approach a problem, helping us to distinguish order from complexity from chaos. It helps to avoid the tendency for us to force a “one size fits all” approach to fix problems, best explained by Maslow: “if all you have is a hammer, everything looks like a nail”.
What does the Cynefin framework look like?
The framework is made up of five situational domains that are defined by the nature of the cause-and-effect relationships: clear, complicated, complex, chaotic, and disorder:
The “Clear” Domain: the domain of best practice
The relationship between cause and effect exists, is repeatable and is predictable. This is the realm of “known knowns” and decisions are not questioned as there is a clear approach to follow. The approach that you apply to a problem or decision within the Clear domain is:
Sense → Categorise → Respond
This is the domain of best practices: each time you encounter a problem of a certain category you follow a script that guides you through to the resolution of that problem with a high level of confidence of a positive outcome.
I am a keen cyclist. A problem that I encountered on a recent ride was a puncture. Sensing a puncture is not so difficult: the bike becomes slightly wobbly, and you can feel the metal rims of the wheel impacting on the road. Categorising the problem takes seconds: pinching the tyre confirms the issue. Based on this problem categorisation I responded by following a clear and well-established routine – a best practice – to fix the puncture so that I could continue my ride.
The Complicated Domain: the domain of good practice
There is still a predictable relationship between cause and effect, but there are multiple viable options available. This is the realm of “known unknowns”: finding the answer is not easy and often requires effort and expertise. There is not a single “best practice”; there is a range of good practices, and problems in this domain require the involvement of domain experts to select the most viable approach. The approach that you can take to a problem or decision within the Complicated domain is:
Sense → Analyse → Respond
To illustrate this with another cycling analogy: my bike recently developed an annoying squeak that I couldn’t fix. A friend who knows far more about bikes than I do came up with a couple of ideas but neither of them worked, so I went to a bike shop where the mechanic did a more thorough inspection and was able to isolate the problem: a worn bottom bracket. I sensed the problem – the squeak – and I sought the advice of experts to analyse it. My response was to apply the most viable option using good practice to fix the squeak. I can now ride with a greater amount of serenity.
The Complex Domain: the domain of emergence
The Complex domain is where the relationship between cause and effect potentially exists, but you can’t necessarily predict the effects of your actions in advance: you can reflect in hindsight that your actions caused a specific effect, but you weren’t able to predict this in advance, and you can’t necessarily predict that the same action will always cause the same effect in the future. Because of this, instead of attempting to impose a course of action with a defined outcome, decision makers must patiently allow the path forward to reveal itself: to emerge over time.
The approach that you can take to a problem or decision within the Complex domain is:
Probe → Sense → Respond
This is the domain of emergent practices and “unknown unknowns”. You test out hypotheses with “safe to fail” experiments that are configured to help a solution to emerge. Many business situations fall into this category, particularly those problems that involve people and behavioural change.
When Ken Livingstone announced a scheme to improve transport and the health of people in London in 2007, he said that the programme would herald a “cycling and walking transformation in London”. Implementing this has largely followed the Complex approach: transport officials studied schemes in Paris and elsewhere and considered the context of the problem in London. They talked to (probed) commuters, locals, and visitors to better understand people’s attitudes and behaviours related to cycling. They assessed the constraints imposed by the existing road network, and a range of other challenges, from which they came up with a limited “safe to fail” trial.
The scheme launched in 2010 in a localised area and the operators of the scheme assessed (sensed) what worked and responded by applying adjustments. For example, the initial payment process required access keys; this was replaced by users of the scheme having to register on an app. The scheme has continuously evolved with further innovations: it has followed an emergent approach, with successful changes amplified and those less successful dampened. The scheme as of today is different from where it was when first launched, and it will continue to evolve.
The Chaotic Domain: the domain of rapid response
The Chaotic domain is where you cannot determine the relationship between cause and effect, either in foresight or in hindsight: the realm of “unknowable unknowns”. The only wrong decision is indecision. We come into the Chaotic domain with the priority to establish order and stabilise the problem, to “staunch the bleeding” and to get a system back on its feet. We don’t have time to look up a script: the problems we are seeing have not been experienced before; we can’t call up experts and rely on best practices; we can’t devise a set of experiments to see if a new practice emerges to tackle the problem. The approach that you can take to a problem or decision within the Chaotic domain is:
Act → Sense → Respond
The practices followed in the Chaotic domain are novel practices in which you move quickly towards a crisis plan with an emphasis on clear communications and decisive actions.
Examples of where cyclists experience chaos are the mass crashes in events such as the Tour de France. An accidental touching of wheels or an errant spectator can bring down the whole peloton, resulting in cyclists and bikes strewn across the road. The immediate priority is to act: who is injured? Medics triage and act to look after those in pain. Are the bikes in one piece? Mechanics act by applying fixes or replacing broken bikes. All of this is done in a matter of seconds.
With the casualties being looked after, those in a position to continue sense what to do: if their team leader is down, what should the team do? They don’t put out an urgent request for white boards and assemble for a team meeting: the race is not going to wait for them. They get back on their bikes and work out their response and get on with it, and then adapt their approach as required. They may form temporary alliances with other teams to catch back up with the main peloton: they have come up with some novel practices and have been able to get some semblance of order out of chaos.
The Disorder Domain: the domain of emergence
Disorder is the state that you are in when it is not yet clear which of the other four domains your situation sits within, or you are at risk of applying your default approach to problem solving irrespective of the nature of the problem. The goal is to best categorise your problem into the most appropriate domain as quickly as possible.
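As a compact recap of the five domains described above, the decision sequences can be sketched as a simple lookup. This is only an illustrative summary of the text: the names `Domain`, `APPROACH` and `recommended_approach` are hypothetical and are not part of the Cynefin framework itself.

```python
from enum import Enum


class Domain(Enum):
    CLEAR = "clear"
    COMPLICATED = "complicated"
    COMPLEX = "complex"
    CHAOTIC = "chaotic"
    DISORDER = "disorder"


# Decision sequence and type of practice for each domain, as described in the text.
APPROACH = {
    Domain.CLEAR: (("sense", "categorise", "respond"), "best practice"),
    Domain.COMPLICATED: (("sense", "analyse", "respond"), "good practice"),
    Domain.COMPLEX: (("probe", "sense", "respond"), "emergent practice"),
    Domain.CHAOTIC: (("act", "sense", "respond"), "novel practice"),
}


def recommended_approach(domain: Domain) -> str:
    """Return the decision sequence for a domain as a readable string."""
    if domain is Domain.DISORDER:
        # Disorder has no sequence of its own: the goal is to categorise first.
        return "categorise the problem into one of the other four domains"
    steps, practice = APPROACH[domain]
    return " -> ".join(steps) + f" ({practice})"


print(recommended_approach(Domain.COMPLEX))
# prints: probe -> sense -> respond (emergent practice)
```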
Movement between domains
The framework is dynamic in nature in that problems can move between domains for many reasons. The guidance within Cynefin is that the most stable pattern of movement is iterating between Complex and Complicated, with transitions from Complicated to Clear only done when newly defined “good practice” has been sufficiently tested as “best practice”.
Beware of complacency
Clear solutions are vulnerable to rapid or accelerated change: the framework calls out the need to watch out for the movement of a problem from Clear to Chaos. There is a danger that those employing best practice approaches to problem-solving become complacent: they assume that past success guarantees future success and become over-confident. To avoid this, Clear problem-solving approaches should be subjected to continuous improvement, and mechanisms should be made available to team members to report on situations where the approach is not working.
There was one time when I fixed a puncture but was then distracted, or complacent, when putting the wheel back on the bike. I found this out to my cost when I was going down a hill and lost control of the bike – a chaotic event, at least for me. Fortunately, I landed on some grass and only my pride was wounded. Having learnt my lesson I now double-check that everything is okay before setting off.
Where does Data Governance sit within the Cynefin framework?
Dealing with Chaos
Snowden states that “there has never been a better chance to innovate than in a major crisis … where we need to rethink government, rethink social interactions, and rethink work practices. These must take place in parallel with dealing with the crisis”. We have all had to innovate throughout the recent Covid-19 pandemic and the ensuing lockdowns: for example, how to collaborate on managing data in an effective way when we are all working remotely; how to best onboard new joiners and provide them with an understanding of the data they need to perform their roles. In my team we acted quickly to come up with some novel solutions and have adapted them over time.
According to Snowden, many business situations fall in the Complex and Complicated domain areas. Anything involving people change, such as changes to behaviour or to organisational culture, sits in Complex. It is widely agreed that behavioural and cultural change are the most challenging aspects of Data Governance. Nicola Askham, the Data Governance Coach and DAMA committee member, makes this point really well in a blog where she discusses “the biggest mistake that I have seen is organisations failing to address culture change as part of their governance initiatives … This mistake … can ultimately lead to the complete failure of a data governance initiative”.
To add to this view, in a recent open discussion in social media another DAMA committee member, Nigel Turner, positioned two key features for successful data governance programmes that apply to any organisation: firstly, that “it must be unique to each [organisation] as it must be embedded within the existing cultural and organisational context of that organisation … one size does not fit all”. Secondly, when considering the challenges of engagement and adoption: “how do I get business buy in? How do I appoint data owners and data stewards? How do I demonstrate the benefits of data governance? How do I prioritise potential data improvements?”
These opinions from highly respected data governance professionals place those components of data governance related to people change within the Complex domain where, as we have learnt from earlier, the approach is to probe, sense, and respond.
In the domain of good practice, my team in Marks & Spencer have developed a range of good practices which can be applied by subject matter experts (SMEs) to a problem. For example, we have developed “Data Governance by Design, by Default” and “Data Quality Remediation Approach” good practice guides that SMEs can refer to when tackling problems and opportunities of this nature. Both good practice guides are informed by our Data Principles and Policies, which also sit within the Complicated domain: the principles and policies are directive and require a certain amount of effort and expertise to apply. All these artefacts are subject to continuous improvement.
In the domain of best practice, the focus is on developing repeatable patterns to apply the appropriate response any time the situation is encountered. In M&S we have developed automated processes to perform data quality checks at scale against specific business rules, and to automatically tag datasets to support information classification. These rule sets were carefully constructed and tested, and only moved to the Clear domain when they were approved; however, this is not a “fire and forget” approach: we monitor the performance and the currency of the rules to ensure that they remain fit for purpose.
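To make the idea of automated, rule-based data quality checks concrete, here is a minimal sketch in Python. It is not the M&S implementation: the rule names, the sample product records and the 95% threshold are all hypothetical, chosen only to show how business rules can be expressed as small reusable checks whose pass rates are then monitored.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


# Hypothetical business rules for a product dataset.
RULES = [
    Rule("sku_present", lambda r: bool(r.get("sku"))),
    Rule(
        "price_positive",
        lambda r: isinstance(r.get("price"), (int, float)) and r["price"] > 0,
    ),
]


def run_checks(records: list[Record], rules: list[Rule]) -> dict[str, float]:
    """Return the pass rate (0.0 to 1.0) for each rule across all records."""
    if not records:
        return {rule.name: 1.0 for rule in rules}
    return {
        rule.name: sum(rule.check(r) for r in records) / len(records)
        for rule in rules
    }


def rules_below_threshold(pass_rates: dict[str, float], threshold: float = 0.95) -> list[str]:
    """List rules whose pass rate has dropped below the agreed threshold."""
    return sorted(name for name, rate in pass_rates.items() if rate < threshold)


# Hypothetical sample data: one record has no SKU, one has a negative price.
sample = [
    {"sku": "A100", "price": 4.50},
    {"sku": "", "price": 2.00},
    {"sku": "A102", "price": -1},
    {"sku": "A103", "price": 3.25},
]
rates = run_checks(sample, RULES)
print(rates)                         # {'sku_present': 0.75, 'price_positive': 0.75}
print(rules_below_threshold(rates))  # ['price_positive', 'sku_present']
```

Keeping each rule as a named, independent check means a rule can be reviewed, retired or updated on its own, which suits the ongoing monitoring of rule performance and currency described above.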
When you’re next faced with a problem, question whether the approach that you are about to apply is appropriate: does the problem really sit in best or good practice, or do you need to do some probing, sensing, and responding? And when you are working on a problem in the Complex domain, how comfortable are you to create environments and experiments that allow patterns to emerge in the face of pressures for rapid resolution of the problem and a move to command and control?
You can also use the framework to challenge those who claim that complex problems have simple solutions and recognise where their biases are leading them to misdirected and constrained thinking. For example, people who prefer to operate in the Clear domain may try to impose KPI measurement on what is a Complex problem. This can result in undesirable behaviours such as gaming the KPIs, which gives a false sense of progress rather than addressing the underlying problem.
Cynefin has really helped, and continues to help, the work that my team is doing in Data Governance where the problems we face manifest themselves across all five domains. By understanding the characteristics of a problem we can rapidly apply the best approach to properly understand the problem and then work out how best to set about fixing it.
Dave Snowden is highly altruistic in how he shares his ideas and his expertise: I encourage you to visit his website Cognitive Edge (cognitive-edge.com) for further resources, and there is a load of video content online. I can particularly recommend his commentary on hosting a children’s party: highly amusing.
If you have applied Cynefin to help fix a problem already, or after reading this blog, it would be great to hear from you.
 Content relating to the Cynefin framework in this blog has largely been sourced from Dave Snowden’s excellent book: “Cynefin – Weaving sense-making into the fabric of our World”, and his many articles and online videos. The examples provided relating to bikes and data governance are mine alone, as are any errors.
Mentors – we need you!
2021 marks the 10th anniversary of the DAMA UK mentoring scheme. Our award-winning programme has proven very popular and the number of applications for mentoring is growing year on year. But we are at risk of becoming victims of our own success and we need to recruit more mentors to support the increase in requests for mentoring.
The scheme’s main aims are to:
Why become a DAMA UK Mentor
Obviously a key part of the role of a mentor is to support the mentee, but it is also an opportunity for personal development. Professional development organisation, Art Of Mentoring, lists the following 11 reasons why you should say ‘Yes’ to becoming a mentor:
Nigel Turner was one of the original founders of the DAMA UK mentoring scheme:
“Having been a mentor myself since the start, what have I learned about mentoring? First, being a mentor is as much a learning experience as being a mentee, as I have been exposed to many different data management people and their problems working in a wide variety of organisational cultures, including small businesses, global multinationals and UK government departments. This has taught me that although good practice in data management is often generic, with many different organisations facing similar challenges with data quality, governance, reporting and so on, understanding specific cultural contexts is critical to providing viable support and advice. What works in a small business may not do so in a multinational and vice versa.
Moreover, what mentees usually want is not someone to tell them what to do, but a mentor who acts as a sounding board to listen to their ideas and thoughts, ask independent questions, provide feedback and generally act as a supportive friend who has their best interests at heart. In essence, mentoring should be all about helping others to develop themselves in the direction they want to go in. As the film director Steven Spielberg observed, “The delicate balance of mentoring someone is not creating them in your own image but giving them the opportunity to create themselves.”
“I think my favourite thing about being a mentor is the opportunity to talk to people outside of my day job about data management and not be considered weird! No matter what industry you work in there are so many commonalities when it comes to data. Being able to help people, especially those starting out in their careers, to prepare for typical hurdles and encourage them to develop a full understanding of some of the complexities in our field so that they can be effective in what they do is extremely rewarding.”
If you are a DAMA UK member and would like to volunteer as a mentor visit the mentoring pages on our website at https://www.dama-uk.org/Mentoring for more information on the scheme, and how to get involved. Here’s to the next 10 years!
The creation of a central NHS digital database from GP records in England - General Practice Data for Planning and Research (GPDPR) - has been delayed by two months. The system was due to launch on 1 July, but the date has now been pushed back to 1 September by the government.
The NHS had been calling for a delay to allow patients more time to learn about the system. The British Medical Association and the Royal College of GPs had also expressed concerns.
Parties that currently oppose the database claim a lack of transparency in the process and have demanded greater consultation on how the scheme would work, as well as better communication about how patients can opt out to prevent their data being shared.
In a recent, lively conversation DAMA UK board members Mark Humphries and Akhtar Ali discussed the pros and cons of the proposed scheme.
Both of our experts’ positions were nuanced, with some agreement over the potential positive outcomes and shortcomings of the proposals.
Mark declared himself broadly in favour of the programme, citing the ability to boost future medical research and improve treatments for the UK population. Akhtar immediately countered by stating the NHS has already captured and shared data for many years, citing advances made in controlling and treating Covid-19 as an example.
Mark added he’d be happy for his data to be shared, believing it’s “a small price to pay” for groundbreaking, genetics-based research to come. He drew a line to medical research of the past, which involved a level of personal sacrifice for participants in early organ transplants - giving rise to many procedures we take for granted today.
While existing data collection already enables NHS and academic research, Mark elaborated: “There are plans for two specific blocks of research [with the new data strategy]. One is large-scale planning. The other way it will be used is for developing new treatments.”
However, Akhtar pointed out that successful lawsuits have resulted in courts preventing various commercial organisations from “patenting individuals’ genetics” as part of their plans.
Mark conceded that using data to develop treatments is controversial as it would likely involve commercial organisations, not least pharmaceutical companies.
He cited the case of Henrietta Lacks, an American woman who died of cervical cancer. Johns Hopkins University removed cells from her tumour that have been central to medical research ever since - without her family’s knowledge.
Mark explained: “They were horrified. They raised the issue of big profit, people making money out of their mother’s cells. The case has since been recorded in a book written by a journalist, and it’s a source of massive pride to her family that her cells have made such a massive contribution to medicine. But in the context of data sharing this is highly relevant - when there is no trust, there's anger and bitterness.”
Sharing data beyond the public sector would undoubtedly be a cause of concern for many people, Mark added.
Akhtar seized on this point. He pondered: “The bigger question is: how far would companies go to get access to that data, and who will they sell it to?” He pointed to the dilemma arising from dealing with the pandemic: “Most of the data and money to fight Covid came from the public sector. But the profits are ringfenced to corporations that aren’t willing to give the vaccination free of charge to poorer, developing nations to protect their critical care staff.”
He largely opposes the NHS digital database plans due to a perceived lack of transparency around whether consumers or commercial organisations will really benefit from the sharing of a data set worth around £5bn per year. (NB a figure of £10bn has been quoted - this includes the £5bn pa estimated value of the data and cost savings to the NHS.)
Akhtar also said there have been around 1,300 NHS data breaches in the past two years alone. He believes fines issued by the ICO are too small to act as a deterrent for poor data management and security - with the proposed changes potentially opening the floodgates to far greater problems.
He said: “Once we have given away £5bn-worth of data, no commercial organisation is going to relinquish ‘free money’. This has been demonstrated by the investment of vast public funds in Covid vaccines, yet when those same organisations are asked to provide them at cost to poorer nations they suddenly claim they aren’t charities - despite suggesting they would gladly get involved in such a scheme.”
Akhtar compared patient data sharing beyond what is possible today to “moving from keeping embarrassing photos in a private album at home to revealing them on Facebook”.
Mark countered that data will be pseudonymised when it leaves the GP practice - but recognised that, even with many personal details removed, records could in theory be used to identify a patient.
He explained: “One of the things you need in order to make sure you can still link data together is a unique key. It will look like random data, but allows people to trace from the beginning of the chain to the huge database. You can identify the individual if you've got access to every link in the chain.”
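To make Mark's point about the unique key concrete, here is a minimal sketch in Python. It is an illustration only: the discussion does not say how GPDPR generates its keys, so the use of a keyed hash (HMAC-SHA256), the secret key, the field names and the records are all assumptions made for this example.

```python
import hashlib
import hmac

# Hypothetical secret key, held only by whoever performs the pseudonymisation.
SECRET_KEY = b"illustrative-secret-not-a-real-key"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable token that looks like random data."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifier(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Replace the direct identifier with its token, keeping the other fields."""
    rest = {k: v for k, v in record.items() if k != "nhs_number"}
    return {"token": pseudonymise(record["nhs_number"], key), **rest}


# Made-up records from two sources that refer to the same (fictional) patient.
gp_record = {"nhs_number": "0000000001", "diagnosis": "asthma"}
hospital_record = {"nhs_number": "0000000001", "admission": "2021-03-02"}

gp_shared = strip_identifier(gp_record)
hospital_shared = strip_identifier(hospital_record)

# The same patient gets the same token, so the shared records can still be linked.
linked = gp_shared["token"] == hospital_shared["token"]

# Someone holding both the key and a candidate identifier can recompute the
# token and trace a shared record back to the individual.
reidentified = pseudonymise("0000000001") == gp_shared["token"]

# Without the right key, the recomputed token does not match.
wrong_key_match = pseudonymise("0000000001", key=b"some-other-key") == gp_shared["token"]
```

In this sketch the token lets researchers join records without seeing the identifier, while anyone who controls the key can still work back to the person. That is why control over "every link in the chain" matters.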
Akhtar questioned the future extent of the data-sharing scheme and whether its scope would be widened once the programme’s initial aims had been established.
He believes the proposals suffer from a lack of trust amongst patients, and also suggested winning buy-in from the medical community would be difficult since their feedback on previous attempts to centralise and manage NHS data hadn’t always been heeded.
“If the government had nothing to hide it would follow its own laws which were set out in the revised Data Protection Act following the GDPR regulations,” he stated. “They need to clarify who the external parties will be and what they want the data for.”
While reaffirming his support for the scheme based on its uses for planning, research and treatment, Mark agreed with Akhtar about a current lack of transparency: “At the moment trust is missing, and that is vital in determining whether this is a success or an opportunity missed. In this delay period the government should engage, addressing these valid concerns. Who will have a say in the proposed legislation, assess applications to access the data, put controls in place - and how often will it all be reviewed?”
In conclusion, Akhtar pointed out that even data professionals remain in the dark about the nuts and bolts of the programme: “If most people in data didn’t know this was happening until just before the original opt-out deadline, how could individuals be aware of it and know how to opt out?”
It will be fascinating to follow the debate before the September opt-out deadline, and beyond.
(You can read the full transcript of the head-to-head discussion below.)
MARK: There are two different data sharing initiatives going on at the moment. One is pulling data from different health trusts and GPs so that all healthcare practitioners have access to your medical records wherever you turn up in the system. That is completely different to GPDPR, but the two are happening at the same time, which is a bit confusing.
The two issues do get conflated. There is also an opt-out mechanism built into that. So you can say, I don't want my data shared between all the different hospitals and trusts for whatever reason. But that is limited to keeping the data within the NHS so it's only used for healthcare purposes. That is one of the benefits of having a monolithic national healthcare system.
AKHTAR: The analogy there for me is your gran keeping a photo album. Photos are only for the album in her house, and only she's got access to it. We're now moving on to gran putting pictures on Facebook, but it’s locked to herself. So those embarrassing pictures of you are quite safe.
So we're on a journey like that. Your data has been captured, you’ve seen a doctor, talked about some potentially embarrassing stuff. But you share that on the basis it’s confidential to your doctor.
MARK: I would argue it's not the same as putting it on Facebook, in the public domain. The purpose of sharing is to enable research within the NHS and universities. At the moment there are plans for two specific blocks of research.
One is large-scale planning. It's about managing healthcare capacity and treatments on aggregate numbers. It doesn't really matter who the individual statistic is, it's about the large numbers: how many people are getting liver cancer, prostate cancer, breast cancer; what do childhood diseases look like?
The other way it will be used is for developing new treatments. This is where it starts to get controversial - sharing healthcare data with commercial companies like Pfizer and AstraZeneca, so that they can use the data to develop new treatments. That's when the data goes outside the public sector. [The notion that] companies are making profit out of our data alarms people.
And how do we know this data will be safe, that hackers aren't going to get their hands on it? An important point is that the data will be pseudonymised when it leaves the GP practice. In principle your name, NHS number, address, date of birth (but not year of birth) are removed. So in that sense it’s not like the Facebook analogy.
Just looking at the data, you wouldn't be able to identify it’s Akhtar. But one of the problems with pseudonymised data is if someone is determined, and they have the tools and ability, they can often use various remaining attributes to build a picture - like a fuzzy photo - which is good enough to identify it’s probably Akhtar.
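The "fuzzy photo" risk Mark describes can be illustrated with a short Python sketch. It counts how many records share the same combination of remaining attributes; a combination held by only one record is a candidate for re-identification. The records, the field names and the choice of attributes are invented for this example and are not taken from the GPDPR specification.

```python
from collections import Counter

# Made-up pseudonymised records: direct identifiers are gone, but some
# attributes remain (the choice of fields here is an assumption).
records = [
    {"year_of_birth": 1975, "area": "LS", "sex": "M", "condition": "kidney disease"},
    {"year_of_birth": 1975, "area": "LS", "sex": "M", "condition": "asthma"},
    {"year_of_birth": 1982, "area": "LS", "sex": "F", "condition": "diabetes"},
    {"year_of_birth": 1982, "area": "LS", "sex": "F", "condition": "asthma"},
    {"year_of_birth": 1950, "area": "YO", "sex": "M", "condition": "liver cancer"},
]

# Attributes an outsider might already know about a person.
QUASI_IDENTIFIERS = ("year_of_birth", "area", "sex")


def group_sizes(rows, fields=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def unique_rows(rows, fields=QUASI_IDENTIFIERS):
    """Return rows whose combination of fields is shared with nobody else."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[f] for f in fields)] == 1]


at_risk = unique_rows(records)
smallest_group = min(group_sizes(records).values())
```

Here only the last record has a unique combination, so anyone who knows that person's year of birth, area and sex could attach the condition to them. Coarsening or dropping attributes makes the groups larger, at the cost of less detailed data for research.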
AKHTAR: Let's step back a bit. Many of these things are already in flow. Your GP captures your data - they need to do that, research is ongoing. Historically, you could opt out. There was an interesting case where individuals had opted out, but the process fell over and 150,000 individuals had their data shared even though they requested it wasn’t. There have been something like 1,300 NHS data breaches in the past two years.
NHS research happens with data sharing, for example to come up with treatment for kidney failure. Things are already happening, but we have to step back and realise NHS data will be worth £5bn per annum. That's a big number.
When you start saying £5bn today, what will it be worth tomorrow? I’ve opted out, I’ll put my hands up. My family suffer from kidney disease. My father had a transplant. I understand the need for that. But NHS and other health organisations already have access to our data. So we need to make a big distinction that it’s nothing new. At this moment in time, we can share data and we can opt out. But who else can have access to that £5bn-worth of our data?
The Royal College of GPs, BMA - all of them have challenged this. Mark talked about pseudonymised data. But the government's had about 200 GDPR breaches since the law kicked in - they're the ones that we know about. Then there are hackers, I don’t think any institute is foolproof.
The bigger question is: how far would companies go to get access to that data, and who will they sell it to? This is the reason the Royal College of GPs and the BMA and others have challenged this, because there’s no clarity. Will it be sold to pharmaceuticals, AI companies, or even investors potentially looking to buy into UK hospitals and so on, cherrypicking what to buy based on the data?
The pandemic is a great example of how data was shared. But at the same time, the vast majority of investment in vaccines was funded with public money. Why should we be worried to share our data? Well, that’s a perfect example. We spend billions of pounds of taxpayers’ money to make our vaccine, using our data. But when it’s for the greater good, vaccinating people in poorer countries, it’s a case of these companies saying it’s not their purpose. They are commercial organisations, not philanthropists.
To me, it's about asking what are the implications once we open the floodgates?
MARK: A technical point about pseudonymisation. One of the things you need in order to make sure you can still link data together is a unique key. It will look like random data, but allows people to trace from the beginning of the chain to the huge database. So it's not completely anonymised. You can trace the data back to identify the individual if you've got access to every link in the chain.
If it was fully anonymised, it would be a one-way flow with no way to link it back to the original source records. There's an awful lot of work goes into pseudonymisation, what you can and can't do.
I also want to make a quick point about the risks Akhtar has laid out. Things could go wrong - they’re all valid concerns. What he hasn't said is, this data should not be shared.
I'm keen to emphasise from my point of view that I'm still in favour of sharing healthcare data. But secondly, you need to have trust in order to share data. And at the moment trust is missing.
So the most important thing to do with this delay period is engage, have the debate, address all those concerns. Under what conditions will data be shared with commercial companies? Who will have a say in the proposed legislation, assess applications to access the data, put controls in place?
What I haven't seen is any details about who will be on that body and their terms of reference or decision-making process. We need to put controls in place relevant for today, but also reviewed on a cycle to assess whether they are still relevant and robust at that time.
If you look back at the history of medicine, a lot of what we take for granted today involved sacrifice and some dodgy ethical groundwork - anatomy, grave robbers and so on, so the first doctors could understand how the human body works. Even organ transplant as recently as the 1960s is relevant. Doctors didn't understand organ rejection, so just went ahead and implanted living kidneys. Not only did the patients die, but the deaths that they suffered were actually much worse than natural kidney failure.
And yet, if that experimentation had not happened, then organ transplantation and the anti-rejection drugs which have been developed off the back of it would not be in place. This is something we take for granted now. So there is sacrifice and so, from my point of view, that’s why I'm happy to share my data. I think it's a small price to pay for future benefits of medicine.
Future medical research will be based on genetics. A big pool of data would therefore be valuable from a research point of view, to identify certain genes and how they affect the whole population.
AKHTAR: I watched a programme which was a discussion about the majority of learnings for the basis of medicine coming from Islam. Oxford University has many historical books on medicine, but wasn’t willing to share so much knowledge so [the programme said] they hid the books. So my concern comes back to trust.
Then genetics and DNA. Who is the benefit for in the US when they want to patent my DNA? The big thing we don't talk about here is the ethics of data. It's going to become the blood supply of capitalist organisations trying to get into the NHS on the cheap.
This is a concern for all health organisations and charities - where is that data going to go? Why can't the government come out and say, these are the potential companies we want to give it to. If it's all about my care, why would you want to patent it, why don’t you give it to everyone so they can all come up with the best cure?
At the moment, we're still getting a good deal in comparison to the rest of the world on medication cost. So what are the controls, and who are the benefits for?
MARK: The government absolutely needs to build trust. Unless they do that, this will fail. I think that will be a huge opportunity missed. This has the potential to unlock future healthcare treatments that we will all benefit from. People's valid concerns need to be tackled.
When there is no trust, there's fear and anger. But if concerns are recognised, the conversation is had and the value is explained, it actually becomes a very positive story.
Another general point, a lot of people think GDPR was put in place to stop data sharing. But actually one of its main goals is to encourage data sharing by putting trust in place. So there are limits to what you can do and protections in place so people know what’s possible and how they can complain.
In the pandemic, vaccine scepticism is uneven in the population. If you saw the same attitudes in NHS data sharing and didn’t get representation across age and ethnicity that would be a problem. Getting a big enough sample is important but it needs to be representative.
AKHTAR: We can talk about trust and benefits, but the government tried to do something similar in 2014 with social care data. We have pooled data for the NHS and it’s shared in the UK. A hospital anywhere will have my records. These things already happen - so the big thing is who do we want to share the data with?
Be up front and tell us the purpose. It feels cloak and dagger. Most data professionals didn’t know this was happening till just before the [original opt-out] deadline, so how could individuals be aware of it, and know how to opt out? That illustrates the problem of trust. When billion-pound contracts are handed out people will wonder who’s really benefitting.
If GPs and the BMA are not comfortable then it rings alarm bells for anyone who uses the NHS.
MARK: I am still pro data sharing, but of course all of these concerns are valid. We need to talk about what measures will be taken to secure the data, and be transparent about who it will be shared with. Good management of risk is critical here.