Data: who owns what and when?

And why it is different from chocolate

Mar 29, 2024

I am very fond of chocolate. Cadbury Dairy Milk when growing up in NZ last century. Lindt 90% Dark here and now in Australia in middle age (partly because Whittaker’s is just too addictive to be trifled with). One of the (many) great things about chocolate is that I’m never confused about who owns it. From bean to bar to shelf to cupboard to mouth, money changes hands and so does the chocolate. Confusingly, the same cannot be said about data.

You can have a copy, and you can have a copy, and you can have a copy …

One obvious difference between data and chocolate is the effectively zero cost of duplication for data. I can store a PDF of my latest blood test results, so can my doctor, so can the pathology lab that drew the blood and ran the diagnostics. However, even this simple top-of-mind example starts to crack open the complexity and uncover a few of the reasons that data ownership is becoming more and more fraught and high stakes.

Do my blood test results belong to me? Few would argue that they don’t! If the cost of the blood test was funded by the public health system, does the government of Australia have a claim? If I paid for the tests myself do I have a ‘stronger’ ownership claim? For results ‘mediated’ by my doctor, does that clinic retain a claim?

At the level of a single blood test, a single human and a single clinic these questions can seem petty. Scale it up to the healthcare systems of a country with a significant tax-payer funded public health system and it’s not trivial at all.

Market power and playing chicken

Ownership becomes particularly fraught when two or three or more parties want to exchange data, and use it to build value for themselves, for instance when interacting in a marketplace.

Who owns my viewing history on Netflix? If Netflix really has learned a useful representation of my tastes in TV shows (something that has become more likely now my account is no longer used by three teenagers) I might find it super useful to port that knowledge from Netflix to Binge to Stan. And any leg up in overcoming the cold start challenge of generating useful recommendations for new users could be super helpful to new entrant content providers trying to make a space for themselves in the online entertainment industry.

If you are the supplier of inventory management software and I’m a fintech offering a low-cost transaction account, who owns the data about our mutual clients’ inventory fulfilment patterns? Them? You? All of us? Can you impose restrictions on my use of that data even after it flows into my software?

Who owns the data on the paper receipt you are still often handed when you buy something over the counter? You? The store owner? The supplier of the goods itemised on the receipt? What about the digital version of a receipt in an online transaction?

These are real questions. Because the commercial value lies not so much in the data itself as in the behavioural and economic information certain sets of data contain, they are hard fought, legally murky and often complex, use-case-specific questions.

If you purchase goods online from Amazon, you might have noticed that about three years ago, they stopped providing you with an itemised receipt of purchase via email. It’s still available if you log into your Amazon account but the confirmation email only contains the total purchase amount. Why? I’ve never seen an official explanation but a compelling speculative answer is that they wanted to stop sharing valuable purchasing habits data and SKU level inventory movements with Google and Microsoft, who own and operate the widely used email services, Gmail and Outlook.

Folk often reach for anonymisation and aggregation of data to resolve or at least soften ownership questions (just take a look at the terms and conditions of any online service you’ve signed up to recently). This can help if you are looking at the question of ownership with an individual vs company lens. It does not help at all with conflicts of company vs company, where the patterns of behaviour of large groups of users can be precisely what is valued.

As has probably been true for clauses in many areas since the advent of legal contracts, data ownership clauses may or may not be actually enforceable. But they can act as powerful intimidation tactics, particularly in a situation where the economic might of the two partners to a contract is substantially different. As a colleague of mine once described it, these clauses can come to feel like a strategic game of ‘chicken’, perhaps as bargaining chips within a negotiation. They can also feel like ticking time bombs - ignored until the market power of one party increases to the extent that the more powerful party feels some heat and wants to slow down progress of the ‘upstart’ with protracted, expensive legal battles.

No easy answers, but the biggest message is that these are not questions to be ignored. In my experience, plotting a good middle ground during contract negotiations that involve data exchange requires legal expertise, technical expertise and intimate current knowledge of the company strategy and risk appetite. When negotiating a new contract, gather as small a group of people as you can who between them cover all of those areas, but leave any one of those areas out at your peril.

Don’t be Carta

For a concrete recent example of contested data ownership, let’s look at Carta. Carta, whose core business is selling capitalisation-table management software to the VC industry, was all over the front page of many business and technology publications in January for all the wrong reasons.

If you think about the data they collect as part of their primary purpose - who owns what share of which rapidly growing companies and on what terms - it doesn’t take much of a pause to see how very commercially interesting that data would be to many influential people, and how very commercially sensitive it would be to the primary ‘owners’ of that data.

Using that highly sensitive data to create a side business in industry insight seems to have passed the sniff test. I subscribe to their market update emails as I’m sure do many thousands of others.

A side business in targeted secondary share sales however might easily have sunk the combined business model of the company completely.

Roughly 72 hours after a prominent startup customer complained that Carta was misusing information with which it was entrusted — scaring many of Carta’s tens of thousands of other customers in the process — Carta is exiting the business that landed it in trouble with the customer.
Carta co-founder and CEO Henry Ward posted on Medium tonight that: “Because we have the data, if we are trading secondaries, people will always worry that we are using the data, even if we are not. So we have decided to prioritize trust, and exit the secondary trading business.”
It’s a dramatic turn of events for 14-year-old Carta, which originally focused on cap table management software but began over time to evolve into a “private stock market for companies” to take advantage of the network of companies and investors that already use its platform and into which it has insights. The big idea was to become the transfer agent, brokerage and clearinghouse for all private stock transactions in the world.
While the move made Carta more valuable in the eyes of its venture backers — a company has to scale, after all! — it put the company on dangerous footing after Finnish CEO Karri Saarinen posted on LinkedIn on Friday that Carta was using information about his company’s investor base to try to sell its shares to outside buyers without the company’s knowledge or consent.
Carta exits secondary trading following credibility hit, TechCrunch, Jan 2024

And it’s hard to read that co-founder quote ‘we have decided to prioritise trust’ without wincing. Would be fascinating to know which use of data was being spruiked as the ‘killer growth engine’ in pitches for recent capital raises. The whole TechCrunch article is well worth the read.

So what? Implications across the lifecycle of data and your business

There is a lot more to say on this topic and the devil is most definitely in the detail. Here’s a non-exhaustive bunch of dimensions to ponder further as you craft your commercial and technical data strategy, or revisit the one you have.

Write mindful terms and conditions
Choose your company position on the reuse and repackaging of data carefully (for instance, would you ever consider selling usage data or user-specific data?) and make it clear, in plain language, in your terms and conditions. Changing these is always a pain so if, for instance, you intend to use usage and transaction level data to improve your own products then say so. Can users opt out of this usage? Be careful and be clear.
Write good data contracts
Sometimes, perhaps inevitably if you are a minnow, you’re going to have to sign data exchange contracts that preclude your usage of data to build or improve your own products. Make sure all of the negotiators on your side of the table are aware of the commercial implications of those restrictions. Make sure that quarantining data by customer source (for instance) is technically feasible and auditable, and that you understand what it will cost to implement. (This is particularly tricky if you suck data out of operational systems willy-nilly for use in disparate marketing and analytical systems. Which, let’s face it, you do.)
Consider regional differences
If you’re a small, plucky and rapidly expanding multinational, you’ll likely have different dominant competitors in different geographies. Double check you can implement any onerous data contract usage restrictions in a geographically aware fashion so you impact as small a subset of your customers as possible. Almost invariably, this is going to add complexity to your codebase. Is the contract still shaping up as commercially worthwhile?
Consider regulatory islands
Make sure all parties to the contract negotiation are operating to the same rule set in terms of regulation. Don’t assume that the contract as written will pass muster in all of the jurisdictions you operate in. US and EU laws, for instance, have substantial differences. Again, this can add complexity to your codebase that increases implementation costs.
Consider AI model applicability impacts
Say for instance that you agree to quarantine the transaction data from a large subsegment of clients that you share with another party. As you have given up your right to use that data to improve your own products and services, you can no longer use that data to train any models that are used either in your back office business processes or in your customer facing AI. Annoying for sure, but potentially insidiously impactful if, for instance, the two user groups: ‘users I can train models on’ and ‘users with all usage data off limits because they are also customers of X Corp’ have distinctly different demographics. Now you might be in the position of delivering a degraded customer experience for a subset of customers who you are super keen to impress and retain.
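To make the quarantining, regional and model-training points above a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the `Record` fields, the `RESTRICTED` set of (source, region) pairs and the `x_corp` partner name are hypothetical, and a real implementation would take its restrictions from the contracts themselves and keep an audit trail. It splits records into those you may train on and those you must quarantine, then compares the segment mix of the two groups so that a demographic skew shows up early.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    customer_id: str
    source: str   # partner the data arrived through (hypothetical field)
    region: str   # geography the customer sits in, e.g. "AU" or "EU"
    segment: str  # demographic or business segment label


# Hypothetical contract restrictions: data from these (source, region)
# pairs may not be used to build or improve our own products.
RESTRICTED = {("x_corp", "AU"), ("x_corp", "EU")}


def may_train_on(record, restricted=RESTRICTED):
    """True if no restriction covers this record's source and region."""
    return (record.source, record.region) not in restricted


def split_for_training(records, restricted=RESTRICTED):
    """Return (eligible, quarantined) lists, preserving input order."""
    eligible = [r for r in records if may_train_on(r, restricted)]
    quarantined = [r for r in records if not may_train_on(r, restricted)]
    return eligible, quarantined


def segment_shares(records):
    """Fraction of records in each segment; empty dict for no records."""
    counts = Counter(r.segment for r in records)
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()} if total else {}


records = [
    Record("c1", "direct", "AU", "retail"),
    Record("c2", "x_corp", "AU", "enterprise"),
    Record("c3", "x_corp", "EU", "enterprise"),
    Record("c4", "direct", "EU", "retail"),
    Record("c5", "x_corp", "US", "enterprise"),
]

eligible, quarantined = split_for_training(records)
print("train on:  ", segment_shares(eligible))
print("quarantine:", segment_shares(quarantined))
```

In this toy data the quarantined group is entirely one segment while the trainable group is mostly another, which is exactly the kind of mismatch that can quietly degrade a model for the customers you most want to impress. Keying the restriction on both source and region is one way to keep the affected subset as small as the contract allows.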

Till next time

As always, thank you for reading and I would love to know your thoughts. Either on topics in this article or, if you’re a subscriber (and if not why not?!) topics you’d like me to dive into in upcoming musings.

If you are observing Easter, or just enjoying chocolate and time off, I wish you a peaceful and joy-filled time. I hope you each find at least a moment to pause and bring more happiness into your busy life. I shall as usual be at the farm, trying to keep small trees alive through an extended dry period that will hopefully not develop into a true drought.

Data Runs Deep is now read across 25 countries which I’m chuffed by. A shout out to my two subscribers in Romania!