We lack a shared vocabulary for building AI-powered products
and that's slowing down the build and release of value
If you're helping to build products that involve AI, are you confident you're not talking past some of the many stakeholders in the process?
A few weeks back, I spoke at the DataEngBytes conference about the realities of building AI-powered products from scratch in the data-rich but often context-poor environment we find ourselves in today.
Rather than speak about technical specifics, which change rapidly, I focussed instead on what I've found to be a constant and often unacknowledged challenge, one that hits the very young and very complex field of AI product building particularly hard: the lack of a shared vocabulary between all the practitioners and stakeholders who need to be involved in bringing safe and useful AI products to market.
No, I'm not saying this problem is unique to building products powered by AI.
However, I do think that the complexity of the functioning product, and the ethical complications of products that use patterns learned from data to take actions, both expand the stakeholder group unusually broadly and increase the likelihood of negative consequences when things get buried in misunderstandings and passed over.
It's ironic that machine translation can translate between so many human languages … but it can't do what we really need, which is to bridge differences in vocabulary.
So I recommend getting good at spotting unrecognised knowledge gaps, and they seem to come in two types:
😣 you don't realise you have a knowledge gap
😣 you don't realise those around you have a knowledge gap
As noted above, the breadth of people involved in a successful AI product makes this second problem particularly acute: from machine learning engineers to board members, security engineers to legal teams, statisticians to UX specialists.
Even professionals in adjacent specialities get confused
Data models are not AI models! Don't laugh, this confusion definitely exists when folk speak about 'models' without qualification. Data models are critical to bringing shared understanding and consistency to AI-powered products, yet many folk involved in training ML algorithms don't know anything about star schemas and snowflake schemas.
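To make the distinction concrete, here's a minimal sketch (the table, column, and feature names are mine, purely illustrative): one kind of "model" describes how data is structured and related, the other is a set of parameters learned from data to make predictions.

```python
# Two very different things that both get called a "model".
# All names and values here are hypothetical, purely for illustration.

# 1. A *data* model: a star schema, with a fact table joined to dimension tables.
star_schema_ddl = """
CREATE TABLE dim_customer (customer_key INT PRIMARY KEY, segment TEXT, region TEXT);
CREATE TABLE fact_orders  (order_id INT, customer_key INT REFERENCES dim_customer,
                           order_value NUMERIC, order_date DATE);
"""

# 2. An *AI/ML* model: parameters fitted to data so it can predict on new examples.
from sklearn.linear_model import LogisticRegression

X = [[120.0, 3], [15.0, 1], [300.0, 7], [40.0, 2]]  # e.g. order value, item count
y = [1, 0, 1, 0]                                     # e.g. became a repeat customer?

ml_model = LogisticRegression().fit(X, y)
print(ml_model.predict([[200.0, 5]]))
```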
Bias is baked into the data used for training, and with no inherently 'fair' benchmark to compare your results against, removing bias can be effectively impossible. While this seems obvious as soon as it is stated, for folks who are not involved in the data-gathering loop, bias can often seem like something that is clearly unwanted and should simply have been removed by the development team!
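Here's a toy illustration of how that baking-in happens, using entirely made-up data: if historical decisions were skewed against one group, a model trained on those decisions will tend to reproduce the skew, and nothing in the training process flags it as a problem.

```python
# Entirely made-up data, purely to show bias travelling from history into a model.
from sklearn.linear_model import LogisticRegression

# Each row is [group, score]; y is the historical decision we are learning from.
# Group 1 applicants were approved far less often, even at similar scores.
X = [[0, 0.9], [0, 0.7], [0, 0.6], [1, 0.9], [1, 0.7], [1, 0.6]]
y = [1, 1, 1, 1, 0, 0]

model = LogisticRegression().fit(X, y)

# Same score, different group: the learned approval probabilities differ,
# because the model has faithfully learned the historical pattern.
print(model.predict_proba([[0, 0.8], [1, 0.8]])[:, 1])
```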
No AI model can guarantee 100% accuracy, so your workflows can't expect it. While many UX professionals, particularly those with a background in HCI, are quite excited about the challenge of designing a user experience that embraces the uncertainty in outcome that comes with using AI as part of the workflow, there still aren't that many with hands-on experience. I've also seen product managers really struggle with this peculiarity of an AI-powered product. Again, open discussion up front is key.
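One common pattern, sketched below with an assumed threshold and invented function names, is to design the workflow around the model's confidence rather than around the prediction alone: act automatically only when confidence is high, and fall back to a human otherwise.

```python
# A minimal sketch of a workflow that assumes the model will sometimes be wrong.
# The threshold value and function names are hypothetical, for illustration only.

AUTO_ACTION_THRESHOLD = 0.9  # tune this with the product and UX team, not in isolation

def route_prediction(label: str, confidence: float) -> str:
    """Decide what the product does with a single model prediction."""
    if confidence >= AUTO_ACTION_THRESHOLD:
        return f"auto-apply '{label}' (and show an easy undo to the user)"
    return f"suggest '{label}' for human review (and surface the uncertainty in the UI)"

# Two predictions from the same model, handled very differently downstream.
print(route_prediction("invoice", 0.97))
print(route_prediction("receipt", 0.62))
```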
History is critical to AI development, but not so much to your original software application functioning quickly and correctly. Most of the software engineers working in and around AI-powered products started working before AI became widely distributed, and hence have a lot more experience with 'traditional' application software. The focus there is on simple, low-latency software that works reliably. Capturing and persisting peripheral data about what the users do, and which fields they change when, often isn't important. Very few software engineers have heard of slowly changing dimensions, for instance. So your team needs to have very early, very clear conversations about what data is needed and at what level of specificity and coherence.
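As a concrete example, here's a minimal in-memory sketch in the spirit of a Type 2 slowly changing dimension (the field names and data are invented): instead of overwriting a changed value, you close off the old row and append a new one, so the history that model builders need is never lost.

```python
# A minimal sketch of keeping history instead of overwriting it,
# in the spirit of a Type 2 slowly changing dimension.
# Field names and the in-memory list stand in for a real table; illustrative only.
from datetime import datetime, timezone

customer_history = [
    {"customer_id": 42, "plan": "free",
     "valid_from": datetime(2024, 1, 1, tzinfo=timezone.utc), "valid_to": None},
]

def update_plan(history, customer_id, new_plan):
    """Close off the current row and append a new one, rather than updating in place."""
    now = datetime.now(timezone.utc)
    for row in history:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["valid_to"] = now  # the old value is preserved, not lost
    history.append({"customer_id": customer_id, "plan": new_plan,
                    "valid_from": now, "valid_to": None})

update_plan(customer_history, 42, "pro")
print(customer_history)  # both the old and new plan survive, with their date ranges
```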
ML models are very tightly coupled to the task they were trained to do, which is heavily determined by the labels. This means that apparently adjacent tasks may be out of reach without new labels - which can be a very big deal when the ‘apparently adjacent task’ has much more commercial value than the one you can capture labels and train models to execute.
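Here's a tiny sketch of that coupling, with made-up texts and labels: whatever you ask it, a trained classifier can only answer in the vocabulary of the labels it was trained on, so the adjacent, more valuable question needs a new labelled dataset, not just a different question put to the same model.

```python
# A tiny illustration of how tightly a trained model is tied to its labels.
# The texts and labels are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts  = ["refund my order", "love this product", "parcel never arrived", "great service"]
labels = ["complaint", "praise", "complaint", "praise"]  # the labels we happened to capture

model = make_pipeline(CountVectorizer(), LogisticRegression()).fit(texts, labels)

# The model can only ever answer in the vocabulary of its training labels...
print(model.predict(["where is my parcel"]))  # -> 'complaint' or 'praise', nothing else
print(model.classes_)                         # ['complaint' 'praise']

# ...so an adjacent question ("which product line is this about?", "is this churn risk?")
# needs new labels and a new training run, however similar the input text looks.
```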
While no single misunderstanding will sink your product idea, the friction that builds up over time makes for brittle products and frustrated colleagues. So my CTA for the audience (and for you!) was to get good at detecting misunderstandings and to seek to be understood when speaking. Avoiding jargon where possible is the first step!
Bookmark those explainer articles
My best tip, especially for those who don't particularly like writing themselves, is to bookmark and stockpile all the good basic explainer articles you come across. I think many folk overlook the fact that 'too basic for you' means a potentially great primer for your colleagues!
Borrow with pride!
Increase both your own ability to explain things well (if you really want to check you understand something, try teaching it to someone, right?) and the knowledge base of the colleagues on whom your continued product success may just depend.
Appreciate your insights on peripheral data. Do you believe there's a threshold where accumulating peripheral data becomes indiscriminate data hoarding? How do we balance the benefits of data collection with concerns like information privacy, the rising costs of data processing due to increased volume, and other contemporary challenges?