This is part four of a four-part series on the many dimensions of ‘AI inside’ product development and product management. Given that it’s far from a comprehensive look at a young and rapidly evolving space, we may be back here in the future. You can read this post out of sequence but if you want the full picture, check out the first, second and third instalments.
Most models are surprisingly task specific
It would be really great if you could take the training and label data from a small, routine subtask, train up a great model, get lots of value from it, and then shift it sideways to answer a related but different question. Unfortunately, this is unlikely to work.
While the training data will quite likely be useful across multiple builds, the label data isn’t as transferable. This is fairly easy to understand when you stop and think about it - we’re training the machine to perform one very specific task with high fidelity, and this works best when the labels are very strongly aligned to that specific task (read ‘they were created by humans doing EXACTLY this task’).
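To make that concrete, here’s a minimal sketch of what ‘shifting a model sideways’ looks like. The tickets, both label columns and the library choice (scikit-learn) are all invented for illustration - the point is simply that two superficially similar label sets over identical inputs are not interchangeable:

```python
# Minimal sketch: labels are task-specific, even over identical inputs.
# The tickets and both label columns are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

tickets = [
    "I want my money back, this arrived broken",
    "Still waiting on my refund from last month",
    "The app crashes every time I open settings",
    "How do I change my delivery address?",
    "Charged twice for the same order, please fix",
    "Love the product but the manual is confusing",
]

# Task A: is this a refund/billing ticket? (labels written for task A)
refund_labels = [1, 1, 0, 0, 1, 0]
# Task B: is the customer angry? (labels written for task B - similar at
# first blush, but they only partially line up with task A's labels)
angry_labels = [1, 0, 1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(tickets)
model = LogisticRegression().fit(X, refund_labels)
preds = model.predict(X)

# Likely strong on the task it was trained for, noticeably worse sideways.
print("scored on task A labels:", accuracy_score(refund_labels, preds))
print("scored on task B labels:", accuracy_score(angry_labels, preds))
```

Same inputs, same model - the only thing that changed was which question the human labellers were answering, and that’s enough to sink the sideways reuse.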
As is often the case, the reason this expectation tends to arise is that the humans involved aren’t asking precise enough questions to clearly define workflows, and are actually blurring the lines between two similar but distinct problems - it looks the same at first blush but it’s actually a subtly different task. Another reason to start any AI product build with a robust process of writing down and talking through all the assumptions that need to be true to build something valuable.
We’re beginning to see this in LLMs as well. Even though some hopeful folk were proposing that perhaps GPT (Generative Pre-trained Transformer) was very cleverly named because it was in fact also going to become a General Purpose Tool, there’s growing evidence that you need to tweak LLMs to be ‘good enough to be useful’ at one thing - and when you do, they’re really not so great at other things any more (fine-tuning for one task tends to degrade performance elsewhere, sometimes called ‘catastrophic forgetting’). One to ponder as you embark on your Gen AI product journey.
Products with embedded AI are fragile
And that makes them expensive to maintain. If ever there was a product that is NOT set and forget, it’s one where one or more subcomponents is a model trained on data.
Data inherently drifts. This can be as simple as dates changing as time passes - it sounds trivial but is surprisingly important if your product relies on recognising dates and isn’t familiar with any dates beyond 1/1/2022. Or it can be as multi-factorial as climate change, competitor offerings or economic conditions invalidating your ranges and distributions.
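As a sketch of what catching this early can look like, you can compare the distribution a feature had at training time against a recent window of live data. The values, threshold and alerting logic below are illustrative; the two-sample Kolmogorov-Smirnov test from scipy is just one common choice of drift statistic:

```python
# Minimal drift check sketch: compare a feature's training-time distribution
# against a recent window of production data. Numbers are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Stand-ins for real data: the training snapshot vs. this week's live traffic.
training_values = rng.normal(loc=100.0, scale=15.0, size=5_000)
live_values = rng.normal(loc=112.0, scale=15.0, size=1_000)  # mean has drifted

# Two-sample KS test: a small p-value suggests the distributions differ.
statistic, p_value = ks_2samp(training_values, live_values)
DRIFT_P_THRESHOLD = 0.01  # arbitrary cut-off - tune to your risk tolerance

if p_value < DRIFT_P_THRESHOLD:
    print(f"drift suspected (KS={statistic:.3f}, p={p_value:.2e}) - "
          "investigate and consider retraining")
else:
    print("no drift detected on this feature")
```

Run per feature on a schedule and you have the bones of the drift monitoring discussed below - the hard part is deciding the thresholds and who gets paged.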
You will also need to stay on top of the contracts for any third party data you might be using to augment your model build. Data in general and labels in particular are not cheap to produce and curate. Companies selling data come and go, and the quality of their curation and the completeness of their data set will ebb and flow. Be wary of pricing that seems too low to support the amount of work done to assemble and clean the data - if it seems too cheap, it probably is, and your data provider is likely not yet cashflow positive or hasn’t yet hit on a sustainable business model.
So you’ve trained a useful model? Good start but that’s just the tip of the iceberg
When you’re planning the lifecycle maintenance of any software product, you need to carefully consider the support requirements. When you plan the lifecycle maintenance of an AI inside product, you also need to factor in stringent drift monitoring and a regular retraining cycle.
I’ve seen far too many teams miss the fact that a model, once trained, will need to be retrained. Academically they know this; practically, the funding and enthusiasm run out after what is often a super hard push to launch.
You need an architecture that embraces both training and inference AND you need the training pipelines to be reproducible with an updated data set at a future point. Funnily enough, it’s actually the models that need to be retrained less frequently that suffer the most from this problem. It’s a lot of work to build robust and reusable data pipelines and to keep them up to date and operational as versions change and patches are (or aren’t) made to various parts of the training stack.
If you train every week (uncommon outside the majors), you’ll be incentivised to keep your training stack current. If you train once a quarter or once a half (not unreasonable for slowly shifting or seasonally aware business problems), you will have to be a lot more disciplined to keep the training stack and team knowledge fresh, so that retraining doesn’t become essentially a rebuild.
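Here’s one hedged sketch of the discipline that helps: a single retraining entrypoint where everything that varies between runs (data snapshot, seed) is an explicit parameter, and the environment is recorded next to the artifact. The function names, paths, placeholder data and metadata fields are all invented for illustration:

```python
# Sketch of a reproducible retraining entrypoint. Names are illustrative.
import json
import pickle
import random
import sys
from datetime import date, datetime, timezone
from pathlib import Path

import numpy as np
import sklearn
from sklearn.linear_model import LogisticRegression


def retrain(snapshot_date: date, seed: int = 0) -> Path:
    random.seed(seed)
    np.random.seed(seed)

    # Hypothetical: load a frozen training snapshot for that date, so a
    # rerun six months later sees exactly the same rows and labels, e.g.
    # X, y = load_snapshot(snapshot_date)
    X = np.random.rand(200, 5)           # placeholder data for the sketch
    y = (X[:, 0] > 0.5).astype(int)

    model = LogisticRegression(random_state=seed).fit(X, y)

    out_dir = Path(f"artifacts/{snapshot_date.isoformat()}")
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)

    # Record what this model was trained with - the cheap insurance that
    # makes next quarter's retrain a rerun rather than a rebuild.
    metadata = {
        "snapshot_date": snapshot_date.isoformat(),
        "seed": seed,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "sklearn": sklearn.__version__,
        "numpy": np.__version__,
    }
    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return out_dir


if __name__ == "__main__":
    print("artifacts written to", retrain(date(2022, 1, 1), seed=7))
```

None of this is clever, which is rather the point - it’s the unglamorous bookkeeping that tends to evaporate after the push to launch.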
And we are through
If you missed them, you can catch up on the first three posts in this series below. We may return to other dimensions you need to keep in mind in the future, but that’s all that’s striving to make it out of my brain and onto digital paper at present.