This is part three of a multi-part series on the many dimensions of ‘AI inside’ product development and product management. You can read this one out of sequence, but if you want the full picture, check out the first and second instalments.
The implications of the Gen AI rush to market
In just the past few days, Gartner published their Hype Cycle for Emerging Technologies with Generative AI sitting pretty close to the peak of inflated expectations. That feels about right. It doesn’t of course mean that there isn’t amazing value in using Generative AI, just that it’s going to take hard work while the market buzz has been promising instant magic. “Ideas are easy, execution is everything” is taking hold and will winnow the field.
Beyond the obvious ‘be cautious of the fly-by-nights’ adage, I’m actually cautious about the solidity and production readiness of even the Gen AI services being rolled out by the major players. There has been a whole heap of ‘me too’-ism going on, as Google in particular visibly panicked about being left behind, and there was a lot of vapourware evident as everyone rushed to have a proposition.
So, be curious and politely skeptical about the solidity of what you’re building on top of, whether you want to roll your own or subscribe to a service. Model drift and changes are already evident, the dearth of GPUs has seen ad hoc ‘computing collectives’ spring up, and there is not (yet) any real ecosystem for Gen AI Ops.
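To make that concrete, here’s a minimal sketch of the sort of defensive habit I mean: pin a dated model snapshot rather than a floating alias, and log enough about each call that you can spot drift later. The client object, its `complete()` method and the model name are hypothetical stand-ins, not any particular vendor’s SDK.

```python
# A minimal defensive wrapper around a hosted model. `client.complete()` and
# `response.text` are hypothetical stand-ins for whichever vendor SDK you use.
import hashlib
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("llm_calls")

PINNED_MODEL = "vendor-model-2023-08-01"  # a dated snapshot, not "latest"

def call_model(client, prompt: str) -> str:
    """Call the hosted model and log enough metadata to spot drift later."""
    response = client.complete(model=PINNED_MODEL, prompt=prompt)
    logger.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": PINNED_MODEL,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(response.text.encode()).hexdigest(),
    }))
    return response.text
```

Even this much gives you something to diff when the service behind the API quietly changes underneath you.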
Collect the data for tomorrow’s products today
This one isn’t specific to Gen AI at all: one of the most common challenges I’ve run into myself, and seen other teams struggle with, is having a great product idea without all of the data needed to build the models that would make it a reality.
Perhaps you’re missing labels, perhaps your operational systems overwrite dates or don’t capture ‘last updated’ on fields that haven’t been important to BAU product operation but are critical to capturing user intent. Perhaps you’ve never previously needed to capture the order in which items in a dropdown are presented.
While ‘if it changes, log it’ and ‘if it can occur, instrument it’ might be too extreme, particularly if you use an event capture tool with a volume-based pricing structure, it doesn’t hurt to mentally start there and then back off collection by consciously deciding that a given action or change could not conceivably be useful in the next 6-12 months.
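As an illustration, here’s roughly what that habit looks like in code, picking up the ‘last updated’ and dropdown-order examples above. This is a sketch only; the event names and the `emit()` sink are placeholders for whatever event capture tool you actually use.

```python
# A sketch of 'if it changes, log it' instrumentation. Event names and the
# emit() sink are placeholders -- swap in your own event capture pipeline.
import json
import time

def emit(event: dict) -> None:
    # Stand-in sink: in practice this would go to your event pipeline.
    print(json.dumps(event))

def log_field_change(entity_id: str, field: str, old, new) -> None:
    """Capture 'last updated' metadata even for fields BAU has never needed."""
    emit({
        "event": "field_changed",
        "entity_id": entity_id,
        "field": field,
        "old": old,
        "new": new,
        "ts": time.time(),
    })

def log_dropdown_render(user_id: str, control: str, options: list[str]) -> None:
    """Record the order options were presented in -- cheap now, gold later."""
    emit({
        "event": "dropdown_rendered",
        "user_id": user_id,
        "control": control,
        "options_in_order": options,
        "ts": time.time(),
    })
```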
I’ve seen too many teams have to sit on their hands for six months collecting one missing piece of data before they can build useful products in earnest.
Note that there is an important caveat here. I’m not suggesting that you intentionally collect and store invasive, personally identifiable data that your users would not reasonably expect you to collect or store! User trust is paramount - apply the front page test wisely and make sure you’re not hiding behind T&Cs that are 26 pages long and written in legalese.
Costs to build and run
We’ve become accustomed to thinking that data storage is cheap and compute isn’t too pricey either. AI products changed that, and Generative AI has blown it completely out of the water.
I ask development teams to do a back-of-the-envelope calculation of both training and running costs BEFORE they start to build. Sure, it helps if you’re working on your fourth product, not your first. But some things are not too tricky.
How much data will you need to store for training the model once? How much will that snapshot change over time in order to keep the model fresh? How often are you likely to need to retrain? How much total data will that be after 12 months?
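Here’s a toy version of that storage arithmetic. Every figure below is a made-up placeholder; the point is the shape of the calculation, not the numbers.

```python
# Back-of-the-envelope training data storage. All figures are illustrative.
initial_snapshot_gb = 500        # data needed to train the model once
monthly_delta_gb = 40            # extra data retained each month to stay fresh
retrains_per_year = 4            # snapshots kept for reproducibility
storage_price_gb_month = 0.023   # assumed object-storage list price, $/GB-month

total_after_12_months_gb = initial_snapshot_gb + monthly_delta_gb * 12
snapshot_copies_gb = initial_snapshot_gb * retrains_per_year

annual_cost = (total_after_12_months_gb + snapshot_copies_gb) * storage_price_gb_month * 12
print(f"~{total_after_12_months_gb + snapshot_copies_gb:,.0f} GB held, "
      f"~${annual_cost:,.0f}/year in storage")
```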
What’s the ballpark single training cost? How many times will you need to process the complete data set to train once? If you need to use GPUs, what’s a realistic utilisation figure to shoot for? Will you buy spot or reserved instances? Or are you going to rack your own hardware and account for depreciation? (likely not an option if you need the latest chips BTW)
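Again, a toy calculation. Treat every constant as a question to answer for your own product, not a benchmark; the prices are assumptions, not quotes.

```python
# Ballpark cost of a single training run. All figures are illustrative.
dataset_passes = 3               # how many times you process the complete set
gpu_hours_per_pass = 800         # estimated for your model and data volume
utilisation = 0.6                # realistic, not the 100% on the spec sheet
spot_price = 1.20                # assumed $/GPU-hour, spot
reserved_price = 2.50            # assumed $/GPU-hour, reserved

effective_gpu_hours = dataset_passes * gpu_hours_per_pass / utilisation
print(f"spot:     ~${effective_gpu_hours * spot_price:,.0f} per run")
print(f"reserved: ~${effective_gpu_hours * reserved_price:,.0f} per run")
```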
If you’re thinking of fine-tuning an LLM on a local corpus or sending context through as prompts in every call, what will it cost to send that many tokens through your chosen vendor? Can you self-host?
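Token charges compound quickly with call volume, as a rough sketch shows. The per-token prices below are placeholders; check your vendor’s current rate card.

```python
# What does sending context as prompt tokens actually cost? All assumptions.
calls_per_day = 50_000
context_tokens_per_call = 3_000   # retrieved context + instructions
output_tokens_per_call = 300
price_per_1k_input = 0.003        # assumed $/1K input tokens
price_per_1k_output = 0.004       # assumed $/1K output tokens

daily = calls_per_day * (
    context_tokens_per_call / 1000 * price_per_1k_input
    + output_tokens_per_call / 1000 * price_per_1k_output
)
print(f"~${daily:,.0f}/day, ~${daily * 365:,.0f}/year in token charges")
```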
At inference time, can you run serverless? Will you need to have dedicated GPUs in multiple geographies? If you have bursty loads, what latency can you tolerate under peak demand? How expensive will it be to achieve that?
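The peak-versus-mean question deserves a sketch of its own, because provisioning dedicated GPUs for peak demand around the clock can dwarf what average load would suggest. Again, every figure here is assumed.

```python
# Sizing dedicated inference capacity under bursty load. Figures are illustrative.
import math

peak_rps = 120                   # requests/second at peak
mean_rps = 15                    # requests/second on average
rps_per_gpu = 4                  # measured throughput within your latency budget
gpu_price_hour = 2.00            # assumed $/GPU-hour

gpus_for_peak = math.ceil(peak_rps / rps_per_gpu)
gpus_for_mean = math.ceil(mean_rps / rps_per_gpu)

print(f"provisioned for peak 24/7: ~${gpus_for_peak * gpu_price_hour * 24 * 365:,.0f}/year")
print(f"provisioned for mean only: ~${gpus_for_mean * gpu_price_hour * 24 * 365:,.0f}/year")
```

The gap between those two lines is what you’re paying for burstiness, and what tolerating a little latency under peak demand can claw back.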
What support model will be required? How many teams need to be included in second and third tier support? Will this product add new skillsets to the support team already in place?
It worries me when I see folks assume that using AI will take cost out of providing their services. That might be true - it likely is at scale, if you’re removing toil from humans through effective augmentation - but it shouldn’t be an untested assumption. The build and run costs are significant, and maintenance and support costs are high. Don’t make the mistake of reducing customer service costs only to add software engineering, compute and storage costs that more than cancel out the savings.
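One way to keep yourself honest is to write the claim down as arithmetic before you build. Every figure below is an assumption to be tested, not a prediction.

```python
# Making 'AI will cut our service costs' testable. All figures are assumptions.
tickets_deflected_per_month = 8_000
cost_per_human_ticket = 5.50            # fully loaded support cost, assumed

monthly_savings = tickets_deflected_per_month * cost_per_human_ticket
added_engineering = 30_000              # build + maintenance, amortised monthly
added_compute_and_storage = 18_000      # inference, storage, retraining, monthly

net = monthly_savings - (added_engineering + added_compute_and_storage)
print(f"net monthly impact: ${net:,.0f}")  # negative means the assumption failed
```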
For next time
In the final instalment next week I’ll look at:
How most models are surprisingly task specific
Why products with embedded AI are fragile (hint: these two are related)
The perils of not understanding that training is only the first step
If you missed it, you can catch up on the first two posts in this series below.