Investing in Gen AI startups
What's different and where to focus your due diligence before investment and your support afterwards
Last week I had the pleasure of attending and speaking at Inspire24, an in-person event for 250 engaged and enthusiastic founders, VCs and angel investors from across Australia and New Zealand. A challenge for all of these folks right now is the enormous pace of change in the Generative AI space. Startup markets are crowded, new models pop into existence seemingly every two weeks or so, and no one person could possibly keep up to date with the research. Where are startups going to have a chance against the established, deep-pocketed players? What sort of business model will actually turn a profit, even at scale? We chatted about all of this and more. Below are some of the points that seemed to resonate with the broadest audience.
Does every startup today need to be a Gen AI startup?
Sound like an odd question? Long-time readers will be familiar with my opinion that most companies don’t need an AI strategy. What they need is a business strategy that is informed by AI.
(Heavily informed, and indeed potentially underpinned, by AI in some markets and industries for sure, but I’ve just not seen great results from companies that have tried to tack an ‘AI strategy’ on the side of an already crowded business strategy.)
Similarly, as my host Greg Miller suggested, today every startup should take a close look at the emerging productivity-focussed Gen AI tools and make a conscious decision to use them or not. And they should be able to explain that decision to potential investors.
When you’re small and running on a tight budget, it’s pretty compelling to be able to lean on Gen AI for support with everything from research to drafting a sales script or generating marketing copy. Jumping into too many paid subscriptions might not be the frugal choice for a cash constrained startup, but for the foreseeable future, if you’re prepared to do a bit of work yourself, there’s a free option for most things.
As always, it will be important to take a very careful approach to sharing data with these services: yours and that of your customers, prospects and users. Given the rapid pace of change, and the fact that ‘permission resetting’ seems to be a regular occurrence with new model releases, I suggest that founders set a reminder to check at least monthly what’s being shared with whom and for what purpose. And given both hallucinations and the beta nature of all these services, keep it all on a tight rein, with human eyes across all generated content, particularly the output of free services.
For investors, it could be helpful to think in terms of a Gen AI depth hierarchy: ‘users of’ Gen AI tools, ‘builders on’ existing foundation models, and ‘builders of’ new foundation models.
Today almost all startups should be ‘users of’. I’d argue almost none (and none that you’re likely to run across looking for funds in Asia-Pacific) should be ‘builders of’ (see section below).
The rest of this blog focusses on dimensions to think about for those who are or plan to be ‘builders on’.
(Aside 1: You would think that generating the simple diagram above would be child’s play for Gen AI. But nope, getting all that pesky text right is still beyond the capabilities of all the ones I have access to.)
(Aside 2: There is also the category of ‘builders for’, if you think about services like Unsloth and Together.AI. A super interesting category, but beyond the scope of this blog today.)
A new foundation model? That seems unlikely
Two years from now, will we have consolidated to a handful of foundation models or splintered into a varied zoo?
The technical jury is still out and I’ve heard reasonably convincing arguments both ways. My bet right now would be towards a few, but certainly not one. Not even one per mega vendor, because in two years’ time foundation model calls will likely still be expensive at scale. Not all tasks will require the same level of machine expertise and it will be sensible (economically and from a climate impact PoV) to pick the simplest and cheapest model that will work for your specific need.
But there won’t be many companies making money from selling access to their proprietary model, and to differentiate in this space you will need access to a lot of computing power. It is also increasingly likely you will need to sign some pretty hefty licensing agreements with content providers.
It would also be a big drag on resources going forward to keep a new foundation model even vaguely current. In almost all cases, the smart solution is going to be the ‘builder on’ path: build on top of a big, reasonably stable model (open source or vendor supplied, third party hosted or in-house) and pay a lot of attention to staying model agnostic enough to switch (which is easy to say and quite hard to do).
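To make that concrete, here’s a minimal sketch of one way to stay switchable: hide every model behind a single narrow interface that the rest of the product codes against, and let a routing rule pick the cheapest model that works for each task. The provider classes and routing rule below are hypothetical placeholders, not any particular vendor’s SDK.

```python
# A sketch of staying 'model agnostic enough to switch'. The provider
# classes and routing rule are hypothetical placeholders, not real SDKs.
from abc import ABC, abstractmethod


class LLMClient(ABC):
    """The narrow interface the rest of the product codes against."""

    @abstractmethod
    def complete(self, prompt: str, max_tokens: int = 256) -> str:
        ...


class HostedLargeModel(LLMClient):
    """Wraps an expensive vendor-hosted model (placeholder)."""

    def complete(self, prompt: str, max_tokens: int = 256) -> str:
        raise NotImplementedError("plug in your vendor's SDK here")


class LocalSmallModel(LLMClient):
    """Wraps a cheap open source model run in-house (placeholder)."""

    def complete(self, prompt: str, max_tokens: int = 256) -> str:
        raise NotImplementedError("plug in your local inference stack here")


def client_for(task: str) -> LLMClient:
    # Route simple, high-volume tasks to the cheapest model that works;
    # reserve the expensive one for tasks that genuinely need it.
    if task in {"classify", "extract", "route"}:
        return LocalSmallModel()
    return HostedLargeModel()
```

If every feature goes through that one interface, swapping vendors (or dropping in an in-house model) becomes a contained change rather than a rewrite.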
So for AU/NZ based VCs, if a startup appears to be pitching that they are going to train a new foundation model from scratch, i.e. they plan to be a ‘builder of’, I would get super curious. You may simply be mishearing them, so if nothing else, this is a great opportunity to build a shared vocabulary.
Even doing a significant amount of supervised fine tuning (SFT) on top of a standard foundation model is a reasonable amount of work for a small team (all that labelled data!), and for more subjective tasks, the application of reinforcement learning from human feedback (RLHF) requires, well, humans. Human experts in a particular domain can get expensive fast at AI scale.
To be clear, both SFT and RLHF fall into my category of ‘builders on’ and are certainly feasible for AU/NZ based and funded startups.
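For a feel of where the labelled-data cost comes from, here’s a deliberately minimal SFT sketch. It uses gpt2 purely as a stand-in base model; a real run needs padding, masking the prompt out of the loss, batching, eval splits and far more data than this.

```python
# A deliberately minimal SFT sketch. gpt2 is purely a stand-in base
# model; a production run needs much more care and much more data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The expensive part: a human wrote every one of these target outputs.
labelled_pairs = [
    ("Summarise: The quarterly report shows...", " Revenue grew modestly."),
    # ... thousands more (prompt, ideal response) pairs ...
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for prompt, target in labelled_pairs:
    enc = tokenizer(prompt + target, return_tensors="pt", truncation=True)
    # labels == input_ids gives the standard causal language model loss.
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The loop itself is a few lines; the ‘thousands more pairs’ comment is where the small team’s time and money actually go.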
What’s the moat?
It is hard to see model capability alone being a long-term defensible moat, even for ‘builders of’. It is right now, given the cost of compute, data and talent. But erosion will come.
Talent is mobile and improvements to model and training architectures are being copied pretty quickly, reducing the depth of the talent moat over time. There is currently a lot of effort focussed on optimisation (algorithms, software and hardware), which will reduce the compute moat. In a funny way, access to data might be the most enduring constituent of the model capability moat: because you need a lot of it, because it costs a lot to sort through that data and only use the best bits, and because access to the data may start to cost a lot of money depending on where copyright law settles.
In any event, even now, a model capability moat is only available to a tiny number of very well funded companies (the ‘builders of’ discussed above).
Owning the workflow and owning the customers remain the most obvious enduring moats.
With that lens, it was interesting to see the recent Microsoft announcement about PCs with on-device AI compute capabilities. Smart for reducing latency, smart for privacy. Super smart if they can finally reverse the trend of ‘everything in the browser’ and rebuild device lock-in. And device moats are possible in other spaces, particularly those with specialised hardware e.g. mining, energy, healthcare and agriculture.
Owning the workflow can also lead to another kind of data moat when a company has access to a lot of data on human actions and behaviour. In the best case scenario, this high-quality ‘label’ data can be used to bootstrap a custom model which gets good enough at a few specific tasks to enable heavy automation of a traditionally boring but time consuming human task. Dramatically drop the cost of the ‘outcome as a service’ because you can get more done with fewer humans. More data flows in as you get more customers. Repeat. This has been a strong play for the past decade of ‘traditional AI’ and in some niche task markets it should continue to be so.
Finally, a new moat may emerge: the ‘not created by AI’ moat. Substack is a prime example.
What makes the idea hallucination proof?
Gen AI has wowed us all with its literacy and apparent broad adaptability. But what we see now is some very impressive engineering sitting on top of 10, 20 and 40 years of research (depending on the base technique you’re thinking about). So it is amazing, but it isn’t fast and amazing.
I see a lot of commentary about future trajectories where the commentator is clearly thinking ‘well if you can do this in two years and I extrapolate that pace of change forward … the change in the next two years will be beyond our wildest dreams!!’. But that isn’t based on an accurate assessment of the past two years and savvy investors need to factor in the strong possibility of a capability plateau.
To dig into a particular example, I can’t bring to mind anything that stacks up as a solution for hallucinations. Retrieval augmented generation (RAG) helps but it hasn’t made the problem go away. That’s a sobering lack of progress in two years on a nasty little ‘feature’ that significantly restricts the viable use cases for Gen AI.
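For readers newer to the acronym, this is roughly the RAG pattern: retrieve the most relevant chunks of your own data and staple them to the prompt so the model answers from them. The embed() and llm() functions below are toy stand-ins, not real services.

```python
# A toy sketch of the RAG pattern. embed() and llm() are stand-ins;
# real systems use an embedding model, an ANN index and a foundation
# model call here.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: deterministic random vector per text.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def llm(prompt: str) -> str:
    return "[foundation model answer would go here]"  # stand-in

documents = ["chunk one of your knowledge base", "chunk two", "chunk three"]
doc_vectors = np.stack([embed(d) for d in documents])

def answer(question: str, top_k: int = 2) -> str:
    q = embed(question)
    # Cosine similarity of the question against every stored chunk.
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    )
    context = "\n".join(documents[i] for i in np.argsort(scores)[-top_k:])
    # Grounding narrows hallucination but does not eliminate it: the
    # model can still assert things the retrieved context never said.
    return llm(f"Answer using only this context:\n{context}\n\nQ: {question}")
```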
So I'd be super cautious about investing in a startup with a business model that depends on the emergence of a fix. The use case needs to work even on the assumption that hallucinations will persist at their current level. And that means workflows that accommodate the regular appearance of nonsense presented as fact. Absolutely possible for some things, dead in the water for others.
Building on shaky foundations is a given. How will they minimise the impact?
For aspiring ‘builders on’, it’s easy to pull together an impressive demo of ‘service X’ using a commercially available LLM. It is much harder to transform that demo into a useful product running reliably at scale with real humans as users. That general pattern has been true for a long time. Gen AI amplifies the divide by making the demo easier and more impressive and making the useful product harder to achieve.
To build a complete product, you will have to wrap a lot of ‘traditional’ software around the LLM calls. Your development team is new to this type of tech and the tech itself is new. ALL of the frameworks, tools, libraries and development patterns to support this development are new and rapidly changing. Whatever weighting you would usually place on the presence of a strong technical co-founder, dial that up significantly for any business with a heavy dependence on ‘building on’ Gen AI. Moving at speed while navigating the inevitable refactoring needs a realistic view of the real work, all the way from the top.
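As a flavour of that ‘traditional’ wrapping, here’s a hedged sketch of what one production-minded model call can turn into: validation, retries, backoff and a human fallback. The invoice extraction task and call_model() are hypothetical examples, not any real product’s code.

```python
# A sketch of the 'traditional' software that wraps one model call in a
# real product: validation, retries, backoff and a human fallback. The
# invoice task and call_model() are hypothetical.
import json
import time

def call_model(prompt: str) -> str:
    # Stand-in for whatever LLM client the product actually uses.
    return '{"total": 1234.50, "due_date": "2024-07-01"}'

def extract_invoice_fields(text: str, retries: int = 3) -> dict:
    prompt = f"Return JSON with keys 'total' and 'due_date':\n{text}"
    for attempt in range(retries):
        try:
            data = json.loads(call_model(prompt))  # models often emit bad JSON
            if {"total", "due_date"} <= data.keys():
                return data  # passed the (minimal) schema check
        except (json.JSONDecodeError, TimeoutError):
            pass
        time.sleep(2 ** attempt)  # back off before retrying
    # Never ship garbage: route failures to a human review queue instead.
    return {"status": "needs_human_review", "raw_text": text}
```

Multiply that by every feature in the product and you start to see why the demo-to-product gap is so wide.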
Are they thinking hard about evaluation?
If not, they should be, so I would ask a lot of questions about that. Moving quickly is always important to startups. But you need to move quickly with intent and know that you are making progress.
Evaluating the ‘quality’ of an AI service in your product offering has always been a tricky proposition because the dimensions of what quality means can be hard to pin down and are often in tension with one another (improvement on one dimension must be traded off against deterioration in another). This has become trickier still with generative AI, something that is easy to appreciate when you think about the subjective nature of assessing the quality of language or images.
A savvy team will always be dissatisfied with their evaluation metrics. But they will have them and they’ll be prepared to invest time in creating and maintaining the ‘ground truth’ data + label sets required for repeated and repeatable evaluations of their currently agreed upon quality standards.
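As an illustration of what that investment might look like at its simplest, here’s a sketch of a repeatable evaluation harness. generate(), judge() and the ground-truth file are hypothetical stand-ins; real teams layer on much richer, task-specific scoring.

```python
# A sketch of a repeatable evaluation harness. generate(), judge() and
# ground_truth.jsonl are hypothetical stand-ins.
import json

def generate(prompt: str) -> str:
    return "[current pipeline's output]"  # stand-in for the product pipeline

def judge(output: str, reference: str) -> bool:
    # Toy scorer: exact match. Real teams use task-specific metrics,
    # rubric graders or human panels for subjective quality dimensions.
    return output.strip() == reference.strip()

def run_eval(path: str = "ground_truth.jsonl") -> float:
    results = []
    with open(path) as f:
        for line in f:
            case = json.loads(line)  # {"prompt": ..., "expected": ...}
            results.append(judge(generate(case["prompt"]), case["expected"]))
    score = sum(results) / len(results)
    print(f"{score:.1%} of {len(results)} cases passed")
    return score  # rerun on every model or prompt change; track over time
```

The harness itself is trivial; the value is in the curated ground-truth set behind it and the discipline of rerunning it on every change.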
Where’s the talent?
A lot of people have remarked to me that there are suddenly a lot of ‘Gen AI experts’ sloshing around. Figuring out who the real practitioners are is tough but key to success.
For startups and their investors, talent considerations fall into two buckets that I would consider separately.
Does the founding team have expertise in ‘building on’ generative AI services? Only PoCs or full production systems? Do they have past experience as ‘builders of’ early iterations of related techniques like deep learning? Natural language processing? Vector embeddings? Similarity search? Production grade systems that were retrained and redeployed? Or only models built and executed from notebooks?
If the collective answers to the above come out as a thumbs up, that’s an enormous help with your second bucket: finding, growing and retaining folks with the experimental design, algorithmic and engineering skill to be ‘builders on’ an appropriate foundation model. While it is always true that talent attracts talent, it’s very strongly true in generative AI capable teams right now.
Till next time
Winter is upon us in Australia and hurray, the farm got 37mm of rain over the past week. That means planting season is here and I shall be busy finding homes for roughly 200 happy little seedlings: red box, yellow box and the rather dashing silver princess. Wherever your week takes you, I hope you have time to walk under some trees and enjoy the show.
Photo by Greg Rakozy on Unsplash