What does Gen AI do to the role of CTO?
The role of the CTO has changed a lot in the last 20 years. It’s about to change again.
I don’t subscribe to the idea that Gen AI will transform the world of work beyond recognition in the next two years and I don’t think the robot overlords are coming any time soon.
However, I have been pondering the changes that more extensive use of Generative AI will bring to organisations, and to the role of the chief technology officer in particular. This ponder was partly sparked by a conversation with a recruiter who suggested my profile was ‘too AI for a CTO role’.
This first made me laugh (hopefully not out loud) and then made me wonder whether she had that the wrong way round. I have to understand how to build software in order to build AI-powered software. The reverse is definitely not yet true: I know many CTOs who, in 2024, have only a fledgling understanding of how to build software with a generative AI component.
So what does the road ahead look like? What areas should CTOs with a more traditional application software background lean into to learn more?

Data and data flows
Yes, data has been around forever. But there is a world of difference between the CRUD (create, read, update, delete) mindset that many engineers grew up with and the ‘flows, change and multiple uses’ mindset that building products on top of data requires.
To speak in broad generalities, good data scientists and data engineers LOOK at the data. In my slightly horrified but oft-repeated experience, a lot of software engineers from more traditional application backgrounds just don’t. Extend that to a CTO, spinning multiple plates and sliding ever further away from a terminal and an IDE … I’m guessing that the last time you looked at actual data, all the date stamps might still have read 201X.
If you want to bring yourself back to an appropriate and meaningful coal face very quickly, set aside 30 minutes a week over the next month to actually look at the data that your organisation collects.
Can you get access? Is there an intra-day, up-to-date shadow copy, or do you have to touch production data? Which fields are masked? How many different toolsets did your team suggest you use? If you asked five people from five departments or teams, did you get any consensus about where to look and what you might expect to see? Where do you store your ‘interaction’ data, i.e. the data about users using your software? Can you bridge between the interaction data and the application data?
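A few lines of pandas are enough to start answering the freshness and completeness questions above. This is a minimal sketch, not a data-quality framework; the table and column names are invented, so substitute your own.

```python
import pandas as pd

def profile_table(df: pd.DataFrame, timestamp_col: str) -> dict:
    """Quick freshness and completeness profile for one table."""
    return {
        "rows": len(df),
        "newest_record": df[timestamp_col].max(),  # is this today, or 201X?
        "null_rate_per_column": df.isna().mean().round(3).to_dict(),
    }

# Hypothetical sample standing in for an export of a real 'orders' table
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "created_at": pd.to_datetime(
        ["2024-05-01", "2024-05-02", "2024-05-02", "2024-05-03"]),
    "customer_email": ["a@x.com", None, "c@x.com", None],  # masked? missing?
})

print(profile_table(orders, "created_at"))
```

Thirty minutes running something like this against each of your key tables will tell you more about your Gen AI readiness than most slide decks.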
Sure, this isn’t your job anymore. But that’s my point. You are likely disconnected from a newly critical technical area that will make or break your efforts to integrate Gen AI.
If you have consultants in who are telling you that your internal data doesn’t matter any more and data quality is a thing of the past, fire those consultants. You can build a PoC without paying attention to the coherence and freshness of your data. But that’s it. Data and AI products expose edge cases and woolly logic like nothing you have ever built before. Ground yourself in the reality of your data today before you spend money chasing a 2% uplift that needs an 80% rebuild.
Probabilistic workflows
CTOs who grew up building application software are used to being able to write (pretty much) provably ‘right’ software. And so are 95% of today’s software engineers. Machine learning in general, and generative AI in particular, messes with your head if that’s your paradigm.
It’s heartbreaking to see great software engineers run a handful of queries through <LLM of choice>, tweaking prompts until they see an example or two that they like. And then call that done. So. Not. Done.
This under-discussed paradigm shift is blowing out development timelines and contributing to the rash of very undercooked products we are seeing. Why do Gen AI products so often feel so very beta? Development that includes probabilistic components needs new patterns!
What can you do to push yourself and those around you to think differently? One trick I’ve found helpful (and it continues to work even when you know it’s a deliberate tactic!) is to flip around the metrics that relate to accuracy. Instead of saying ‘this component gives good results 83% of the time’, train yourself to say ‘this component gets things WRONG 17% of the time’.
It is of course the same thing, but thanks to the loss aversion hard-wired into our primate brains, it feels bracingly different to talk about a workflow breaking 17% of the time. Not because it hasn’t been coded correctly or tested thoroughly enough, but as an inherent limitation of the technology you are choosing to use.
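The arithmetic behind the reframing gets even more bracing once probabilistic components are chained. A small sketch, using the illustrative 83% figure from above and assuming (a simplification) that each component fails independently:

```python
def error_rate(accuracy: float) -> float:
    """Reframe an accuracy figure as a failure rate."""
    return 1.0 - accuracy

def chained_success(accuracy: float, n_components: int) -> float:
    """If n probabilistic components must all succeed, and failures are
    independent, end-to-end success probability is accuracy ** n."""
    return accuracy ** n_components

acc = 0.83
print(f"Wrong {error_rate(acc):.0%} of the time")               # 17%
print(f"3 chained components: {chained_success(acc, 3):.0%}")   # 57%
```

A workflow built from three ‘83% good’ steps breaks more than four times in ten. That is the conversation the flipped framing forces you to have.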
This is also going to help your teams take a hard look at when the use of a Generative AI component is truly useful and when it isn’t.
Evaluation
As a CTO in 2024, you need to learn more, much more, about evaluation. If you are building Generative AI into your products or services in 2024, you should be putting as much development effort into evaluation as you are into everything else combined. Evaluating how ‘well’ a Generative AI based service is meeting your end users’ needs is hard. It’s an active and heavily discussed area of research, and nobody with serious intent believes they have it fully sorted out.
There are two broad reasons why I think you need to concentrate so much attention and effort into doing evaluation well. Let’s look at them sequentially.
First is to allow change. Machine learning systems of all kinds are inherently very, very tightly coupled, from the data used for training to every step of the processing chain (tokenisation, vector representation, prompts, fine-tuning parameters, etc.). Even with traditional ML systems, I’ve seen far too many teams walk themselves into a situation where they cannot make any further changes because they have no visibility over the cascading impacts of those changes. If you don’t have robust, repeatable evaluation steps, run over statistically meaningful numbers of examples to produce agreed-upon metrics, you will rapidly discover that you are making changes blindly and, after the first big snafu, that your expensive new system is effectively frozen in time.
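In practice, ‘robust, repeatable steps’ means a frozen evaluation set and a scripted metric you run before and after every change. A minimal sketch; the metric, the examples and the stub system here are all placeholders for your own, and real systems need richer, task-specific scoring.

```python
def exact_match(predicted: str, expected: str) -> float:
    """Placeholder metric: 1.0 if normalised strings match, else 0.0."""
    return float(predicted.strip().lower() == expected.strip().lower())

def evaluate(system, eval_set) -> float:
    """Run a system over a frozen eval set and return the mean score."""
    scores = [exact_match(system(ex["input"]), ex["expected"])
              for ex in eval_set]
    return sum(scores) / len(scores)

# Frozen golden examples (hypothetical; yours should number in the hundreds+)
EVAL_SET = [
    {"input": "capital of France?", "expected": "Paris"},
    {"input": "2 + 2?",             "expected": "4"},
    {"input": "largest planet?",    "expected": "Jupiter"},
]

# Stub standing in for your Gen AI pipeline
def candidate_system(prompt: str) -> str:
    return {"capital of France?": "Paris", "2 + 2?": "4"}.get(prompt, "unsure")

baseline = evaluate(candidate_system, EVAL_SET)
print(f"score: {baseline:.2f}")  # gate every change on this number
```

The point is not the toy metric; it is that any change to a prompt, model version or retrieval step gets a before-and-after number instead of a shrug.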
Second is because a ‘right’ answer is easy to talk about in broad brush strokes but hard to pin down in a measurable way. The promise of generative AI is to give us answers to very fuzzy questions. (If you want a better gut feel for what I mean by this, try using <LLM of choice> to generate useful SQL from natural language against any meaningfully sized data schema. You will be reminded very quickly what an imprecise and ambiguous language English is!) Because of this, though, we’re pushing right up against the boundaries of being able to cleanly articulate what a good answer is. Requiring a robust evaluation process, and making it a core deliverable of your development team, NOT a side issue pushed off to ‘testing’ or ‘quality control’, will ensure everyone involved confronts the messy, imprecise nature of ‘right’. That leads to better workflow and user experience design, and fundamentally shapes your product development!
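For the SQL example specifically, one relatively objective check is execution match: run the generated query and a hand-written reference query against the same database and compare result sets. A sketch using sqlite3; the schema and queries are invented for illustration.

```python
import sqlite3

def results_match(db: sqlite3.Connection,
                  generated_sql: str, reference_sql: str) -> bool:
    """Execution-match evaluation: two queries count as equivalent if they
    return the same rows, ignoring row order."""
    got = sorted(db.execute(generated_sql).fetchall())
    want = sorted(db.execute(reference_sql).fetchall())
    return got == want

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, 10.0, "EU"), (2, 20.0, "US"), (3, 5.0, "EU")])

# An LLM's answer to 'total order value per region' vs a reference query
generated = "SELECT region, SUM(amount) FROM orders GROUP BY region"
reference = ("SELECT region, SUM(amount) AS total FROM orders "
             "GROUP BY region ORDER BY region")

print(results_match(db, generated, reference))  # True
```

Even this ‘objective’ check illustrates the messiness: matching on one database proves nothing about all databases, and two queries can disagree only on edge-case data you didn’t think to include.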
‘Evaluation by vibe’ was a bit of a meme at the beginning of 2024. Don’t be that team.
And the rest
While I’ve illustrated the above dimensions with examples that imply you and your team are ‘builders on’ generative AI foundation models (see this previous post), digging in and getting a better understanding of data & data flow, probabilistic workflows and evaluation will be helpful even for CTOs who only intend to be ‘users of’ Gen AI. A better understanding of how the sausage is made and all the potential weak points will help you have better and more informative conversations with your vendors and build partners.
This is such an exciting time to be a technical person in tech whether your fingers are on or off the keyboard most days. I hope the above is useful in directing your self study as you gear up (again!) to learn the new tools and unlock the joy of building in a ‘peak churn’ year.