The development of AI development in 2024

Just 14 months ago, one of the most powerful tools ever created by humans was made available to everyone with an internet connection. It was pretty cool for a week or two, but very quickly became a part of everyday life - at least that was the case for me. I’ve almost forgotten the multi-hour sessions frantically clicking through 34 different tabs trying to get some piece of code to work. So with that in mind, I thought it would be a good exercise to reflect on the past year, and put forward how I’ll be tackling this area for the year to come. My focus is on applying the available AI technology to real-life use cases.

Is AI everywhere?

Artificial Intelligence, more specifically generative AI and Large Language Models, is just another part of everyday life for some. I say some, because you may be surprised how many people and businesses still don’t know what ChatGPT is, or how it’s any different to Google. Working in software can put you in a bubble sometimes, and I think it’s important to remember some businesses are still getting their heads around Web 2.0!

Having recently spent time with a few such businesses, there’s no shortage of problems to solve (will there ever be?) and I like to think of the work I do as bringing these two worlds together. Find a problem or way of working that’s wildly inefficient, then adapt the best technology on offer to that specific case. To some extent, the delta between the two represents the value you’re able to provide.

Take Loom’s recent AI for example. A handy addition to an already good product, but they’re only charging $4/m. The reason I pay $20/m for Loom is that it saves me 30 minutes every time I want to share a video (recording, editing, uploading, sharing) plus the 30 minutes it would’ve taken to write up a detailed guide. Auto-generating the video title and drafting an email saves me 30 seconds. ¹

On the other hand, industries such as transport, health care, and finance are full of time-sucking tasks. I recently read the results of a survey that spoke of processes taking tens of hours each time. Think tedious report writing or manual data entry. Applying the same “Loom AI” approach here could provide time savings in hours rather than seconds. That’s why these industries are willing to pay thousands each month for such a solution.

Everyone wants to build the AI…

But not many people want to build the UI. I can understand why though - spending the time wrangling CSS can prove difficult, and for what feels like minimal progress. You just want to move on and build the functionality of a feature, not figure out how to make a nested div scroll. But from the few AI native products I’ve built so far, a similar situation appears.

Enter the field of prompting. With business-critical applications, it can be like trying to debug a website without the Chrome developer tools while your tailwind config changes on each hot reload. In other words - also frustrating. Let’s think about everything we’re trying to do in a prompt. You’re essentially taking a probability machine that predicts the next best word, cramming in all the business logic, instructions, examples, user inputs etc., and attempting to get a valuable and accurate output.
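To make that "cramming" concrete, here is a minimal sketch of how those pieces might be assembled into a single prompt. The `PromptParts` and `build_prompt` names and the section labels are my own assumptions for illustration, not any particular library's API.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """The pieces that typically end up in one prompt (hypothetical names)."""
    instructions: str
    business_rules: list[str] = field(default_factory=list)
    examples: list[tuple[str, str]] = field(default_factory=list)  # (input, expected output)
    user_input: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the parts into one labelled block of text, skipping empty sections."""
    sections = [parts.instructions.strip()]
    if parts.business_rules:
        rules = "\n".join(f"- {rule}" for rule in parts.business_rules)
        sections.append(f"Rules:\n{rules}")
    for example_input, example_output in parts.examples:
        sections.append(
            f"Example input:\n{example_input}\nExample output:\n{example_output}"
        )
    sections.append(f"User input:\n{parts.user_input.strip()}")
    return "\n\n".join(sections)
```

Even in a toy like this, every section competes for the same context window, which is part of why small wording changes can shift the output so much.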

If that’s all you did, you’ve essentially built a prompt template library people could use with ChatGPT. Good luck charging for that. The real value comes when it’s seamlessly integrated into an existing workflow or platform. Where the data is already there, or there’s feature parity with an existing product that makes the move worth it.

Beyond chat

I believe the limiting factor with generative AI right now (in terms of wider adoption) is an interface problem. We’ve seen a similar pattern over the last 20 years where a lot of products and software were simply abstracting away the complexity of databases, or presenting data with more specificity. To successfully achieve this at scale required design from a technical and user-facing perspective.

The same can be said for this new era of generative technology. Building a chat interface is the database equivalent of presenting a user’s data in a pretty table. There are times when I need that in a product, sure. And no doubt there will be times I’ll want to converse with AI. But there has to be more than just chat. It needs to be integrated - surfacing insights just as I go to look, and doing grunt work behind the scenes while I get on with more important tasks.

Tools and function calling are great advancements in this space. Everything I’ve built so far is taking advantage of this, even just to extract structured data in a reliable format. Combined with the new Assistants API we can start to build out some more advanced, longer-running, multi-step workflows with ease.
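As a rough illustration of using a tool definition to extract structured data, the sketch below declares a tool in the JSON Schema shape that function-calling APIs generally expect, then checks the arguments a model hands back. The `record_invoice` tool, its fields, and `parse_tool_arguments` are hypothetical, and the check is deliberately shallow rather than a full schema validator. No API call is made here; the raw arguments string stands in for a model response.

```python
import json

# Hypothetical tool definition; the name and fields are made up for illustration.
EXTRACT_INVOICE_TOOL = {
    "type": "function",
    "function": {
        "name": "record_invoice",
        "description": "Record the key fields of an invoice.",
        "parameters": {
            "type": "object",
            "properties": {
                "supplier": {"type": "string"},
                "total": {"type": "number"},
                "due_date": {"type": "string"},
            },
            "required": ["supplier", "total"],
        },
    },
}

_JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def parse_tool_arguments(raw_arguments: str, tool: dict) -> dict:
    """Parse the model's JSON arguments and check them against the tool schema.

    Only checks required keys and simple scalar types.
    """
    schema = tool["function"]["parameters"]
    data = json.loads(raw_arguments)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in schema["required"] if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, value in data.items():
        spec = schema["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected field: {key}")
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for numbers
        is_bool_as_number = spec["type"] == "number" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_number:
            raise ValueError(f"field {key!r} should be a {spec['type']}")
    return data
```

The point of validating on your side is that "reliable format" still means "usually reliable": a cheap check like this lets you retry or fall back instead of passing bad data downstream.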

With that in mind, I’ll still be refining my UI and UX skills this year, and remain on the lookout for what the next “interface” could be. Right now web and mobile are the default due to widespread adoption, but something else will take their place eventually. At the end of the day, it’s all just different ways of stimulating human sensors to present information and allowing humans to input information with ease. Screens and keyboards are pretty inefficient when you think about it.

Letting the waters settle

I’ll admit, I haven’t spent as much time with this new technology as I would’ve liked, but I also don’t think that’s such a bad thing. For one, there’s the usual hype cycle that goes around, and I’m not one to jump head-first into something just because other people are. I’d rather explore first, experiment (and make mistakes) on side projects, and then apply it to the right use case when the opportunity arises.

Something else I saw the other day was the rule of 10/10/10. What you build now will be obsolete in 10 months, take a 10th of the time to be built by someone else, and with a 10th of the resources.

I recently experienced this with a client project. We started when the only model available was GPT 3.5. The 4k token limit amongst other restrictions meant setting up a lot of supporting infrastructure. Job queues, token limiting, rate limiting, chained workflows etc. just to get things working. And the results were… meh. Halfway through we got access to the GPT 4 API which significantly improved the response quality, and the release of a 16k context window for GPT 3.5 Turbo which we used for pre-selecting and processing the context for the main prompt. Then, just as I was wrapping things up, the 128k context window for GPT 4 Turbo arrived, making half of all that work redundant. Oh, and the Assistants API? That made the other half redundant.
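For a sense of what that token limiting and context pre-selection looked like in spirit, here is a simplified sketch, not the client code. It assumes roughly 4 characters per token (a common rule of thumb, not a real tokenizer) and that each chunk already has a relevance score from some earlier step; both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: assumes about 4 characters per token (a heuristic)."""
    return max(1, (len(text) + 3) // 4)


def select_context(chunks: list[tuple[float, str]], token_budget: int) -> list[str]:
    """Greedily keep the highest-scoring chunks that fit within the budget.

    `chunks` is a list of (relevance_score, text) pairs. How the score is
    produced (for example by a cheaper model) is outside this sketch.
    """
    selected: list[str] = []
    used = 0
    for _score, text in sorted(chunks, key=lambda pair: pair[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > token_budget:
            continue  # skip chunks that would overflow; smaller ones may still fit
        selected.append(text)
        used += cost
    return selected
```

With a 4k window the budget is tight enough that this kind of selection decides what the model ever sees. With a far larger window, much of it can simply be dropped.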

But that’s the name of the game. It excites me.

Is my app safe from ChatGPT?

Well, eventually AI will be able to do everything a human can do (and much better). So long term, probably not. But will it be OpenAI and ChatGPT specifically who make your app redundant? Potentially not, at least in the short term. I’d argue you have a much better chance of making your own app redundant with AI. Even if it’s AI-focused.

Remember, OpenAI is focused on building generalised intelligence.² And it’s a similar story with some of the other big players. My take on that is the potential of Large Language Models will keep improving, but it’ll be up to the market to harness that potential into something specialised or industry-specific. I imagine the landscape will be similar to SaaS products today, with varying degrees of focus. At one end you have the do-it-all style of Notion / ClickUp, while the other end is a more tailored approach with opinionated design and guardrails. I’ll be spending time this year exploring the latter.

The other way to look at it could be as a human. I’m not quite sure where the level is at the moment - maybe an early teenager who’s swallowed Wikipedia? Eventually, they’ll graduate high school and maybe university, then they’re off into the world. Thanks OpenAI! But they’ll still need to pick a job, learn the ropes, spend time around experts, build a network, and refine their craft to become irreplaceable.

Areas I’ll be following

Trying to integrate agency into these LLMs has been a fascinating experiment. Setting up different “agents” each with their own role, and having one of them try to orchestrate the whole thing. The challenge I find at the moment is trying to find the balance between independence and chaos. You don’t want your instructions to be too rigid, rather explain the destination and let them get there themselves. But again, it’s a hard one to test as the feedback loop for each iteration is 5-10 minutes long. I’m keen to see what people come up with here.
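The shape of that setup can be sketched in a few lines. This is a toy under heavy assumptions: each "agent" is a plain function standing in for a model call, `choose_next` plays the orchestrator, and `max_steps` is the guardrail that stops independence from tipping into chaos. All names here are hypothetical.

```python
from typing import Callable, Optional

# An "agent" is just a function from the shared notes so far to a new note.
# In a real system each would wrap a model call; these are stand-ins.
Agent = Callable[[list[str]], str]


def run_orchestrator(
    goal: str,
    agents: dict[str, Agent],
    choose_next: Callable[[list[str]], Optional[str]],
    max_steps: int = 5,
) -> list[str]:
    """Let the orchestrator pick one agent per step until it stops or hits the cap."""
    notes = [f"goal: {goal}"]
    for _ in range(max_steps):
        role = choose_next(notes)
        if role is None:  # the orchestrator decides the goal has been reached
            break
        notes.append(f"{role}: {agents[role](notes)}")
    return notes
```

The instructions only state the destination (the goal), while the step cap bounds how far things can wander. Swapping the stand-in functions for real model calls is where the 5-10 minute feedback loop comes from.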

Another area of interest is an emulation of the human psyche in these models. How do we, as conscious beings experiencing the world, think? How do we work through problems? How do we develop intuition? How do we reference memories? And are the ways we do things the best way for machines to do them? These are all questions I believe are worth exploring in depth.

And of course, progress on large-scale problems such as super alignment is always worth a read. I believe the work done in that space will have the largest impact on our future long-term.

The pace things are moving is rapid, especially seeing this technology has only been in development for a handful of years. But where do you draw the line? Technically humans have taken billions of years to get to our level of intelligence, right? All the different forms of life that led us to this point. It raises the question - how artificial is this artificial intelligence? Is it separate from all the levels of intelligence prior, or just the next evolution? Personally, I believe we are just the biological boot loader for the singularity. But that’s next year’s post. I hope.

Footnotes

¹ While I’ve used Loom as the comparison here, I actually think they have one of the best implementations I’ve seen of useful AI tools to enhance their core offering. But this only strengthens the example.
² This will likely be short-lived, and quickly move into a form of superintelligence. But that seems like a topic for another discussion, so let’s just assume we’re talking about a world between now and then.