Hand drawing showing the words Web and SLM, with SLM being in a computer chip

A language model in your browser, accessible as a Web API?

AI, these days, is all about cloud services. You chat with a bot, you ask a voice assistant for something, and so on, all of which is powered by an AI model running on a server somewhere. But on-device AI is a thing too; it's just less popular and less well-known. On-device AI means running the model on the same device where the user consumes its output.

Running an AI model on your device comes with important benefits:

There's nothing very new here, and many apps and devices already come with AI models built in. In the world of web browsers, it's worth noting that some things are starting to happen:

For web developers, getting access to AI models still comes at the price of downloading your own models and using frameworks such as ONNX Runtime Web to run them. What if there were a simpler way?

Simpler on-device AI for web pages?

That's what Google has started to experiment with. At Google I/O earlier this year, they announced their "built-in" AI plans, which involve shipping a small language model (Gemini Nano) as part of Chrome, and making it available to web developers as a series of Web APIs. More recently, they published an explainer document for their Prompt API, which goes deeper into the details of how this works.

The main benefits of this approach, over what you can already do with WebGL, WebGPU, and WebNN, are:

The drawback I can see so far is that you don't get to choose the model.

Now, this is all very new and experimental. The only API that Google has made available so far is the prompt API, which allows you to access the model pretty directly. Here's a short code snippet showing the intended usage of the prompt API:

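// Create a session with the browser's built-in language model.
const session = await window.ai.createTextSession();
// Start streaming the model's response to the prompt.
const result = session.promptStreaming(inputText);

for await (const chunk of result) {
  textarea.value = chunk;
}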

You can already use this today in Chrome Canary, by first enabling the chrome://flags/#optimization-guide-on-device-model and chrome://flags/#prompt-api-for-gemini-nano flags.
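Because the API only exists when those flags are on, a page can't assume window.ai is there. Here's a minimal sketch of a defensive wrapper around the snippet above. The helper name streamPrompt and the decision to return false on failure are my own assumptions for illustration; only createTextSession and promptStreaming come from the experimental API, and they may well change.

```javascript
// Sketch only: `ai` is expected to look like the experimental `window.ai`
// object shown earlier. The helper name `streamPrompt` is hypothetical.
async function streamPrompt(ai, inputText, onChunk) {
  // The API is behind flags, so check that it exists before calling it.
  if (!ai || typeof ai.createTextSession !== "function") {
    return false;
  }
  try {
    const session = await ai.createTextSession();
    // Hand every streamed chunk to the caller as it arrives.
    for await (const chunk of session.promptStreaming(inputText)) {
      onChunk(chunk);
    }
    return true;
  } catch (error) {
    // Session creation or prompting failed (for example, no model available).
    console.warn("On-device prompt failed:", error);
    return false;
  }
}
```

In a page, you would call it as streamPrompt(window.ai, inputText, (chunk) => { textarea.value = chunk; }) and fall back to something else (a server call, or simply hiding the feature) when it resolves to false.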

The risks

Exposing a language model directly to web pages is certainly exciting, but also comes with risks:

What do you think? I'm personally reassured by this part of the explainer document:

Even more so than many other behind-a-flag APIs, the prompt API is an experiment, designed to help us understand web developers' use cases to inform a roadmap of purpose-built APIs.

Purpose-built APIs could be very useful to web developers, and would come with far fewer risks attached. Imagine being able to call summarize(text) or translate(text, "en-us", "fr-fr") from your webpage.
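To make that idea a bit more concrete, here's a rough sketch of what a summarize helper could look like if you built it yourself on top of a prompt session today. Everything here is hypothetical: the function, its parameters, and the prompt wording are mine, and I'm assuming the session exposes a prompt(text) method that resolves to a string. A real purpose-built API would presumably look different and wouldn't depend on prompt wording at all.

```javascript
// Hypothetical helper: not part of any shipped or proposed API.
// `session` is assumed to expose `prompt(text)` returning a Promise<string>.
async function summarize(session, text, maxSentences = 2) {
  // The instruction wording is an illustrative guess, not a tested prompt.
  const instruction =
    `Summarize the following text in at most ${maxSentences} sentences:\n\n`;
  const answer = await session.prompt(instruction + text);
  return answer.trim();
}
```

The difference with a purpose-built API is that the page would only get a summary back, rather than free-form access to the model.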

Feedback

At this early stage, if you have feedback about any of this, I think the most useful thing to do is to engage with Google by submitting an issue on the prompt API explainer repo.