# ARI Models Gateway
The public client endpoint is https://models.ari-motors.com. The gateway forwards requests to a private Ollama instance on the Mac Mini through Tailscale. Clients never talk to Ollama directly.
## Base URL

- Public API: `https://models.ari-motors.com`
- Playground: `/`
- Documentation: `/playground/docs`
## Authentication

Public API calls require a Bearer token in the `Authorization` header:

```
Authorization: Bearer <prod-token>
```

The admin-only endpoint `/installed-models` requires the admin token:

```
Authorization: Bearer <admin-token>
```

The browser playground does not expose tokens. It uses an Auth0 session login on the server and internal proxy routes under `/playground-api/*`.
## Endpoints
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /health | Checks gateway health and whether Ollama on the Mac Mini is reachable. | Prod token |
| GET | /models | Returns the current public model list in a simplified format. | Prod token |
| GET | /installed-models | Returns the installed model inventory from Ollama with extended metadata. | Admin token |
| POST | /chat | Runs chat-style text generation with one of the available text models. | Prod token |
| POST | /vision | Runs multimodal generation with a prompt and one or more base64 images. | Prod token |
## What the API can return

### Health state

You can retrieve gateway status, upstream reachability, and response time in milliseconds.
### Public model inventory

You can retrieve each public model's name, normalized model identifier, inferred type, family, and size.
### Installed model inventory

Admin users can retrieve model name, model id, byte size, modification timestamp, family, parameter size, and quantization level.
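The admin inventory is fetched the same way as the public endpoints, just with the admin token. A minimal sketch (the `installedModelsRequest` helper is illustrative, not part of the gateway, and `ADMIN_TOKEN` is assumed to hold the admin token):

```javascript
// Builds the request for the admin-only /installed-models endpoint.
function installedModelsRequest(adminToken) {
  return {
    url: 'https://models.ari-motors.com/installed-models',
    options: {
      headers: { Authorization: 'Bearer ' + adminToken }
    }
  };
}

// Usage:
//   const { url, options } = installedModelsRequest(ADMIN_TOKEN);
//   const response = await fetch(url, options);
//   const data = await response.json();
//   // data.models entries carry details.family, details.parameter_size,
//   // and details.quantization_level as described above.
```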
### Chat output

Non-streaming chat returns Ollama JSON directly, including fields such as `model`, `created_at`, `message`, `done`, `done_reason`, and timing counters when present.
### Streaming chunks

Streaming is passed through without interpretation, so chunks may include `message.content`, `message.thinking`, `response`, `done`, and `done_reason`, depending on the endpoint and model.
### Vision output

Non-streaming vision returns the upstream Ollama generate response. When streaming is enabled, newline-delimited JSON chunks are forwarded as they arrive.
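Both `/chat` and `/vision` stream newline-delimited JSON, so a client only needs a small line buffer to recover complete objects from raw chunks. A minimal sketch (the `ndjsonLines` helper is illustrative, not part of the gateway):

```javascript
// Splits accumulated text into complete NDJSON lines, returning the parsed
// objects plus whatever partial trailing line is left for the next chunk.
function ndjsonLines(buffer, chunkText) {
  const combined = buffer + chunkText;
  const lines = combined.split('\n');
  const rest = lines.pop() || '';
  const objects = lines
    .filter(line => line.trim())
    .map(line => JSON.parse(line));
  return { objects, rest };
}
```

Feed each decoded network chunk through it, carrying `rest` between calls; the parsed objects then expose `message.content`, `message.thinking`, `response`, `done`, and `done_reason` as described above.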
## Request formats

### Chat request body

```json
{
  "model": "qwen3:4b",
  "messages": [
    { "role": "user", "content": "Write one short product summary." }
  ],
  "stream": false,
  "keep_alive": "5m",
  "options": {},
  "format": "json"
}
```
### Vision request body

```json
{
  "model": "qwen2.5vl:3b",
  "prompt": "Describe the image in English.",
  "images": ["<base64-image>"],
  "stream": false,
  "keep_alive": 0,
  "options": {}
}
```
Vision images are always preprocessed on the gateway: the long side is capped at 768 px and the image is re-encoded as JPEG at quality 82 by default.
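Since the gateway resizes anyway, a client can apply the same cap before uploading to keep request bodies small. A sketch of the dimension math (the 768 px figure mirrors the gateway default; the helper itself is illustrative):

```javascript
// Computes target dimensions so the longer side is at most maxSide,
// preserving aspect ratio. No-op when the image is already small enough.
function fitLongSide(width, height, maxSide = 768) {
  const longSide = Math.max(width, height);
  if (longSide <= maxSide) return { width, height };
  const scale = maxSide / longSide;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale)
  };
}
```

In the browser, draw the image onto a canvas at these dimensions and export it with `canvas.toDataURL('image/jpeg', 0.82)` before stripping the data-URL prefix.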
## JavaScript examples
### 1. Check health

```javascript
const response = await fetch('https://models.ari-motors.com/health', {
  headers: {
    Authorization: 'Bearer ' + PROD_TOKEN
  }
});
const data = await response.json();
console.log(data);
```
### 2. Load public models

```javascript
const response = await fetch('https://models.ari-motors.com/models', {
  headers: {
    Authorization: 'Bearer ' + PROD_TOKEN
  }
});
const data = await response.json();
console.log(data.models);
```
### 3. Send a chat request

```javascript
const response = await fetch('https://models.ari-motors.com/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer ' + PROD_TOKEN
  },
  body: JSON.stringify({
    model: 'qwen3:4b',
    messages: [
      { role: 'user', content: 'Write a short English greeting.' }
    ],
    stream: false,
    keep_alive: '5m'
  })
});
const data = await response.json();
console.log(data.message?.content);
console.log(data.message?.thinking);
```
### 4. Consume streaming chat output

```javascript
const response = await fetch('https://models.ari-motors.com/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer ' + PROD_TOKEN
  },
  body: JSON.stringify({
    model: 'qwen3:4b',
    messages: [
      { role: 'user', content: 'Explain HTTP streaming in one paragraph.' }
    ],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || '';
  for (const line of lines) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line);
    if (chunk.message?.thinking) {
      console.log('thinking:', chunk.message.thinking);
    }
    if (chunk.message?.content) {
      console.log('content:', chunk.message.content);
    }
    if (chunk.done) {
      console.log('finished:', chunk.done_reason);
    }
  }
}
```
### 5. Send a vision request from JavaScript

```javascript
async function fileToBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      const result = String(reader.result || '');
      // Strip the "data:<mime>;base64," prefix from the data URL.
      resolve(result.includes(',') ? result.split(',')[1] : result);
    };
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
}

const imageBase64 = await fileToBase64(fileInput.files[0]);
const response = await fetch('https://models.ari-motors.com/vision', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer ' + PROD_TOKEN
  },
  body: JSON.stringify({
    model: 'qwen2.5vl:3b',
    prompt: 'Describe everything visible in this image.',
    images: [imageBase64],
    stream: false
  })
});
const data = await response.json();
console.log(data.response);
```
## Response fields

### GET /health

```json
{
  "status": "ok",
  "gateway": "ok",
  "macMiniReachable": true,
  "ollamaResponding": true,
  "responseTimeMs": 217
}
```
### GET /models

```json
{
  "models": [
    {
      "name": "qwen3:4b",
      "model": "qwen3:4b",
      "type": "chat",
      "family": "qwen3",
      "size": 1234567890
    }
  ]
}
```
### GET /installed-models

```json
{
  "models": [
    {
      "name": "qwen3:4b",
      "model": "qwen3:4b",
      "size": 1234567890,
      "modified_at": "2026-04-16T10:00:00Z",
      "details": {
        "family": "qwen3",
        "parameter_size": "4B",
        "quantization_level": "Q4_K_M"
      }
    }
  ]
}
```
### POST /chat

```json
{
  "model": "qwen3:4b",
  "created_at": "2026-04-16T10:00:00Z",
  "message": {
    "role": "assistant",
    "content": "Hello.",
    "thinking": "Optional reasoning text"
  },
  "done": true,
  "done_reason": "stop"
}
```
### POST /vision

```json
{
  "model": "qwen2.5vl:3b",
  "created_at": "2026-04-16T10:00:00Z",
  "response": "Image description",
  "done": true,
  "done_reason": "stop"
}
```
## Errors and limits

### 401 Unauthorized

Missing or invalid Bearer token.

### 403 Forbidden

Returned by `/installed-models` when the caller is not using the admin token.

### 400 Invalid request

Returned when required fields are missing, for example no `model`, no `messages`, no `prompt`, or when base64 images are invalid.
### 413 Payload too large

Returned when the request body exceeds the configured maximum. The default limit is 8388608 bytes (8 MiB).
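Base64 inflates binary data by roughly a third, so an image that fits on disk may not fit in the request body. A rough client-side pre-check (the limit mirrors the documented default; the helper and the JSON overhead figure are assumptions for illustration):

```javascript
const MAX_BODY_BYTES = 8388608; // documented default limit

// Approximates the body size of a vision call: base64 encodes every
// 3 input bytes as 4 output characters, plus some JSON envelope overhead.
function fitsBodyLimit(imageByteLength, jsonOverheadBytes = 1024) {
  const base64Length = 4 * Math.ceil(imageByteLength / 3);
  return base64Length + jsonOverheadBytes <= MAX_BODY_BYTES;
}
```

Checking before sending lets the client downscale or re-encode instead of waiting for a 413 from the gateway.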
### 502 or 503 upstream errors

Returned when the Mac Mini or Ollama is unavailable, times out, or returns an unexpected error.