ARI Models Gateway

The public client endpoint is https://models.ari-motors.com. The gateway forwards requests to a private Ollama instance on the Mac Mini through Tailscale. Clients never talk to Ollama directly.

Base URL

Resource Path
Public API https://models.ari-motors.com
Playground /
Documentation /playground/docs

Authentication

Public API calls require a Bearer token in the Authorization header.

Authorization: Bearer <prod-token>

The admin-only endpoint /installed-models requires the admin token.

Authorization: Bearer <admin-token>

The browser playground does not expose tokens. Instead, it authenticates through an Auth0 session on the server and calls internal proxy routes under /playground-api/*.

Endpoints

Method Path Purpose Auth
GET /health Checks gateway health and whether Ollama on the Mac Mini is reachable. Prod token
GET /models Returns the current public model list in a simplified format. Prod token
GET /installed-models Returns the installed model inventory from Ollama with extended metadata. Admin token
POST /chat Runs chat-style text generation with one of the available text models. Prod token
POST /vision Runs multimodal generation with a prompt and one or more base64 images. Prod token
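All public endpoints share the same base URL and Bearer-token pattern, so a small client helper can centralize both. This is an illustrative sketch, not part of the gateway: the names BASE_URL, authHeaders, and api are assumptions introduced here.

```javascript
// Illustrative helper for calling the gateway; BASE_URL, authHeaders, and
// api are example names, not part of the gateway itself.
const BASE_URL = 'https://models.ari-motors.com';

// Build the Authorization header for a given token.
function authHeaders(token) {
  return { Authorization: 'Bearer ' + token };
}

// Thin wrapper around fetch that prefixes the base URL and attaches the token.
async function api(path, token, options = {}) {
  const response = await fetch(BASE_URL + path, {
    ...options,
    headers: { ...authHeaders(token), ...(options.headers || {}) }
  });
  if (!response.ok) {
    throw new Error('Gateway request failed with status ' + response.status);
  }
  return response;
}
```

With this in place, a health check becomes `const health = await (await api('/health', PROD_TOKEN)).json();`.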

What the API can return

Health state: You can retrieve gateway status, upstream reachability, and response time in milliseconds.
Public model inventory: You can retrieve each public model name, normalized model identifier, inferred type, family, and size.
Installed model inventory: Admin users can retrieve model name, model id, byte size, modification timestamp, family, parameter size, and quantization level.
Chat output: Non-streaming chat returns Ollama JSON directly, including fields such as model, created_at, message, done, done_reason, and timing counters when present.
Streaming chunks: Streaming is passed through without interpretation, so chunks may include message.content, message.thinking, response, done, and done_reason depending on the endpoint and model.
Vision output: Non-streaming vision returns the upstream Ollama generate response. When streaming is enabled, newline-delimited JSON chunks are forwarded as they arrive.

Request formats

Chat request body

{
  "model": "qwen3:4b",
  "messages": [
    { "role": "user", "content": "Write one short product summary." }
  ],
  "stream": false,
  "keep_alive": "5m",
  "options": {},
  "format": "json"
}

Vision request body

{
  "model": "qwen2.5vl:3b",
  "prompt": "Describe the image in English.",
  "images": ["<base64-image>"],
  "stream": false,
  "keep_alive": 0,
  "options": {}
}

Vision images are always preprocessed on the gateway: resized so the long side is at most 768 px and re-encoded as JPEG at quality 82 by default.
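Applying the same resize rule on the client shrinks uploads before they count against the request body limit. The dimension math can be sketched as follows; the targetSize function is an illustrative name introduced here, assuming the same 768 px long-side cap the gateway uses:

```javascript
// Compute output dimensions so the longer side is at most maxLong pixels,
// preserving the aspect ratio. Mirrors the gateway's 768 px long-side cap.
function targetSize(width, height, maxLong = 768) {
  const long = Math.max(width, height);
  if (long <= maxLong) return { width, height };
  const scale = maxLong / long;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale)
  };
}
```

In a browser, you could draw the image onto a canvas at this size and export it with `canvas.toDataURL('image/jpeg', 0.82)` before stripping the data-URL prefix and sending the base64 payload.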

JavaScript examples

1. Check health

const response = await fetch('https://models.ari-motors.com/health', {
  headers: {
    Authorization: 'Bearer ' + PROD_TOKEN
  }
});

const data = await response.json();
console.log(data);

2. Load public models

const response = await fetch('https://models.ari-motors.com/models', {
  headers: {
    Authorization: 'Bearer ' + PROD_TOKEN
  }
});

const data = await response.json();
console.log(data.models);

3. Send a chat request

const response = await fetch('https://models.ari-motors.com/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer ' + PROD_TOKEN
  },
  body: JSON.stringify({
    model: 'qwen3:4b',
    messages: [
      { role: 'user', content: 'Write a short English greeting.' }
    ],
    stream: false,
    keep_alive: '5m'
  })
});

const data = await response.json();
console.log(data.message?.content);
console.log(data.message?.thinking);

4. Consume streaming chat output

const response = await fetch('https://models.ari-motors.com/chat', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer ' + PROD_TOKEN
  },
  body: JSON.stringify({
    model: 'qwen3:4b',
    messages: [
      { role: 'user', content: 'Explain HTTP streaming in one paragraph.' }
    ],
    stream: true
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || '';

  for (const line of lines) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line);

    if (chunk.message?.thinking) {
      console.log('thinking:', chunk.message.thinking);
    }

    if (chunk.message?.content) {
      console.log('content:', chunk.message.content);
    }

    if (chunk.done) {
      console.log('finished:', chunk.done_reason);
    }
  }
}

5. Send a vision request from JavaScript

async function fileToBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      const result = String(reader.result || '');
      resolve(result.includes(',') ? result.split(',')[1] : result);
    };
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
}

const imageBase64 = await fileToBase64(fileInput.files[0]);

const response = await fetch('https://models.ari-motors.com/vision', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer ' + PROD_TOKEN
  },
  body: JSON.stringify({
    model: 'qwen2.5vl:3b',
    prompt: 'Describe everything visible in this image.',
    images: [imageBase64],
    stream: false
  })
});

const data = await response.json();
console.log(data.response);
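6. List installed models (admin)

The admin-only inventory endpoint follows the same pattern with the admin token. The formatBytes helper below is an illustrative addition for readable output, not part of the gateway:

```javascript
// Convert a raw byte count into a human-readable string (illustrative helper).
function formatBytes(bytes) {
  const units = ['B', 'KB', 'MB', 'GB'];
  let value = bytes;
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  return value.toFixed(1) + ' ' + units[unit];
}

// Fetch the installed-model inventory using the admin token.
async function listInstalledModels(adminToken) {
  const response = await fetch('https://models.ari-motors.com/installed-models', {
    headers: { Authorization: 'Bearer ' + adminToken }
  });
  const data = await response.json();
  for (const m of data.models) {
    console.log(m.name, formatBytes(m.size), m.details?.quantization_level);
  }
  return data.models;
}
```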

Response fields

GET /health

{
  "status": "ok",
  "gateway": "ok",
  "macMiniReachable": true,
  "ollamaResponding": true,
  "responseTimeMs": 217
}

GET /models

{
  "models": [
    {
      "name": "qwen3:4b",
      "model": "qwen3:4b",
      "type": "chat",
      "family": "qwen3",
      "size": 1234567890
    }
  ]
}

GET /installed-models

{
  "models": [
    {
      "name": "qwen3:4b",
      "model": "qwen3:4b",
      "size": 1234567890,
      "modified_at": "2026-04-16T10:00:00Z",
      "details": {
        "family": "qwen3",
        "parameter_size": "4B",
        "quantization_level": "Q4_K_M"
      }
    }
  ]
}

POST /chat

{
  "model": "qwen3:4b",
  "created_at": "2026-04-16T10:00:00Z",
  "message": {
    "role": "assistant",
    "content": "Hello.",
    "thinking": "Optional reasoning text"
  },
  "done": true,
  "done_reason": "stop"
}

POST /vision

{
  "model": "qwen2.5vl:3b",
  "created_at": "2026-04-16T10:00:00Z",
  "response": "Image description",
  "done": true,
  "done_reason": "stop"
}

Errors and limits

401 Unauthorized: Missing or invalid Bearer token.
403 Forbidden: Returned by /installed-models when the caller does not present the admin token.
400 Invalid request: Returned when required fields are missing, for example no model, no messages, no prompt, or invalid base64 image data.
413 Payload too large: Returned when the request body exceeds the configured maximum. The default limit is 8388608 bytes (8 MiB).
502/503 Upstream error: Returned when the Mac Mini or Ollama is unreachable, times out, or returns an unexpected upstream error.
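Because the 413 limit applies to the serialized request body, a client can check the byte size before sending, which matters most for vision requests carrying base64 images. A sketch, assuming the documented 8388608-byte default; bodyFits is an illustrative name:

```javascript
// Default gateway body limit: 8388608 bytes (8 MiB), per the docs above.
const MAX_BODY_BYTES = 8388608;

// Return true if the serialized request body fits under the limit.
// TextEncoder counts UTF-8 bytes, matching what actually goes on the wire.
function bodyFits(body, limit = MAX_BODY_BYTES) {
  const bytes = new TextEncoder().encode(JSON.stringify(body)).length;
  return bytes <= limit;
}
```

If the body does not fit, shrink or recompress the image before retrying rather than sending the request and handling the 413.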