
Chat Completions

Given a list of messages comprising a conversation, the model returns a generated response. This endpoint runs on our proprietary TensorRT-LLM-optimized infrastructure, which is designed for low TTFT (time to first token).

POST https://a3gate.in/v1/chat/completions
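The same request can be built from Python using only the standard library. This is a sketch, not an official SDK: build_chat_request and send_chat_request are hypothetical helper names, and the code assumes the API key is stored in the A3GATE_API_KEY environment variable, as in the cURL example.

```python
import json
import os
import urllib.request

API_URL = "https://a3gate.in/v1/chat/completions"


def build_chat_request(model, messages, temperature=None, api_key=None):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    # Fall back to the environment variable used in the cURL example.
    key = api_key if api_key is not None else os.environ.get("A3GATE_API_KEY", "")
    payload = {"model": model, "messages": messages}
    # Only include temperature when set explicitly; the documented default is 1.
    if temperature is not None:
        payload["temperature"] = temperature
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )


def send_chat_request(req, timeout=30):
    """Send the request over the network and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building and sending are kept separate so the request can be inspected without calling the API. Calling send_chat_request(build_chat_request("llama-3-70b", [{"role": "user", "content": "Hello"}])) performs a live request and needs network access and a valid key.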

Request Parameters

Parameter Description
model string Required

ID of the model to use. See the model endpoint compatibility table for details on which models work with the Chat API.

Options: llama-3-70b, mistral-8x7b, or the ID of a fine-tuned model (custom-ft-id).

messages array Required

A list of messages comprising the conversation so far.

Each message object requires a role (one of system, user, or assistant) and a content field.
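A client can check the shape of messages before sending it. This is a minimal sketch: validate_messages is a hypothetical helper, and rejecting an empty list or non-string content are assumptions drawn from the examples on this page rather than documented server rules.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Raise ValueError if `messages` does not match the documented shape."""
    # Assumption: an empty conversation is treated as invalid.
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            raise ValueError(f"messages[{i}] must be an object")
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}] has invalid role: {role!r}")
        # Assumption: content is a plain string, as in the cURL example.
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"messages[{i}] is missing a string 'content'")
```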

temperature number

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Defaults to 1.
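This page does not say how the server applies temperature internally. The sketch below shows the common technique the parameter usually refers to: each logit is divided by the temperature before a softmax, so values below 1 sharpen the distribution and values above 1 flatten it. Treating 0 as greedy (argmax) decoding is an assumption, since dividing by 0 is undefined, and sample_distribution is a hypothetical name.

```python
import math


def sample_distribution(logits, temperature=1.0):
    """Turn raw logits into token probabilities at a given temperature."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if temperature == 0:
        # Assumption: 0 means greedy decoding, all mass on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For logits [2.0, 1.0, 0.0], the probability of the first token is about 0.99 at temperature 0.2 and about 0.56 at temperature 1.5, which is why lower values give more focused output.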

Response Format

The response is a JSON object containing the model's output in choices, along with exact token usage counts in usage (prompt_tokens, completion_tokens, total_tokens) for billing purposes.

REQUEST (cURL)
curl https://a3gate.in/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $A3GATE_API_KEY" \
  -d '{
    "model": "llama-3-70b",
    "messages": [
      {
        "role": "system",
        "content": "You are an expert financial analyst."
      },
      {
        "role": "user",
        "content": "Summarize the Q3 earnings report."
      }
    ]
  }'
RESPONSE (JSON)
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama-3-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Q3 earnings report highlights a 14% increase..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 120,
    "total_tokens": 139
  }
}
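A minimal sketch for reading the reply and usage fields from a raw response body like the one above. summarize_response is a hypothetical helper; it reads only choices[0], which is the single choice present in the example.

```python
import json


def summarize_response(raw):
    """Extract the assistant reply and token usage from a chat.completion body."""
    data = json.loads(raw)
    # Only the first choice is read, matching the single-choice example.
    choice = data["choices"][0]
    usage = data["usage"]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }
```

In the example response, total_tokens (139) equals prompt_tokens (19) plus completion_tokens (120).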