Chat Completions

Given a list of messages comprising a conversation, the model returns a generated response. This endpoint runs on our proprietary TensorRT-LLM-optimized infrastructure for sub-millisecond TTFT (time to first token).

POST https://a3gate.in/v1/chat/completions

Request Parameters

Parameter	Description
model string Required	ID of the model to use. See the model endpoint compatibility table for details on which models work with the Chat API. Options: `llama-3-70b`, `mistral-8x7b`, `custom-ft-id`
messages array Required	A list of messages comprising the conversation so far. Each object requires a `role` (system, user, assistant) and `content`.
temperature number	What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 1.
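The constraints in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not an official SDK call: `build_chat_payload` is a hypothetical helper name, and the checks (including rejecting an empty `messages` list) are a local convenience rather than documented server behavior.

```python
import json

# Roles listed in the parameter table above.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_payload(model, messages, temperature=None):
    """Hypothetical helper: validate arguments and return a JSON-ready dict."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required and must not be empty")
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"messages[{i}].role must be one of {sorted(ALLOWED_ROLES)}")
        if "content" not in msg:
            raise ValueError(f"messages[{i}] is missing content")
    payload = {"model": model, "messages": list(messages)}
    # Omit temperature entirely so the documented default (1) applies.
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    return payload


payload = build_chat_payload(
    "llama-3-70b",
    [{"role": "user", "content": "Summarize the Q3 earnings report."}],
    temperature=0.2,
)
print(json.dumps(payload, indent=2))
```

Leaving `temperature` out of the payload, rather than sending `1` explicitly, keeps the request aligned with whatever default the server applies.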

Response Format

The response is a JSON object containing the model's output along with exact token usage metrics for billing purposes.

Embeddings

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

POST https://a3gate.in/v1/embeddings

Request Parameters

Parameter	Description
model string Required	ID of the model to use. You can use the List models API to see all of your available models.
input string or array Required	Input text to embed, encoded as a string or array of tokens. To embed multiple inputs in a single request, pass an array of strings or array of token arrays.
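As an illustration of the two accepted `input` shapes, the sketch below builds a request body for either a single string or a batch. `build_embeddings_payload` is a hypothetical helper, and the model ID is a placeholder, since this page does not name an embedding model.

```python
def build_embeddings_payload(model, inputs):
    """Hypothetical helper: accept one string or a non-empty list of inputs."""
    if not model:
        raise ValueError("model is required")
    if isinstance(inputs, str):
        if not inputs:
            raise ValueError("input must not be an empty string")
        body_input = inputs
    elif isinstance(inputs, (list, tuple)):
        if not inputs:
            raise ValueError("input must not be an empty array")
        # Either an array of strings or an array of token arrays.
        body_input = list(inputs)
    else:
        raise TypeError("input must be a string or an array")
    return {"model": model, "input": body_input}


# Placeholder model ID (assumption); use the List models API for real IDs.
single = build_embeddings_payload("your-embedding-model-id", "GPU cluster pricing")
batch = build_embeddings_payload(
    "your-embedding-model-id",
    ["GPU cluster pricing", "Fine-tuning quota"],
)
print(single)
print(batch)
```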

Introduction

Welcome to the A3Gate DeepTech API. Our REST API provides programmatic access to our enterprise-grade GPU clusters, allowing you to integrate state-of-the-art language models, embedding pipelines, and fine-tuning workloads directly into your applications.

We provide official SDKs for Python and Node.js, or you can interact directly via REST using cURL or your preferred HTTP client.

Base URL

All API requests must be made over HTTPS. Calls made over plain HTTP will fail. API requests without authentication will also fail.

https://a3gate.in/v1

Authentication

The A3Gate API uses API keys for authentication. Visit your API Keys page in the Dashboard to retrieve the API key you'll use in your requests.

Remember that your API key is a secret! Do not share it with others or expose it in any client-side code (browsers, apps). Production requests must be routed through your own backend server.

Authorization Header

All API requests should include your API key in an Authorization HTTP header as follows:

Authorization: Bearer A3GATE_API_KEY
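For readers calling the REST API without an SDK, the header can be attached with the Python standard library. This is a sketch under stated assumptions: `build_request` is a hypothetical helper, the key is read from an `A3GATE_API_KEY` environment variable as in the cURL sample on this page, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://a3gate.in/v1"


def build_request(path, body, api_key=None):
    """Hypothetical helper: build an authenticated POST request (not sent)."""
    # Read the key from the environment so it never lives in source code.
    api_key = api_key or os.environ.get("A3GATE_API_KEY")
    if not api_key:
        raise RuntimeError("set A3GATE_API_KEY or pass api_key")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "/chat/completions",
    {"model": "llama-3-70b", "messages": [{"role": "user", "content": "Hello"}]},
    api_key="test-key-not-real",  # dummy value for illustration only
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or a real key.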

Error Codes

A3Gate uses conventional HTTP response codes to indicate the success or failure of an API request. In general:

Codes in the 2xx range indicate success.
Codes in the 4xx range indicate a request that failed given the information provided (e.g., a required parameter was omitted).
Codes in the 5xx range indicate an error with A3Gate's servers.

Common Status Codes

Code	Description
400 - Bad Request	The request was unacceptable, often due to missing a required parameter.
401 - Unauthorized	No valid API key provided.
403 - Forbidden	The API key doesn't have permissions to perform the request.
404 - Not Found	The requested resource doesn't exist.
429 - Too Many Requests	Too many requests hit the API too quickly. We recommend an exponential backoff.
500, 502, 503, 504	Server Errors. Something went wrong on A3Gate's end.
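The backoff recommended for 429 responses can be sketched as follows. `call_with_backoff` is a hypothetical helper, and the delay values, the jitter, and the choice to also retry 5xx codes are assumptions rather than documented A3Gate guidance. The `send` argument is any callable that performs one request and returns a `(status_code, body)` pair.

```python
import random
import time

# 429 is the documented backoff case; retrying 5xx codes is an assumption.
RETRYABLE = {429, 500, 502, 503, 504}


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Double the delay on each attempt, cap it, then add random jitter
        # so that many clients do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))


# Simulated server: two retryable failures followed by a success.
responses = iter([(429, ""), (503, ""), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

The `sleep` parameter is injectable so the retry schedule can be inspected in tests without actually waiting.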

Vision Analysis

The Vision API allows our multimodal models to take in images and answer questions about them. This is powered by our custom LLaVA-based architectures running on optimized TensorRT engines.

POST https://a3gate.in/v1/chat/completions

Vision uses the same endpoint as Chat Completions, but a message's `content` may be an array of content objects containing an image URL or base64-encoded image data.
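This page does not specify the field names of the content objects, so the sketch below is an assumption throughout: it uses the `type` / `text` / `image_url` shape common to OpenAI-compatible chat APIs and a data URL for base64 input. Check the actual schema before relying on it. `image_message` is a hypothetical helper.

```python
import base64


def image_message(question, image_url=None, image_bytes=None, mime_type="image/png"):
    """Hypothetical helper: build a user message with text plus one image.

    ASSUMPTION: the field names below follow the OpenAI-compatible
    convention; they are not confirmed by this documentation.
    """
    if (image_url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of image_url or image_bytes")
    if image_bytes is not None:
        # Inline the image as a base64 data URL.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        image_url = f"data:{mime_type};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


by_url = image_message("What is in this chart?", image_url="https://example.com/chart.png")
inline = image_message("What is in this chart?", image_bytes=b"\x89PNG fake bytes")
print(by_url["content"][1])
```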

Create Fine-Tuning Job

Creates a fine-tuning job which begins the process of creating a new model from a given dataset. Our distributed compute handles LoRA and full-parameter tuning.

POST https://a3gate.in/v1/fine_tuning/jobs

Parameter	Description
training_file string Required	The ID of an uploaded file that contains training data (JSONL format).
model string Required	The name of the base model to fine-tune. You can select "llama-3-8b" or "mistral-7b".
hyperparameters object	Optional hyperparameters used for fine-tuning. Defaults to automatic.
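A request body for this endpoint might be assembled as in the sketch below. `build_fine_tuning_job` is a hypothetical helper and `file-abc123` is a made-up file ID. Because the individual hyperparameter keys are not documented here, the example omits `hyperparameters` and relies on the automatic defaults.

```python
# Base models named in the parameter table above.
BASE_MODELS = {"llama-3-8b", "mistral-7b"}


def build_fine_tuning_job(training_file, model, hyperparameters=None):
    """Hypothetical helper: validate inputs and return the JSON request body."""
    if not training_file:
        raise ValueError("training_file is required")
    if model not in BASE_MODELS:
        raise ValueError(f"model must be one of {sorted(BASE_MODELS)}")
    body = {"training_file": training_file, "model": model}
    # Only include hyperparameters when the caller supplies them.
    if hyperparameters is not None:
        body["hyperparameters"] = hyperparameters
    return body


job = build_fine_tuning_job("file-abc123", "llama-3-8b")  # file ID is illustrative
print(job)
```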

List Fine-Tuning Jobs

List your organization's fine-tuning jobs, including their status (running, succeeded, failed) and the ID of the resulting fine-tuned model if complete.

GET https://a3gate.in/v1/fine_tuning/jobs

Parameter	Description
after string	Identifier for the last job from the previous pagination request.
limit integer	Number of fine-tuning jobs to retrieve. Defaults to 20.
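Cursor pagination with `after` and `limit` can be sketched as a generator. The response shape is not documented on this page, so the code assumes a `data` array whose items carry an `id` field, and treats a page shorter than `limit` as the last one. `jobs_url`, `iter_jobs`, and `fetch_page` are hypothetical names; `fetch_page` stands in for whatever function performs the authenticated GET and returns parsed JSON.

```python
from urllib.parse import parse_qs, urlencode, urlparse

JOBS_URL = "https://a3gate.in/v1/fine_tuning/jobs"


def jobs_url(after=None, limit=20):
    """Build the list URL with the documented query parameters."""
    params = {"limit": limit}
    if after is not None:
        params["after"] = after
    return f"{JOBS_URL}?{urlencode(params)}"


def iter_jobs(fetch_page, limit=20):
    """Yield every job, following the `after` cursor page by page.

    ASSUMPTION: each page is {"data": [{"id": ...}, ...]}.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    after = None
    while True:
        jobs = fetch_page(jobs_url(after=after, limit=limit)).get("data", [])
        yield from jobs
        if len(jobs) < limit:  # short page: nothing left to fetch
            return
        after = jobs[-1]["id"]


# Fake two-page server used only to exercise the loop offline.
PAGES = {
    None: {"data": [{"id": "job-1"}, {"id": "job-2"}]},
    "job-2": {"data": [{"id": "job-3"}]},
}


def fake_fetch(url):
    query = parse_qs(urlparse(url).query)
    return PAGES[query.get("after", [None])[0]]


job_ids = [job["id"] for job in iter_jobs(fake_fetch, limit=2)]
print(job_ids)
```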

REQUEST (cURL)

curl https://a3gate.in/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $A3GATE_API_KEY" \
  -d '{
    "model": "llama-3-70b",
    "messages": [
      {
        "role": "system",
        "content": "You are an expert financial analyst."
      },
      {
        "role": "user",
        "content": "Summarize the Q3 earnings report."
      }
    ]
  }'
                    

RESPONSE (JSON)

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama-3-70b-instruct",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "The Q3 earnings report highlights a 14% increase..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 19,
    "completion_tokens": 120,
    "total_tokens": 139
  }
}
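Reading the generated text and the token counts out of a response like the one above might look as follows. This is a minimal sketch based only on the fields in the sample; `summarize_completion` is a hypothetical helper, and the consistency check on `usage` is a local sanity check, not documented behavior.

```python
import json


def summarize_completion(raw_json):
    """Hypothetical helper: extract the reply text and usage from a response."""
    data = json.loads(raw_json)
    choice = data["choices"][0]
    usage = data["usage"]
    # Sanity check: in the sample, prompt + completion equals total.
    if usage["prompt_tokens"] + usage["completion_tokens"] != usage["total_tokens"]:
        raise ValueError("usage totals are inconsistent")
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
    }


# Trimmed copy of the sample response shown above.
sample = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "llama-3-70b-instruct",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant",
                "content": "The Q3 earnings report highlights a 14% increase..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 19, "completion_tokens": 120, "total_tokens": 139}
}
"""

summary = summarize_completion(sample)
print(summary["finish_reason"], summary["total_tokens"])
```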