Beta Documentation

This documentation is for the Synaptiq API beta. Features and APIs may change without notice. Please provide feedback to help us improve.

Getting Started

Consumption and Rate Limits

Each Synaptiq model has different rate limits. To check your team's rate limits, visit the Models page in the Synaptiq Console.

Tokens: the basic unit of consumption

A token is the base unit of prompt size for model inference and pricing. A token consists of one or more characters or symbols.

When a Synaptiq model handles your request, the input prompt is decomposed into a list of tokens by a tokenizer. The model then runs inference on those prompt tokens and generates completion tokens. Once inference completes, the completion tokens are assembled into the completion response returned to you.

You can use the Tokenizer in the Synaptiq Console to visualize tokens and count the total tokens in a given text prompt.

Tokenizer interface
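The Console Tokenizer is the authoritative source for token counts, but for rough capacity planning you can estimate counts locally. The heuristic below (roughly four characters per token for English text) is an assumption for illustration only, not the actual Synaptiq tokenization:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    This is a planning heuristic only -- authoritative counts come
    from the Tokenizer in the Synaptiq Console.
    """
    return max(1, len(text) // 4)

prompt = "Explain quantum entanglement"
print(estimate_tokens(prompt))  # rough estimate, not an exact count
```

An estimate like this is useful for budgeting against TPM limits before sending a request; always verify real counts against the Console Tokenizer.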

Rate Limits

Rate limits are enforced to ensure fair usage of the API and to prevent abuse. Rate limits are applied at the team level, not at the individual user level.

| Model              | RPM (Requests per Minute) | TPM (Tokens per Minute) |
|--------------------|---------------------------|-------------------------|
| synaptiq-2         | 60                        | 100,000                 |
| synaptiq-2-mini    | 120                       | 150,000                 |
| synaptiq-2-quantum | 30                        | 80,000                  |
| synaptiq-2-math    | 40                        | 90,000                  |
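One way to stay under these limits is to gate requests client-side before they reach the API. The sketch below is illustrative (the `RpmGate` class is not part of any SDK); the 60 RPM figure is the synaptiq-2 limit from the table above:

```python
import time
from collections import deque

class RpmGate:
    """Client-side requests-per-minute gate using a sliding one-minute window."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent = deque()  # timestamps of requests sent in the last 60 seconds

    def acquire(self) -> None:
        """Block until sending one more request stays within the RPM limit."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the one-minute window
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Window is full: wait until the oldest request ages out
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())

gate = RpmGate(rpm=60)  # synaptiq-2: 60 requests per minute
```

Call `gate.acquire()` immediately before each API request. This does not replace server-side enforcement (other processes on your team share the same limit), but it reduces how often you see 429 errors.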

Need higher rate limits?

If you need higher rate limits for your application, please contact our sales team to discuss enterprise options.

Handling Rate Limits

When you exceed your rate limits, the API will return a 429 Too Many Requests error. Your application should be designed to handle these errors gracefully, typically by implementing exponential backoff retry logic.

{
  "error": {
    "message": "Rate limit exceeded: 60 requests per minute. Please try again later.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
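Before retrying, your client can inspect the error body to confirm it is a rate-limit error rather than some other failure. A minimal sketch using only the standard library, with the JSON shape taken from the example above:

```python
import json

# Example 429 response body, as shown above
body = """{
  "error": {
    "message": "Rate limit exceeded: 60 requests per minute. Please try again later.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}"""

error = json.loads(body)["error"]
if error["code"] == "rate_limit_exceeded":
    print(error["message"])  # log it, then back off and retry
```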

Recommended Retry Strategy

Here's a recommended approach for handling rate limits:

import os
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key=os.environ["SYNAPTIQ_API_KEY"],
    base_url="https://api.synaptiq.contact/v1",
)

def make_request_with_retry(max_retries=5):
    retries = 0
    while retries < max_retries:
        try:
            response = client.chat.completions.create(
                model="synaptiq-2",
                messages=[
                    {"role": "user", "content": "Explain quantum entanglement"}
                ]
            )
            return response
        except RateLimitError:
            retries += 1
            if retries >= max_retries:
                raise  # out of retries; surface the error to the caller

            # Exponential backoff with jitter to avoid synchronized retries
            sleep_time = (2 ** retries) + random.random()
            print(f"Rate limit exceeded. Retrying in {sleep_time:.2f} seconds...")
            time.sleep(sleep_time)
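With max_retries=5, the loop above waits roughly 2, 4, 8, then 16 seconds (each plus up to one second of jitter) between attempts. The schedule can be checked in isolation:

```python
import random

def backoff_delay(retry: int) -> float:
    """Exponential backoff with jitter, matching the retry loop above."""
    return (2 ** retry) + random.random()

for retry in range(1, 5):
    print(f"retry {retry}: sleep ~{backoff_delay(retry):.2f}s")
```

The jitter term spreads out retries from concurrent clients so they do not all hit the API again at the same instant.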