API Documentation

1 Authentication

All API requests require a Bearer token. Get your API key from the Token Management page.

POST https://www.shaibar.com/v1/chat/completions # include Authorization header

# cURL
curl https://www.shaibar.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://www.shaibar.com/v1/"   # IMPORTANT: keep the trailing slash
)

chat = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(chat.choices[0].message.content)

⚠ Important: The base URL must end with a trailing slash (https://www.shaibar.com/v1/). Without it, you'll get a 404.
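For clients without an OpenAI SDK, the same authenticated request can be built with Python's standard library. This is a minimal sketch, assuming you keep the key in an environment variable; the name `SHAIBAR_API_KEY` and the helper `build_chat_request` are hypothetical choices, not something the service defines. The request is built but not sent, so you can inspect the headers first.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "https://www.shaibar.com/v1/"  # keep the trailing slash


def build_chat_request(payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /v1/chat/completions."""
    # SHAIBAR_API_KEY is a hypothetical variable name; use whatever your setup defines.
    key = api_key or os.environ.get("SHAIBAR_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set SHAIBAR_API_KEY")
    return urllib.request.Request(
        BASE_URL + "chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",  # every request needs this header
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    {"model": "deepseek-chat", "messages": [{"role": "user", "content": "Hello!"}]},
    api_key="YOUR_API_KEY",
)
print(req.get_method(), req.full_url)  # POST https://www.shaibar.com/v1/chat/completions
print(req.get_header("Authorization"))  # Bearer YOUR_API_KEY
```

To actually send it, pass `req` to `urllib.request.urlopen` and parse the response body with `json.load`.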

2 Base URL

https://www.shaibar.com/v1/

Use this as your OpenAI-compatible base URL in any SDK. Works with:

OpenAI Python/JS SDK
LangChain, LlamaIndex
Any OpenAI-compatible client
cURL / HTTP clients

Endpoint	Method	Description
`/v1/chat/completions`	POST	Chat completions (main endpoint)
`/v1/completions`	POST	Text completions (legacy)
`/v1/embeddings`	POST	Embeddings
`/v1/models`	GET	List available models
`/v1/models/{model}`	GET	Get model info
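The trailing-slash warning in section 1 follows from how relative URLs resolve. Clients that join endpoint paths onto the base URL using RFC 3986 rules (as Python's `urllib.parse.urljoin` does) replace the last path segment when the base has no trailing slash, so `v1` silently disappears. A small sketch using the endpoints from the table above; `endpoint_url` is a hypothetical helper name, and whether a particular SDK resolves paths this way varies by SDK.

```python
from urllib.parse import urljoin

WITH_SLASH = "https://www.shaibar.com/v1/"
WITHOUT_SLASH = "https://www.shaibar.com/v1"


def endpoint_url(base_url: str, path: str) -> str:
    """Resolve a relative endpoint path (no leading slash) against the base URL."""
    return urljoin(base_url, path)


for path in ["chat/completions", "completions", "embeddings", "models", "models/deepseek-chat"]:
    print(endpoint_url(WITH_SLASH, path))  # every result stays under /v1/

# Without the trailing slash, the final "v1" segment is replaced:
print(endpoint_url(WITHOUT_SLASH, "models"))  # https://www.shaibar.com/models (not under /v1/)
```

Keeping the slash in the configured base URL is the safe default regardless of which client you use.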

3 Available Models

All prices are in USD. Chinese models are routed through direct provider channels — no markup on token costs.

Model ID	Provider / Model	Strengths	Est. Price (USD)
`deepseek-chat`	DeepSeek V3	Best value, strong reasoning, fast	~$0.27 / 1M tokens
`deepseek-reasoner`	DeepSeek R1	Chain-of-thought reasoning, math, coding	~$0.55 / 1M tokens
`qwen-plus`	Qwen 2.5 Plus	Balanced, good multilingual	~$0.40 / 1M tokens
`qwen-max`	Qwen 2.5 Max	Highest quality, complex tasks	~$1.20 / 1M tokens
`minimax-text-01`	MiniMax Text-01	Long context, code, multilingual	~$0.35 / 1M tokens
`glm-4-flash`	GLM-4	Fast, low latency	~$0.10 / 1M tokens
`moonshot-v1-128k`	Moonshot V1	128K context window	~$0.60 / 1M tokens
`yi-lightning`	Yi Lightning	Fast, multilingual, creative	~$0.40 / 1M tokens
`bailian-v2`	Bailian (Alibaba)	Open-source compatible, fast	~$0.15 / 1M tokens

Full model list: GET /v1/models
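The model list is the authoritative source for valid `model` IDs. A sketch for extracting them from the response body, assuming it uses OpenAI's list format (`{"object": "list", "data": [...]}`); the helper `model_ids` is a hypothetical name and the sample body is made up for illustration, not real output from the service.

```python
import json


def model_ids(body: str) -> list:
    """Extract model IDs from an OpenAI-style GET /v1/models response body."""
    payload = json.loads(body)
    # OpenAI list format: {"object": "list", "data": [{"id": ..., "object": "model"}, ...]}
    return sorted(item["id"] for item in payload.get("data", []))


# Illustrative, trimmed response body (not captured from the service).
sample = (
    '{"object": "list", "data": ['
    '{"id": "qwen-plus", "object": "model"}, '
    '{"id": "deepseek-chat", "object": "model"}]}'
)
print(model_ids(sample))  # ['deepseek-chat', 'qwen-plus']
```

With the OpenAI Python SDK, `client.models.list()` sends the same GET request and yields objects with an `.id` attribute.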

4 Chat Completions

OpenAI-compatible. Request and response formats follow the standard /v1/chat/completions interface.


# cURL example (non-streaming; set "stream": true to receive SSE chunks)
curl https://www.shaibar.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "stream": false,
    "max_tokens": 500,
    "temperature": 0.7
  }'

# Python — streaming response
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://www.shaibar.com/v1/"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
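for chunk in stream:
    # Guard: a usage-only chunk (e.g. with stream_options={"include_usage": True}) has no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # finish with a newline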

// JavaScript / Node.js (OpenAI SDK; top-level await requires an ES module)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://www.shaibar.com/v1/',
});

const chat = await client.chat.completions.create({
  model: 'qwen-plus',
  messages: [{ role: 'user', content: 'Hi' }],
});
console.log(chat.choices[0].message.content);
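If you read the stream without an SDK (for example by piping `curl -N` output into a script), each event arrives as a `data:` line. A minimal Python parser, assuming the chunks use OpenAI's `chat.completion.chunk` shape and the stream ends with `data: [DONE]`, as OpenAI's own API does. The helper name `iter_sse_content` is hypothetical, and the sample lines are illustrative, not captured from this service.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines ("data: {...}", ending with "data: [DONE]")."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments (": ping"), and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Illustrative stream shaped like OpenAI chat.completion.chunk events.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample_stream)))  # Hello!
```

Feed it decoded lines from any HTTP client that exposes the response body line by line. With `n` greater than 1, deltas from different choices interleave, so filter on `choice["index"]` in that case.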

Request body parameters (OpenAI format):

Parameter	Type	Required	Description
`model`	string	Yes	Model ID from the available models list
`messages`	array	Yes	Array of {role, content} message objects
`stream`	boolean	No	Enable SSE streaming (default: false)
`max_tokens`	integer	No	Max response tokens (default: 4096)
`temperature`	float	No	Sampling temperature, 0–2 (default: 0.7)
`top_p`	float	No	Nucleus sampling (default: 1.0)
`stop`	string/array	No	Stop sequences
`frequency_penalty`	float	No	-2.0 to 2.0 (default: 0)
`presence_penalty`	float	No	-2.0 to 2.0 (default: 0)
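A small client-side check can catch out-of-range values before they cost a round trip. This is a minimal sketch: the numeric ranges come from the table above, `build_body` is a hypothetical helper name, and any other options (such as `response_format`) are passed through unchanged because the table may not list every parameter the service accepts.

```python
from typing import Any

# Numeric ranges as listed in the parameter table above.
RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_body(model: str, messages: list, **options: Any) -> dict:
    """Assemble a /v1/chat/completions body, rejecting obvious mistakes client-side."""
    if not model:
        raise ValueError("model is required")
    if not messages or not all("role" in m and "content" in m for m in messages):
        raise ValueError("messages must be a non-empty array of {role, content} objects")
    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name}={options[name]} is outside [{low}, {high}]")
    if "max_tokens" in options and options["max_tokens"] < 1:
        raise ValueError("max_tokens must be a positive integer")
    # Anything else (stream, top_p, stop, response_format, ...) is passed through as-is.
    return {"model": model, "messages": messages, **options}


body = build_body(
    "deepseek-chat",
    [{"role": "user", "content": "Explain quantum computing in simple terms."}],
    max_tokens=500,
    temperature=0.7,
)
print(body["model"], body["max_tokens"], body["temperature"])  # deepseek-chat 500 0.7
```

The resulting dict can be serialized with `json.dumps` for a raw HTTP request, or its keys passed as keyword arguments to `client.chat.completions.create`.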

5 Billing & Top-Up

Prices are denominated in USD. You pay with USDT (TRON TRC-20, Ethereum ERC-20, or BSC BEP-20). No credit card required.

✓ How it works: Deposit USDT → credited as USD balance → deducted per actual token usage. No monthly fees, no subscriptions.

To top up:

Go to shaibar.com/deposit_web3.html
Generate a TRC-20/ERC-20/BEP-20 deposit address
Send USDT to that address. Deposits are credited automatically after about 1–3 block confirmations.
Minimum deposit: 1 USDT

⚠ No refunds: All USDT deposits are final and non-refundable. Top up only what you plan to use.

Balance & usage: Check your balance and usage logs in the Dashboard. Usage is deducted per token, at the per-1M-token rate listed for each model.
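Because usage is charged against per-1M-token rates, a rough cost estimate is simple arithmetic: cost = tokens × rate / 1,000,000. For example, 2,000 tokens on `deepseek-chat` at ~$0.27 per 1M tokens comes to 2,000 × 0.27 / 1,000,000 = $0.00054. A sketch, assuming a single blended rate per model copied from the "Est. Price" column in section 3; real charges may price input and output tokens differently, so treat the result as an estimate only. `estimate_cost` is a hypothetical helper name.

```python
# Estimated USD rates per 1M tokens, copied from the "Est. Price" column (approximate).
EST_PRICE_PER_M = {
    "deepseek-chat": 0.27,
    "deepseek-reasoner": 0.55,
    "qwen-plus": 0.40,
    "qwen-max": 1.20,
    "glm-4-flash": 0.10,
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Rough USD cost: total tokens x listed per-1M rate / 1,000,000."""
    total = prompt_tokens + completion_tokens
    return total * EST_PRICE_PER_M[model] / 1_000_000


# 1,500 prompt + 500 completion tokens on deepseek-chat
print(f"${estimate_cost('deepseek-chat', 1_500, 500):.6f}")  # $0.000540
```

In OpenAI-format responses, token counts are reported in the `usage` object (`prompt_tokens`, `completion_tokens`) when the provider returns it; with the OpenAI Python SDK, read them from `chat.usage`.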

6 Chinese API Quirks (Read This!)

These are real differences between Chinese AI APIs and OpenAI that affect how you build your integration:

Topic	What to expect
System prompts	Chinese models (especially DeepSeek, Qwen) handle system prompts well, but keep them concise. Very long system prompts may reduce output quality.
Output length	Default `max_tokens` varies by provider, so set it explicitly. DeepSeek R1 (reasoning) may produce very long responses; raise the limit to 8192 or more for complex tasks.
Tool use / Function calling	DeepSeek V3 and Qwen support function calling. Test with `stream: false` first. Streamed tool calls arrive as argument fragments that must be reassembled client-side, so disable streaming in tool-heavy apps.
Context window	Most models: 32K–128K context. MiniMax Text-01 supports up to 1M tokens. Sending near-max context is slow and expensive — test with shorter inputs first.
Rate limits	Per-key RPM/TPM limits are enforced. Default limits are generous but not unlimited. If you hit 429, implement exponential backoff. Check `X-RateLimit-*` response headers.
Streaming	SSE streaming works with OpenAI SDK. Some clients (Postman, Insomnia) may not auto-parse SSE correctly — use a real SDK or `curl -N` for testing.
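Latency	Expect roughly 1–5 s before a non-streaming response arrives (longer for long outputs, since the full completion is generated first). Streaming delivers the first tokens sooner. DeepSeek V3 is generally the fastest; Qwen Max is slower but higher quality.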
JSON mode	Set `response_format: {"type": "json_object"}` for JSON output. Works on DeepSeek and Qwen. Always include "JSON" in your prompt as well.
Multi-turn conversations	Send the full conversation history each request (standard OpenAI way). Chinese APIs do not maintain server-side sessions — you manage context client-side.
Batch requests	No native batch endpoint. For parallel processing, use async/threading in Python with multiple API calls. Chinese providers handle concurrent requests well.
Cost tracking	Prices are per 1M tokens (input + output counted separately on most models). Monitor usage at shaibar.com/console/log.
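For the 429 advice under Rate limits, a common pattern is exponential backoff with jitter: wait up to `base_delay × 2^attempt` seconds (capped at `max_delay`), sleep for a random time within that window, and retry. A minimal sketch; `RateLimited` and `with_backoff` are hypothetical names, `RateLimited` stands in for whatever exception your client raises on HTTP 429, and the delay values are illustrative defaults, not limits published by the service.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait window each time (full jitter)."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            if attempt >= max_retries:
                raise  # give up and surface the 429 to the caller
            window = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, window))  # jitter spreads out retries from many clients
            attempt += 1


# Demo: a fake call that is rate-limited twice, then succeeds.
attempts = {"n": 0}


def flaky_call() -> str:
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimited("429 Too Many Requests")
    return "ok"


waits = []  # record the chosen delays instead of really sleeping
result = with_backoff(flaky_call, sleep=waits.append)
print(result, attempts["n"], len(waits))  # ok 3 2
```

With the OpenAI Python SDK you would catch `openai.RateLimitError` instead of the stand-in class; the SDK client also retries some failed requests on its own, controlled by its `max_retries` option.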
