API Documentation OpenAI-Compatible

Use https://www.shaibar.com/v1/ as your base URL. It is a drop-in replacement for OpenAI, Claude, and other providers.

1 Authentication

All API requests require a Bearer token. Get your API key from the Token Management page.

POST https://www.shaibar.com/v1/chat/completions # include Authorization header
# cURL
curl https://www.shaibar.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://www.shaibar.com/v1/"   # IMPORTANT: keep the trailing slash
)

chat = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(chat.choices[0].message.content)
⚠ Important The base URL must end with a trailing slash: https://www.shaibar.com/v1/ — without it you'll get a 404.
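
Because a missing trailing slash is an easy mistake to make, one option is to normalize the base URL before handing it to an SDK. The snippet below is a minimal sketch of that idea; the helper name normalize_base_url is hypothetical and not part of any SDK.

```python
def normalize_base_url(url: str) -> str:
    """Return the URL with exactly one trailing slash."""
    # Strip any trailing slashes first so "…/v1" and "…/v1//" both end up as "…/v1/".
    return url.rstrip("/") + "/"


print(normalize_base_url("https://www.shaibar.com/v1"))  # https://www.shaibar.com/v1/
```

The returned value can then be passed as base_url when constructing the client.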

2 Base URL

https://www.shaibar.com/v1/

Use this as your OpenAI-compatible base URL in any SDK. Works with:

  • OpenAI Python/JS SDK
  • LangChain, LlamaIndex
  • Any OpenAI-compatible client
  • cURL / HTTP clients
Endpoint             | Method | Description
/v1/chat/completions | POST   | Chat completions (main endpoint)
/v1/completions      | POST   | Text completions (legacy)
/v1/embeddings       | POST   | Embeddings
/v1/models           | GET    | List available models
/v1/models/{model}   | GET    | Get model info
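
For clients that do not use an SDK, the two GET endpoints in the table can be called with the Python standard library alone. The sketch below assumes the response follows the OpenAI list format (a top-level "data" array whose items have an "id" field), which this page does not spell out; build_models_request and list_model_ids are hypothetical helper names.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "https://www.shaibar.com/v1/"


def build_models_request(api_key: str, model_id: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for /v1/models or /v1/models/{model}."""
    path = "models" if model_id is None else f"models/{model_id}"
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_model_ids(api_key: str) -> List[str]:
    """Send the request and return model IDs (assumes an OpenAI-style {"data": [...]} body)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        payload = json.load(resp)
    return [item["id"] for item in payload.get("data", [])]


# Example (performs a real network call, so it is left commented out):
# print(list_model_ids("YOUR_API_KEY"))
```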

3 Available Models

All prices are in USD. Chinese models are routed through direct provider channels — no markup on token costs.

Model ID | Provider | Strengths | Est. Price
deepseek-chat | DeepSeek V3 | Best value, strong reasoning, fast | ~$0.27 / 1M tokens
deepseek-reasoner | DeepSeek R1 | Chain-of-thought reasoning, math, coding | ~$0.55 / 1M tokens
qwen-plus | Qwen 2.5 Plus | Balanced, good multilingual | ~$0.40 / 1M tokens
qwen-max | Qwen 2.5 Max | Highest quality, complex tasks | ~$1.20 / 1M tokens
minimax-text-01 | MiniMax Text-01 | Long context, code, multilingual | ~$0.35 / 1M tokens
glm-4-flash | GLM-4 | Fast, low latency | ~$0.10 / 1M tokens
moonshot-v1-128k | Moonshot V1 | 128K context window | ~$0.60 / 1M tokens
yi-lightning | Yi Lightning | Fast, multilingual, creative | ~$0.40 / 1M tokens
bailian-v2 | Bailian (Alibaba) | Open-source compatible, fast | ~$0.15 / 1M tokens

Full model list: GET /v1/models

4 Chat Completions

OpenAI-compatible. Request and response formats follow the standard /v1/chat/completions interface.


# cURL example (non-streaming; set "stream": true to receive SSE chunks)
curl https://www.shaibar.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "stream": false,
    "max_tokens": 500,
    "temperature": 0.7
  }'
# Python — streaming response
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://www.shaibar.com/v1/"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
// JavaScript / Node.js — OpenAI SDK
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://www.shaibar.com/v1/',
});

const chat = await client.chat.completions.create({
  model: 'qwen-plus',
  messages: [{ role: 'user', content: 'Hi' }],
});
console.log(chat.choices[0].message.content);
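
The SDK examples above hide the wire format of a streamed response. When testing with a plain HTTP client it can help to see how the stream is decoded. The sketch below assumes the stream uses the OpenAI chunk format ("data: {json}" lines ending with "data: [DONE]"), which this page implies but does not show; iter_sse_content and the sample lines are illustrative only.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from raw SSE lines in the assumed OpenAI chunk format."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Hand-written sample lines, not captured from a real response.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # Hello!
```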

Request body parameters (OpenAI format):

Parameter         | Type         | Required | Description
model             | string       | Yes      | Model ID from the available models list
messages          | array        | Yes      | Array of {role, content} message objects
stream            | boolean      | No       | Enable SSE streaming (default: false)
max_tokens        | integer      | No       | Max response tokens (default: 4096)
temperature       | float        | No       | Randomness 0–2 (default: 0.7)
top_p             | float        | No       | Nucleus sampling (default: 1.0)
stop              | string/array | No       | Stop sequences
frequency_penalty | float        | No       | -2.0 to 2.0 (default: 0)
presence_penalty  | float        | No       | -2.0 to 2.0 (default: 0)
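
The ranges and defaults in the table can be checked client-side before a request is sent, which surfaces mistakes without spending a call. The following is a minimal sketch; build_chat_payload is a hypothetical helper, and whether the server rejects out-of-range values in the same way is an assumption.

```python
def build_chat_payload(model, messages, *, stream=False, max_tokens=4096,
                       temperature=0.7, top_p=1.0, stop=None,
                       frequency_penalty=0.0, presence_penalty=0.0):
    """Validate arguments against the documented ranges and return a request body."""
    if not model:
        raise ValueError("model is required")
    if not messages or any("role" not in m or "content" not in m for m in messages):
        raise ValueError("messages must be a non-empty list of {role, content} objects")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    for name, value in (("frequency_penalty", frequency_penalty),
                        ("presence_penalty", presence_penalty)):
        if not -2.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between -2.0 and 2.0")
    payload = {
        "model": model,
        "messages": messages,
        "stream": stream,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }
    if stop is not None:  # stop is optional and has no documented default
        payload["stop"] = stop
    return payload


body = build_chat_payload(
    "deepseek-chat",
    [{"role": "user", "content": "Hello!"}],
    max_tokens=500,
)
print(body["max_tokens"], body["temperature"])  # 500 0.7
```

The returned dictionary can be serialized with json.dumps and sent as the POST body, or unpacked into client.chat.completions.create(**body).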

5 Billing & Top-Up

Prices are denominated in USD. You pay with USDT (TRON TRC-20, Ethereum ERC-20, or BSC BEP-20). No credit card required.

✓ How it works Deposit USDT → credited as USD balance → deducted per actual token usage. No monthly fees, no subscriptions.

To top up:

  • Go to shaibar.com/deposit_web3.html
  • Generate a TRC-20/ERC-20/BEP-20 deposit address
  • Send USDT to that address — deposits auto-credit within ~1-3 block confirmations
  • Minimum deposit: 1 USDT
⚠ No refunds All USDT deposits are final and non-refundable. Top up only what you plan to use.

Balance & usage: Check your balance and usage logs in the Dashboard. Each request's cost is deducted from your balance according to the tokens it actually used, at the per-1M-token rate listed for each model.
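
As a rough illustration of how a per-1M-token rate turns into a charge, the sketch below multiplies a token count by the estimated prices from the Available Models table. It assumes a single blended rate per model, while actual billing may count input and output tokens separately, so treat the result as an estimate only. The function name and the token count are hypothetical.

```python
# Estimated prices in USD per 1M tokens, copied from the Available Models table.
EST_PRICE_PER_1M_USD = {
    "deepseek-chat": 0.27,
    "deepseek-reasoner": 0.55,
    "qwen-plus": 0.40,
    "glm-4-flash": 0.10,
}


def estimate_cost_usd(model: str, total_tokens: int) -> float:
    """Estimate the charge for total_tokens at the model's listed per-1M rate."""
    return total_tokens / 1_000_000 * EST_PRICE_PER_1M_USD[model]


# 250,000 tokens on deepseek-chat: 0.25 * 0.27 USD
print(f"{estimate_cost_usd('deepseek-chat', 250_000):.4f}")  # 0.0675
```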

6 Chinese API Quirks (Read This!)

These are real differences between Chinese AI APIs and OpenAI that affect how you build your integration:

Topic | What to expect
System prompts | Chinese models (especially DeepSeek, Qwen) handle system prompts well, but keep them concise. Very long system prompts may reduce output quality.
Output length | Default max_tokens varies by provider. Set it explicitly. DeepSeek R1 (reasoning) may produce very long responses; increase the limit to 8192+ for complex tasks.
Tool use / Function calling | DeepSeek V3 and Qwen support function calling. Test with stream: false first. Streaming function calls are complex; disable stream for tool-use-heavy apps.
Context window | Most models: 32K–128K context. MiniMax Text-01 supports up to 1M tokens. Sending near-max context is slow and expensive; test with shorter inputs first.
Rate limits | Per-key RPM/TPM limits are enforced. Default limits are generous but not unlimited. If you hit 429, implement exponential backoff. Check X-RateLimit-* response headers.
Streaming | SSE streaming works with the OpenAI SDK. Some clients (Postman, Insomnia) may not auto-parse SSE correctly; use a real SDK or curl -N for testing.
Latency | Expect 1-5s first-token latency for non-streaming. Streaming starts faster. DeepSeek V3 is generally the fastest. Qwen Max is slower but higher quality.
JSON mode | Set response_format: {"type": "json_object"} for JSON output. Works on DeepSeek and Qwen. Always include "JSON" in your prompt as well.
Multi-turn conversations | Send the full conversation history with each request (standard OpenAI way). Chinese APIs do not maintain server-side sessions; you manage context client-side.
Batch requests | No native batch endpoint. For parallel processing, use async/threading in Python with multiple API calls. Chinese providers handle concurrent requests well.
Cost tracking | Prices are per 1M tokens (input + output counted separately on most models). Monitor usage at shaibar.com/console/log.
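
The rate-limit advice above (retry on 429 with exponential backoff) can be sketched without tying it to a particular HTTP client. In the example below, RateLimited is a hypothetical exception standing in for whatever your client raises on a 429, and the retry count and delays are illustrative values, not limits published by this service.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical marker for an HTTP 429 response."""


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send(); on RateLimited, wait 1s, 2s, 4s, ... (capped, with jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo with a fake sender that is rate-limited twice before succeeding.
attempts = {"n": 0}


def flaky_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited("429 Too Many Requests")
    return "ok"


waits = []  # record the delays instead of actually sleeping
print(call_with_backoff(flaky_send, sleep=waits.append))  # ok
```

With the OpenAI Python SDK, the same wrapper can be adapted by catching openai.RateLimitError inside send and re-raising it as RateLimited. The SDK client also accepts a max_retries option that may be sufficient for simple cases.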

Questions? Email support@shaibar.com.