
LLM Providers

Groq

Models

LlamaIndex

from llama_index.llms.groq import Groq

llm = Groq(model="mixtral-8x7b-32768", api_key="xxx")  # replace "xxx" with your Groq API key
response = llm.complete("Explain the importance of low latency LLMs")
print(response)

Rate limits

ID                   REQUESTS PER MINUTE   REQUESTS PER DAY   TOKENS PER MINUTE
llama2-70b-4096      30                    14,400             15,000
mixtral-8x7b-32768   30                    14,400             9,000
gemma-7b-it          30                    14,400             15,000
llama3-70b-8192      30                    14,400             7,000
llama3-8b-8192       30                    14,400             12,000
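With per-minute caps this low, bursts of requests will hit HTTP 429 responses. A minimal retry-with-exponential-backoff sketch (provider-agnostic; `RateLimitError`, the delays, and the flaky client are illustrative assumptions, not Groq's SDK):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error a provider SDK would raise."""

def with_backoff(call, max_retries=5, base_delay=0.01):
    """Retry `call`, sleeping base_delay * 2**attempt between failures."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulated client that fails twice before succeeding.
calls = {"n": 0}
def flaky_completion():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

print(with_backoff(flaky_completion))  # prints "ok" after two retries
```

In real use, `flaky_completion` would wrap the `llm.complete(...)` call and `base_delay` would be on the order of seconds.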

Together AI

Models and pricing

API docs: Models

Grouped by category

  • CHAT
  • LANGUAGE
  • …

The chat category is what we need.

ORGANIZATION   MODEL NAME                      API MODEL STRING                        CONTEXT LENGTH
Meta           LLaMA-3 Chat (8B)               meta-llama/Llama-3-8b-chat-hf           8000
Meta           LLaMA-3 Chat (70B)              meta-llama/Llama-3-70b-chat-hf          8000
Microsoft      WizardLM-2 (8x22B)              microsoft/WizardLM-2-8x22B              65536
mistralai      Mistral (7B) Instruct           mistralai/Mistral-7B-Instruct-v0.1      8192
mistralai      Mistral (7B) Instruct v0.2      mistralai/Mistral-7B-Instruct-v0.2      32768
mistralai      Mixtral-8x7B Instruct (46.7B)   mistralai/Mixtral-8x7B-Instruct-v0.1    32768
mistralai      Mixtral-8x22B Instruct (141B)   mistralai/Mixtral-8x22B-Instruct-v0.1   65536
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
      {"role": "system", "content": "You are an expert travel guide"},
      {"role": "user", "content": "Tell me fun things to do in San Francisco."}
    ]
  }'
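The same request can be built from Python with only the standard library. A sketch that constructs (but does not send) a request mirroring the curl call above; endpoint, headers, and body fields are taken from that call, everything else is illustrative:

```python
import json
import os
import urllib.request

def build_chat_request(model, messages, api_key):
    """Build an unsent urllib Request for Together's chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    [
        {"role": "system", "content": "You are an expert travel guide"},
        {"role": "user", "content": "Tell me fun things to do in San Francisco."},
    ],
    os.environ.get("TOGETHER_API_KEY", "xxx"),
)
# urllib.request.urlopen(req) would send it and return the JSON response.
print(req.full_url)
```

In practice the official `together` SDK or an OpenAI-compatible client is more convenient; this only shows that the wire format is plain JSON over HTTPS.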

Rate limits

Much simpler here.

TIER   RATE LIMIT
Free   1 QPS
Paid   100 QPS
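On the Free tier (1 QPS) it helps to pace calls client-side rather than trip the limit. A minimal throttle sketch (the class name is made up; the interval is just 1/QPS):

```python
import time

class QPSThrottle:
    """Keep successive acquire() calls at least 1/qps seconds apart."""

    def __init__(self, qps):
        self.interval = 1.0 / qps
        self.last = 0.0

    def acquire(self):
        # Sleep only if the previous call was less than `interval` ago.
        wait = self.last + self.interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self.last = time.monotonic()

throttle = QPSThrottle(qps=1)  # Free tier: 1 request per second
# Call throttle.acquire() immediately before each API request.
```

This is per-process only; with multiple workers the budget has to be shared (e.g. via a central queue).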

LlamaIndex

from llama_index.llms.together import TogetherLLM

# set api key in env or in llm
# import os
# os.environ["TOGETHER_API_KEY"] = "your api key"

llm = TogetherLLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1", api_key="xxx"
)
resp = llm.complete("Who is Paul Graham?")
print(resp)

Embeddings

BGE versions only.
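Whichever embedding model is used, the returned vectors are typically compared with cosine similarity. A self-contained sketch with toy 3-dimensional vectors (real BGE embeddings have hundreds of dimensions; the numbers here are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding outputs.
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
v3 = [0.3, -0.2, 0.1]
print(cosine_similarity(v1, v2))  # identical vectors give similarity ~ 1.0
print(cosine_similarity(v1, v3))  # dissimilar vectors score lower
```

In a retrieval pipeline, a query embedding is compared against document embeddings this way and the top-scoring documents are returned.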