
LLM Providers

Groq

Models

LlamaIndex

from llama_index.llms.groq import Groq

llm = Groq(model="mixtral-8x7b-32768", api_key="xxx")  # replace "xxx" with your Groq API key
response = llm.complete("Explain the importance of low latency LLMs")
print(response)

Rate limits

ID                   REQUESTS PER MINUTE   REQUESTS PER DAY   TOKENS PER MINUTE
llama2-70b-4096      30                    14,400             15,000
mixtral-8x7b-32768   30                    14,400             9,000
gemma-7b-it          30                    14,400             15,000
llama3-70b-8192      30                    14,400             7,000
llama3-8b-8192       30                    14,400             12,000
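With per-minute caps this low, bursts of requests will hit HTTP 429 responses. A minimal retry-with-exponential-backoff sketch (provider-agnostic; `RateLimitError`, the delays, and the flaky client are illustrative assumptions, not Groq's SDK):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error a provider SDK would raise."""

def with_backoff(call, max_retries=5, base_delay=0.01):
    """Retry `call`, sleeping base_delay * 2**attempt between failures."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulated client that fails twice before succeeding.
calls = {"n": 0}
def flaky_completion():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

print(with_backoff(flaky_completion))  # prints "ok" after two retries
```

In real use, `flaky_completion` would wrap the `llm.complete(...)` call and `base_delay` would be on the order of seconds.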

Together AI

Models and pricing

API docs: Models

Grouped by category

  • CHAT
  • LANGUAGE
  • …

The chat category is what we need.

ORGANIZATION   MODEL NAME                      API MODEL STRING                        CONTEXT LENGTH
Meta           LLaMA-3 Chat (8B)               meta-llama/Llama-3-8b-chat-hf           8000
Meta           LLaMA-3 Chat (70B)              meta-llama/Llama-3-70b-chat-hf          8000
Microsoft      WizardLM-2 (8x22B)              microsoft/WizardLM-2-8x22B              65536
mistralai      Mistral (7B) Instruct           mistralai/Mistral-7B-Instruct-v0.1      8192
mistralai      Mistral (7B) Instruct v0.2      mistralai/Mistral-7B-Instruct-v0.2      32768
mistralai      Mixtral-8x7B Instruct (46.7B)   mistralai/Mixtral-8x7B-Instruct-v0.1    32768
mistralai      Mixtral-8x22B Instruct (141B)   mistralai/Mixtral-8x22B-Instruct-v0.1   65536
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
      {"role": "system", "content": "You are an expert travel guide"},
      {"role": "user", "content": "Tell me fun things to do in San Francisco."}
    ]
  }'
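The same request can be built from Python with only the standard library. A sketch that constructs (but does not send) a request mirroring the curl call above; endpoint, headers, and body fields are taken from that call, everything else is illustrative:

```python
import json
import os
import urllib.request

def build_chat_request(model, messages, api_key):
    """Build an unsent urllib Request for Together's chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    [
        {"role": "system", "content": "You are an expert travel guide"},
        {"role": "user", "content": "Tell me fun things to do in San Francisco."},
    ],
    os.environ.get("TOGETHER_API_KEY", "xxx"),
)
# urllib.request.urlopen(req) would send it and return the JSON response.
print(req.full_url)
```

In practice the official `together` SDK or an OpenAI-compatible client is more convenient; this only shows that the wire format is plain JSON over HTTPS.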

Rate limits

Much simpler here.

TIER   RATE LIMIT
Free   1 QPS
Paid   100 QPS
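On the Free tier (1 QPS) it helps to pace calls client-side rather than trip the limit. A minimal throttle sketch (the class name is made up; the interval is just 1/QPS):

```python
import time

class QPSThrottle:
    """Keep successive acquire() calls at least 1/qps seconds apart."""

    def __init__(self, qps):
        self.interval = 1.0 / qps
        self.last = 0.0

    def acquire(self):
        # Sleep only if the previous call was less than `interval` ago.
        wait = self.last + self.interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self.last = time.monotonic()

throttle = QPSThrottle(qps=1)  # Free tier: 1 request per second
# Call throttle.acquire() immediately before each API request.
```

This is per-process only; with multiple workers the budget has to be shared (e.g. via a central queue).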

LlamaIndex

from llama_index.llms.together import TogetherLLM

# set api key in env or in llm
# import os
# os.environ["TOGETHER_API_KEY"] = "your api key"

llm = TogetherLLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1", api_key="xxx"
)
resp = llm.complete("Who is Paul Graham?")
print(resp)

Embeddings

BGE versions only.
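Whichever embedding model is used, the returned vectors are typically compared with cosine similarity. A self-contained sketch with toy 3-dimensional vectors (real BGE embeddings have hundreds of dimensions; the numbers here are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding outputs.
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
v3 = [0.3, -0.2, 0.1]
print(cosine_similarity(v1, v2))  # identical vectors give similarity ~ 1.0
print(cosine_similarity(v1, v3))  # dissimilar vectors score lower
```

In a retrieval pipeline, a query embedding is compared against document embeddings this way and the top-scoring documents are returned.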