LlamaIndex - Part 2 (Q&A and Evaluation)

Production-grade example

SEC-Insights

QA

Use case:

What

  • Semantic search (Top K)
  • Summarization

Where

How

The links above all point to the Q&A patterns below.

Understanding: Q&A patterns

A minimal Q&A example

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load every file under ./data, build an in-memory vector index over the
# chunks, and query it with the default retriever + response synthesizer.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

Choosing between data sources (Route Datasource)

Link
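The notes only link out here, but the idea is simple: a router inspects each query and forwards it to the query engine best suited for it (e.g. a vector index for semantic top-k lookup vs. a summary index for summarization). Below is a pure-Python sketch of that selection logic only; it is not the actual LlamaIndex router API, and in LlamaIndex the selector is LLM-based rather than keyword-based.

```python
# Sketch of router logic: pick a query engine per query, then dispatch.

def select_engine(query: str) -> str:
    """Return the name of the engine to route the query to.
    Keyword-based stand-in for LlamaIndex's LLM-based selector."""
    if any(w in query.lower() for w in ("summarize", "summary", "overview")):
        return "summary"
    return "vector"  # default: semantic top-k retrieval

# Toy stand-ins for real query engines.
engines = {
    "vector": lambda q: f"[top-k chunks relevant to: {q}]",
    "summary": lambda q: f"[summary of the whole document for: {q}]",
}

def route_query(query: str) -> str:
    return engines[select_engine(query)](query)

print(route_query("Summarize the 10-K filing"))
print(route_query("What was revenue in 2021?"))
```

The same shape covers the "semantic search vs. summarization" split listed at the top of these notes: one index per access pattern, one selector in front.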

Compare/Contrast Queries

I don't understand this part yet.

Multi Document Queries

Besides the explicit synthesis/routing flows described above, LlamaIndex can support more general multi-document queries as well. It can do this through its SubQuestionQueryEngine class. Given a query, this query engine will generate a "query plan" containing sub-queries against sub-documents before synthesizing the final answer.

This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer. This makes it especially well-suited for compare/contrast queries across documents as well as queries pertaining to a specific document.
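Conceptually, the engine does three things: plan sub-queries, run each against the matching per-document query engine tool, then synthesize. Here is a pure-Python sketch of that control flow; the document names (`uber_10k`, `lyft_10k`), the planning, and the answering functions are all toy stand-ins, not the real SubQuestionQueryEngine API (where an LLM generates the plan and real query engines answer).

```python
# Sketch of the SubQuestionQueryEngine control flow for a
# compare/contrast query across two documents.

def plan_sub_queries(query: str, doc_names: list) -> list:
    # In LlamaIndex an LLM generates this query plan; faked here.
    return [(doc, f"{query} (with respect to {doc})") for doc in doc_names]

def answer_sub_query(doc: str, sub_query: str) -> str:
    # Stand-in for running the per-document query engine tool.
    return f"answer from {doc}"

def synthesize(query: str, sub_answers: list) -> str:
    # Stand-in for the final LLM synthesis step.
    return f"Final answer to '{query}' based on: " + "; ".join(sub_answers)

query = "Compare revenue growth of Uber and Lyft"
plan = plan_sub_queries(query, ["uber_10k", "lyft_10k"])
sub_answers = [answer_sub_query(doc, sq) for doc, sq in plan]
print(synthesize(query, sub_answers))
```

Because the plan is just a list, any number of sub-queries against any subset of tools fits the same loop, which is why this pattern suits both compare/contrast queries and single-document questions.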

Multi-Step Queries

LlamaIndex can also support iterative multi-step queries. Given a complex query, it breaks it down into initial sub-questions, then sequentially generates further sub-questions based on the returned answers until the final answer is reached.

For instance, given the question "Who was in the first batch of the accelerator program the author started?", the module will first decompose the query into a simpler initial question, "What was the accelerator program the author started?", query the index, and then ask follow-up questions.
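The accelerator example above can be sketched as a loop: ask the next sub-question conditioned on the answers so far, stop when no further question is needed. Everything below is a toy: the knowledge base is hard-coded and `next_question` is a canned two-step plan standing in for the LLM-driven decomposition; it is not LlamaIndex's actual multi-step API.

```python
# Sketch of iterative multi-step querying over a toy knowledge base.

KB = {
    "What was the accelerator program the author started?": "Y Combinator",
    "Who was in the first batch of Y Combinator?": "Reddit, Loopt, ...",
}

def next_question(query: str, history: list):
    # An LLM generates this in LlamaIndex; here a canned two-step plan.
    if not history:
        return "What was the accelerator program the author started?"
    if len(history) == 1:
        program = history[0][1]  # answer to the first sub-question
        return f"Who was in the first batch of {program}?"
    return None  # no further sub-question needed

def multi_step_query(query: str) -> str:
    history = []  # list of (sub_question, answer) pairs
    while (q := next_question(query, history)) is not None:
        history.append((q, KB[q]))
    return history[-1][1]

print(multi_step_query(
    "Who was in the first batch of the accelerator program the author started?"))
```

Note how the second sub-question is only constructible after the first answer arrives; that data dependency is what distinguishes multi-step queries from the parallel sub-question plan above.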

Temporal queries

Eval

Conceptual introduction

  • Evaluating responses
  • Evaluating retrieval

Detailed overview and workflow

  • Evaluating responses
    • Use GPT-4 as the evaluator
    • Evaluation dimensions
      • Generated answer vs. reference answer: correctness and semantic similarity
      • Generated answer vs. retrieved contexts: faithfulness
      • Generated answer vs. query: answer relevancy
      • Retrieved contexts vs. query: context relevancy
    • Generating reference answers
  • Evaluating retrieval
    • How to evaluate: ranking metrics like mean reciprocal rank (MRR), hit-rate, precision, and more.
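The retrieval metrics named above are easy to compute by hand, which helps make them concrete. A minimal pure-Python sketch of hit-rate and MRR, given for each query the expected document id and the ranked list the retriever returned (the data below is made up for illustration):

```python
# Hit-rate: fraction of queries whose expected document appears anywhere
# in the retrieved list. MRR: mean over queries of 1/rank of the first
# relevant hit (0 when the expected document was not retrieved).

def hit_rate(results):
    hits = sum(1 for expected, retrieved in results if expected in retrieved)
    return hits / len(results)

def mrr(results):
    total = 0.0
    for expected, retrieved in results:
        if expected in retrieved:
            total += 1.0 / (retrieved.index(expected) + 1)  # 1-based rank
    return total / len(results)

# (expected_doc_id, ranked retrieved ids) per query -- toy data
results = [
    ("doc1", ["doc1", "doc3"]),  # rank 1 -> reciprocal rank 1.0
    ("doc2", ["doc4", "doc2"]),  # rank 2 -> reciprocal rank 0.5
    ("doc5", ["doc3", "doc4"]),  # miss   -> reciprocal rank 0.0
]

print(hit_rate(results))  # 2/3
print(mrr(results))       # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

LlamaIndex's retrieval evaluators report these same metrics over a labeled (query, expected nodes) dataset, so understanding the arithmetic here is enough to read their output.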

Generating a dataset

Usage examples

Integration with other tools

  • UpTrain: 1.9K: free to try, but requires booking a demo; probably not cheap
  • Tonic Validate (includes a web UI for visualizing results): has a commercial version; free trial, then $200/month
  • DeepEval: 1.6K
  • Ragas: 4.4K
    • Looks quite promising
    • LlamaIndex --> Ragas --> LangSmith and other tools
    • But rough in practice: the quick start fails, repeatedly raising ModuleNotFoundError: No module named 'ragas.metrics'; 'ragas' is not a package

Cost estimation

Optimization

Basic optimizations

Retrieval