LlamaIndex-Chapter 2 (QA and Assessment)
Production-level examples
QA
Use case:
- Q&A: very well supported
- Structured Data Extraction
What
- Semantic query (semantic search / top-k retrieval)
- Summarization
Where
- Over documents
- Building a multi-document agent over the LlamaIndex docs
- Over structured data (such as JSON)
- Searching Pandas tables
- Text-to-SQL (see the sketch below)
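As a quick illustration of the structured-data case, a minimal text-to-SQL sketch; the SQLite file and the city_stats table are hypothetical placeholders, so swap in your own schema:
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Hypothetical database and table name for illustration only.
engine = create_engine("sqlite:///example.db")
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

# The engine translates the natural-language question into SQL,
# executes it, and synthesizes a natural-language answer.
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["city_stats"])
print(query_engine.query("Which city has the highest population?"))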
How
The links above all point to the same set of Q&A patterns, described below.
One of the simplest Q&A setups:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Load every file under ./data, embed it into an in-memory vector index,
# then ask a question against it.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
Selecting between different data sources (routing over data sources)
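A minimal sketch of this routing pattern, assuming two indexes built over the same documents; the tool descriptions are my own illustrative wording:
from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

documents = SimpleDirectoryReader("data").load_data()
vector_index = VectorStoreIndex.from_documents(documents)  # good for specific facts
summary_index = SummaryIndex.from_documents(documents)     # good for summaries

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering specific factual questions about the documents.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),
    description="Useful for high-level summaries of the documents.",
)

# An LLM selector reads the tool descriptions and routes each query to one tool.
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)
print(router.query("Summarize the documents in two sentences."))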
Compare/Contrast Queries
I didn't understand this at first; the docs explain it as follows:
Besides the explicit synthesis/routing flows described above, LlamaIndex can support more general multi-document queries as well, through the SubQuestionQueryEngine class. Given a query, this query engine will generate a "query plan" containing sub-queries against sub-documents before synthesizing the final answer.
This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer. This makes it especially well-suited for compare/contrast queries across documents as well as queries pertaining to a specific document.
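A minimal sketch of this engine, assuming two hypothetical document folders (data/uber, data/lyft); the tool names and descriptions are my own placeholders:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# One index (and one tool) per document set.
uber_index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data/uber").load_data()
)
lyft_index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data/lyft").load_data()
)

tools = [
    QueryEngineTool(
        query_engine=uber_index.as_query_engine(),
        metadata=ToolMetadata(name="uber_10k", description="Uber's 2021 10-K filing"),
    ),
    QueryEngineTool(
        query_engine=lyft_index.as_query_engine(),
        metadata=ToolMetadata(name="lyft_10k", description="Lyft's 2021 10-K filing"),
    ),
]

# The engine plans sub-queries against the tools, runs them,
# and synthesizes one final answer.
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = engine.query("Compare and contrast the revenue growth of Uber and Lyft in 2021.")
print(response)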
LlamaIndex can also support iterative multi-step queries: given a complex query, it breaks it down into an initial sub-question, then sequentially generates follow-up sub-questions based on the returned answers until the final answer is reached.
For instance, given the question "Who was in the first batch of the accelerator program the author started?", the module will first decompose it into a simpler initial question, "What was the accelerator program the author started?", query the index, and then ask follow-up questions based on the answer.
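A minimal sketch of the multi-step flow, assuming the StepDecomposeQueryTransform import path from recent llama_index.core versions; the index_summary string is my own placeholder:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform
from llama_index.core.query_engine import MultiStepQueryEngine

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())

# Decomposes the query into an initial sub-question, then generates
# follow-ups from each intermediate answer.
step_transform = StepDecomposeQueryTransform(verbose=True)
engine = MultiStepQueryEngine(
    query_engine=index.as_query_engine(),
    query_transform=step_transform,
    index_summary="Used to answer questions about the author",  # placeholder
)
response = engine.query(
    "Who was in the first batch of the accelerator program the author started?"
)
print(response)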
Eval
- Evaluating the response
- Evaluating retrieval (search)
Evaluating the response
- Use GPT-4 to evaluate
- Dimensions of assessment:
  - Generated answer vs. reference answer: correctness and semantic similarity
  - Generated answer vs. retrieved contexts: faithfulness
  - Generated answer vs. query: answer relevancy
  - Retrieved contexts vs. query: context relevancy
- Generate reference answers
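A minimal sketch of response evaluation with LlamaIndex's built-in evaluators, covering three of the dimensions above and assuming GPT-4 as the judge; the query and the hand-written reference answer are illustrative placeholders:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import (
    CorrectnessEvaluator,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)
from llama_index.llms.openai import OpenAI

# GPT-4 as the judge, per the note above.
judge = OpenAI(model="gpt-4")

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
query = "What did the author do growing up?"
response = index.as_query_engine().query(query)

# Answer vs. retrieved contexts: faithfulness (is the answer grounded?).
print(FaithfulnessEvaluator(llm=judge).evaluate_response(response=response).passing)

# Answer vs. query: answer relevancy.
print(RelevancyEvaluator(llm=judge).evaluate_response(query=query, response=response).passing)

# Answer vs. reference answer: correctness (needs a reference answer).
result = CorrectnessEvaluator(llm=judge).evaluate(
    query=query,
    response=str(response),
    reference="He wrote short stories and programmed on an IBM 1401.",  # placeholder
)
print(result.score)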
Evaluating retrieval (search)
- How to evaluate: ranking metrics such as mean reciprocal rank (MRR), hit rate, precision, and more.
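A minimal sketch of retrieval evaluation; the expected node ID is a made-up ground-truth label, which in practice would come from a labeled (query, relevant-nodes) dataset, e.g. one built with generate_question_context_pairs:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
retriever = index.as_retriever(similarity_top_k=2)

evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=retriever
)

# expected_ids are the node IDs that *should* be retrieved for this query.
result = evaluator.evaluate(
    query="What did the author do growing up?",
    expected_ids=["node_id_1"],  # hypothetical ground-truth node ID
)
print(result.metric_vals_dict)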
Use case
Integration with other tools
- UpTrain: 1.9K stars: there is a trial, but you have to book a demo, and at a glance it doesn't look cheap
- Tonic Validate (includes a web UI for visualizing results): has a commercial version with a free trial, then US$200/month
- DeepEval: 1.6K stars
- Ragas: 4.4K stars
  - Looks promising
  - Chains with other tools: LlamaIndex --> Ragas --> LangSmith and others
  - However, the quickstart failed to run for me, raising:
    ModuleNotFoundError: No module named 'ragas.metrics'; 'ragas' is not a package
    (this particular error usually means a local file or folder named ragas is shadowing the installed package, rather than a problem with the library itself)
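For reference, the quickstart I was attempting looks roughly like this (assuming ragas 0.1.x and its documented metric names; the sample row is made up):
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# A tiny hand-built eval set; in practice these rows would come from
# running a LlamaIndex query engine over a set of test questions.
data = {
    "question": ["What did the author do growing up?"],
    "answer": ["He wrote short stories and programmed on an IBM 1401."],
    "contexts": [["Before college the two main things I worked on were writing and programming."]],
}
dataset = Dataset.from_dict(data)

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)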