LlamaIndex-Chapter 2 (QA and Assessment)
Production-level examples
QA
Use case:
- Q&A: very well supported
- Structured Data Extraction
What
- Semantic query (semantic search / top-k retrieval)
- Summarization
Where
- Over documents
- Building a multi-document agent over the LlamaIndex docs
- Over structured data (such as JSON)
- Searching Pandas tables
- Text-to-SQL (see the sketch below)
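As a quick illustration of the structured-data case, a minimal text-to-SQL sketch; the SQLite file and the city_stats table are hypothetical placeholders, so swap in your own schema:
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

# Hypothetical database and table name for illustration only.
engine = create_engine("sqlite:///example.db")
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

# The engine translates the natural-language question into SQL,
# executes it, and synthesizes a natural-language answer.
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["city_stats"])
print(query_engine.query("Which city has the highest population?"))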
How
The links above all point to the same set of Q&A patterns, described below.
One of the simplest Q&A setups:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Load every file under ./data, embed it into an in-memory vector index,
# then ask a question against it.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
Selecting between different data sources (routing over data sources)
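A minimal sketch of this routing pattern, assuming two indexes built over the same documents; the tool descriptions are my own illustrative wording:
from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

documents = SimpleDirectoryReader("data").load_data()
vector_index = VectorStoreIndex.from_documents(documents)  # good for specific facts
summary_index = SummaryIndex.from_documents(documents)     # good for summaries

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering specific factual questions about the documents.",
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(),
    description="Useful for high-level summaries of the documents.",
)

# An LLM selector reads the tool descriptions and routes each query to one tool.
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool],
)
print(router.query("Summarize the documents in two sentences."))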
Compare/Contrast Queries
I didn't understand this at first; the docs explain it as follows:
Besides the explicit synthesis/routing flows described above, LlamaIndex can support more general multi-document queries as well, through the SubQuestionQueryEngine class. Given a query, this query engine will generate a "query plan" containing sub-queries against sub-documents before synthesizing the final answer.
This query engine can execute any number of sub-queries against any subset of query engine tools before synthesizing the final answer. This makes it especially well-suited for compare/contrast queries across documents as well as queries pertaining to a specific document.
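A minimal sketch of this engine, assuming two hypothetical document folders (data/uber, data/lyft); the tool names and descriptions are my own placeholders:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# One index (and one tool) per document set.
uber_index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data/uber").load_data()
)
lyft_index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("data/lyft").load_data()
)

tools = [
    QueryEngineTool(
        query_engine=uber_index.as_query_engine(),
        metadata=ToolMetadata(name="uber_10k", description="Uber's 2021 10-K filing"),
    ),
    QueryEngineTool(
        query_engine=lyft_index.as_query_engine(),
        metadata=ToolMetadata(name="lyft_10k", description="Lyft's 2021 10-K filing"),
    ),
]

# The engine plans sub-queries against the tools, runs them,
# and synthesizes one final answer.
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = engine.query("Compare and contrast the revenue growth of Uber and Lyft in 2021.")
print(response)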
LlamaIndex can also support iterative multi-step queries: given a complex query, it breaks it down into an initial sub-question, then sequentially generates follow-up sub-questions based on the returned answers until the final answer is reached.
For instance, given the question "Who was in the first batch of the accelerator program the author started?", the module will first decompose it into a simpler initial question, "What was the accelerator program the author started?", query the index, and then ask follow-up questions based on the answer.
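A minimal sketch of the multi-step flow, assuming the StepDecomposeQueryTransform import path from recent llama_index.core versions; the index_summary string is my own placeholder:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.indices.query.query_transform.base import StepDecomposeQueryTransform
from llama_index.core.query_engine import MultiStepQueryEngine

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())

# Decomposes the query into an initial sub-question, then generates
# follow-ups from each intermediate answer.
step_transform = StepDecomposeQueryTransform(verbose=True)
engine = MultiStepQueryEngine(
    query_engine=index.as_query_engine(),
    query_transform=step_transform,
    index_summary="Used to answer questions about the author",  # placeholder
)
response = engine.query(
    "Who was in the first batch of the accelerator program the author started?"
)
print(response)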
Eval
- Evaluating the response
- Evaluating retrieval (search)
Evaluating the response
- Use GPT-4 to evaluate
- Dimensions of assessment:
  - Generated answer vs. reference answer: correctness and semantic similarity
  - Generated answer vs. retrieved contexts: faithfulness
  - Generated answer vs. query: answer relevancy
  - Retrieved contexts vs. query: context relevancy
- Generate reference answers
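A minimal sketch of response evaluation with LlamaIndex's built-in evaluators, covering three of the dimensions above and assuming GPT-4 as the judge; the query and the hand-written reference answer are illustrative placeholders:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import (
    CorrectnessEvaluator,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)
from llama_index.llms.openai import OpenAI

# GPT-4 as the judge, per the note above.
judge = OpenAI(model="gpt-4")

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
query = "What did the author do growing up?"
response = index.as_query_engine().query(query)

# Answer vs. retrieved contexts: faithfulness (is the answer grounded?).
print(FaithfulnessEvaluator(llm=judge).evaluate_response(response=response).passing)

# Answer vs. query: answer relevancy.
print(RelevancyEvaluator(llm=judge).evaluate_response(query=query, response=response).passing)

# Answer vs. reference answer: correctness (needs a reference answer).
result = CorrectnessEvaluator(llm=judge).evaluate(
    query=query,
    response=str(response),
    reference="He wrote short stories and programmed on an IBM 1401.",  # placeholder
)
print(result.score)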
Evaluating retrieval (search)
- How to evaluate: ranking metrics such as mean reciprocal rank (MRR), hit rate, precision, and more.
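A minimal sketch of retrieval evaluation; the expected node ID is a made-up ground-truth label, which in practice would come from a labeled (query, relevant-nodes) dataset, e.g. one built with generate_question_context_pairs:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.evaluation import RetrieverEvaluator

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
retriever = index.as_retriever(similarity_top_k=2)

evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=retriever
)

# expected_ids are the node IDs that *should* be retrieved for this query.
result = evaluator.evaluate(
    query="What did the author do growing up?",
    expected_ids=["node_id_1"],  # hypothetical ground-truth node ID
)
print(result.metric_vals_dict)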
Use case
Integration with other tools
- UpTrain: 1.9K stars: there is a trial, but you have to book a demo, and at a glance it doesn't look cheap
- Tonic Validate (includes a web UI for visualizing results): has a commercial version with a free trial, then US$200/month
- DeepEval: 1.6K stars
- Ragas: 4.4K stars
  - Looks promising
  - Chains with other tools: LlamaIndex --> Ragas --> LangSmith and others
  - However, the quickstart failed to run for me, raising:
    ModuleNotFoundError: No module named 'ragas.metrics'; 'ragas' is not a package
    (this particular error usually means a local file or folder named ragas is shadowing the installed package, rather than a problem with the library itself)
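For reference, the quickstart I was attempting looks roughly like this (assuming ragas 0.1.x and its documented metric names; the sample row is made up):
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

# A tiny hand-built eval set; in practice these rows would come from
# running a LlamaIndex query engine over a set of test questions.
data = {
    "question": ["What did the author do growing up?"],
    "answer": ["He wrote short stories and programmed on an IBM 1401."],
    "contexts": [["Before college the two main things I worked on were writing and programming."]],
}
dataset = Dataset.from_dict(data)

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)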