Improving RAG Accuracy with Re-ranking

2024.02.05

Introduction

Hello, I'm Leona from the New Business Division at Classmethod, Inc.

At Classmethod, Inc., we operate and evaluate a QA chatbot built on RAG (Retrieval-Augmented Generation) to improve the accuracy of internal information search and answers. The system retrieves internal documents relevant to a user's question, and an LLM generates an answer based on that information. In practice, however, users do not always get the information they need.

To address this problem, we try a technique called Re-ranking. Re-ranking takes the set of documents returned by the retriever and reorders them in descending order of relevance to the query, using embeddings from a separate model. This should raise retrieval precision and, in turn, the accuracy of the chatbot's answers.
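A minimal sketch of the idea in Python (the function names are ours for illustration, not from the production system): embed the query and each retrieved document with the same model, then sort the documents by cosine similarity to the query.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the vectors over the product of their norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rerank(query_vec: np.ndarray, docs: list[str], doc_vecs: list[np.ndarray]) -> list[str]:
    # Score each retrieved document against the query and sort, most relevant first
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return [doc for _, doc in sorted(zip(scores, docs), key=lambda p: p[0], reverse=True)]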

Implementation

Here is the sample code we referenced and adapted for this post:

  • https://github.com/openai/openai-cookbook/blob/main/examples/Search_reranking_with_cross-encoders.ipynb
  • https://github.com/aws-samples/amazon-bedrock-rag-workshop/blob/dcdb2f64f796c53a2e226c57447711843e901bca/05_Semantic_Search_with_Reranking/02_LlamaIndex_Reranker_Bedrock_Titan.ipynb

For embeddings we use OpenAI's text-embedding-ada-002 and AWS Bedrock's amazon.titan-embed-text-v1. To use these models, you need an OpenAI API key and the corresponding models enabled in AWS Bedrock. For details on enabling models in AWS Bedrock, see the following article.

When you want to try Amazon Bedrock from the Management Console, first set up access to the Base Models (foundation models)

For the experiment, we first need to define a concrete query and prepare documents related to it. Following the OpenAI sample code, we run a query search against the arXiv API, provided by arXiv, a preprint repository, and treat each paper's abstract as a document.

Fetching the paper data

1. Install the arxiv library, import it, and define the query as follows.

# Install the arxiv package first, which wraps the arXiv API:
#   pip install arxiv
import arxiv

query = "how do bi-encoders work for sentence embeddings"

client_arxiv = arxiv.Client()
search = arxiv.Search(
    query=query, max_results=20, sort_by=arxiv.SortCriterion.Relevance
)
# Fetch the matching papers as a list so we can re-rank them later
results = list(client_arxiv.results(search))

2. These are the search results for the query. As the baseline for comparison, we display only the paper titles.
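The titles below were printed with a short loop over the fetched results (a minimal sketch, assuming `results` from the step above):

for i, paper in enumerate(results, start=1):
    print(f"{i}: {paper.title}")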

1: A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
2: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
3: Are Classes Clusters?
4: Semantic Composition in Visually Grounded Language Models
5: Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
6: Learning Probabilistic Sentence Representations from Paraphrases
7: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
8: How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
9: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
10: Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
11: Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
12: SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding
13: Learning Joint Representations of Videos and Sentences with Web Image Search
14: Character-based Neural Networks for Sentence Pair Modeling
15: Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
16: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
17: Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
18: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
19: In Search for Linear Relations in Sentence Embedding Spaces
20: Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion

Sorted results

Below are the results after each embedding model sorted the papers by the relevance of their abstracts to the query. The re-ranking by each model changed the order.

arXiv original order: identical to the numbered list above, so it is not repeated here.

AWS Bedrock amazon.titan-embed-text-v1:

1: A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
2: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
3: In Search for Linear Relations in Sentence Embedding Spaces
4: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
5: Are Classes Clusters?
6: Are Classes Clusters?
7: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
8: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
9: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
10: Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
11: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
12: Semantic Composition in Visually Grounded Language Models
13: Semantic Composition in Visually Grounded Language Models
14: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
15: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
16: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
17: Learning Probabilistic Sentence Representations from Paraphrases
18: Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
19: Learning Joint Representations of Videos and Sentences with Web Image Search
20: How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation

OpenAI text-embedding-ada-002:

1: Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
2: Are Classes Clusters?
3: Semantic Composition in Visually Grounded Language Models
4: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
5: How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
6: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
7: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
8: Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
9: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
10: A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
11: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
12: Learning Probabilistic Sentence Representations from Paraphrases
13: Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion
14: Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
15: In Search for Linear Relations in Sentence Embedding Spaces
16: Character-based Neural Networks for Sentence Pair Modeling
17: SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding
18: Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
19: Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
20: Learning Joint Representations of Videos and Sentences with Web Image Search
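For reference, the two rankings can be reproduced along these lines (a minimal sketch, not the exact notebook code: the helper names and region are ours, and an OpenAI API key plus Bedrock model access are assumed to be configured):

import json

import boto3
import numpy as np
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # example region

def embed_ada(text: str) -> np.ndarray:
    # OpenAI embeddings endpoint (openai>=1.0 client style)
    resp = openai_client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def embed_titan(text: str) -> np.ndarray:
    # Bedrock Titan embeddings via InvokeModel
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

def rank_titles(embed, query: str, papers) -> list[str]:
    # Re-rank papers by cosine similarity between the query and each abstract
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    q = embed(query)
    scored = [(cosine(q, embed(p.summary)), p.title) for p in papers]
    return [title for _, title in sorted(scored, reverse=True)]

For example, rank_titles(embed_ada, query, results) yields the ada-002 ordering.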

Discussion

When re-ranking with amazon.titan-embed-text-v1, the same title appeared more than once in the results. This is caused by "chunking": each document is split into smaller pieces, which also lets us see which part of a document is most similar to the query.
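As a minimal illustration of chunking (plain Python, a character-based simplification of the token-based splitting llama_index performs): a document is cut into overlapping windows, each window is embedded separately, and two windows from the same paper can both appear in the ranking.

def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 20) -> list[str]:
    # Slide a window of chunk_size characters, stepping by chunk_size - chunk_overlap
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]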

Using llama_index, we can adjust chunk_size and chunk_overlap to control how finely the documents are split for search, as configured below.

# Import the Bedrock LLM and embedding wrappers plus the service-context helpers
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import Bedrock
from llama_index.embeddings import BedrockEmbedding

# Parameters for the Titan text model
model_kwargs_titan = {
    "stopSequences": [],
    "temperature": 0.0,
    "topP": 0.5,
}

# AWS region where Bedrock is enabled (example value)
region = "us-east-1"

# Create the Bedrock LLM instance
llm = Bedrock(
    model="amazon.titan-text-express-v1",  # changed from amazon.titan-tg1-large
    context_size=512,
    aws_region_name=region,
    additional_kwargs=model_kwargs_titan,
)

# Create the Bedrock embedding model instance
embed_model = BedrockEmbedding.from_credentials(
    aws_profile=None,
    model_name="amazon.titan-embed-text-v1",  # changed from amazon.titan-embed-g1-text-02
)

# Overlap between adjacent chunks
chunk_overlap = 20
# Chunk size
chunk_size = 512

# Build a service context with the models and chunk settings
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
)
# Apply it globally so indexing and queries use these settings
set_global_service_context(service_context)
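With the global service context in place, building an index picks up the Titan embedding model and the chunk settings automatically. A minimal usage sketch (assuming `results` and `query` from the arXiv step; this mirrors the flow of the workshop notebook rather than reproducing it exactly):

from llama_index import Document, VectorStoreIndex

# Wrap each abstract in a Document; indexing chunks and embeds them
documents = [Document(text=paper.summary) for paper in results]
index = VectorStoreIndex.from_documents(documents)

# Retrieve the chunks most similar to the query, best match first
retriever = index.as_retriever(similarity_top_k=20)
for i, hit in enumerate(retriever.retrieve(query), start=1):
    print(f"{i}: score={hit.score:.3f} | {hit.node.get_content()[:60]}...")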

For comparison, we also tried a larger chunk size.

chunk_size=512:

1: A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
2: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
3: In Search for Linear Relations in Sentence Embedding Spaces
4: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
5: Are Classes Clusters?
6: Are Classes Clusters?
7: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
8: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
9: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
10: Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
11: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
12: Semantic Composition in Visually Grounded Language Models
13: Semantic Composition in Visually Grounded Language Models
14: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
15: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
16: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
17: Learning Probabilistic Sentence Representations from Paraphrases
18: Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
19: Learning Joint Representations of Videos and Sentences with Web Image Search
20: How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation

chunk_size=2048:

1: A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
2: In Search for Linear Relations in Sentence Embedding Spaces
3: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
4: Are Classes Clusters?
5: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
6: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
7: Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
8: Semantic Composition in Visually Grounded Language Models
9: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
10: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
11: Learning Probabilistic Sentence Representations from Paraphrases
12: Learning Joint Representations of Videos and Sentences with Web Image Search
13: How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
14: Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
15: Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
16: SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding
17: Character-based Neural Networks for Sentence Pair Modeling
18: Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
19: Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
20: Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion

When we increased chunk_size from 512 to 2048, each chunk covers more text, so the search is no longer as fine-grained. Some rankings changed, and a smaller chunk size lets us retrieve passages that are more closely matched to the query.

With OpenAI's text-embedding-ada-002, "Vec2Sent: Probing Sentence Embeddings with Natural Language Generation" was ranked as the most similar, whereas AWS Bedrock's amazon.titan-embed-text-v1 ranked it 10th.

Summary

The experiment confirmed that re-ranking documents by their relevance to the query with different embedding models changes the order of the search results. When operating a chatbot, switching to another model enables re-ranking suited to the purpose at hand, which should improve retrieval accuracy.

Next steps

We confirmed that Re-ranking changes the order of search results, but we still need to verify that it is genuinely useful. The remaining tasks are:

  • Define evaluation metrics and use them for quantitative analysis, which we have not done yet.
  • Qualitatively analyze whether users obtained the information they needed before and after re-ranking.