A Tutorial for Fully Understanding LlamaIndex, Part 1: The Basics of Its Concepts and Processing Flow (for v0.6.8)

Has LlamaIndex become a black box for you? Understand how it actually works and widen your options for customization!

#LlamaIndex

#ChatGPT

#OpenAI

#MachineLearning

#Python

nokomoro3

2023.05.16

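Hello.

I'm Nakamura from the Machine Learning Team, Integration Department, Data Analytics Business Division.

Starting with this article, I plan to write a series that digs into the internals of LlamaIndex.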

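What this article covers

This first installment covers the basics: using sample code, we dig into the processing that appears in LlamaIndex and pick up knowledge that hints at how to customize it.

Although it is billed as a basics article, it is fairly long.

Because the basics matter most, I have laid out the concepts and the processing flow that tend to become a black box when you simply use LlamaIndex as-is.

Later articles will build on these basics and look at possible use cases and customizations. This first one is heavy, so rather than trying to understand everything at once, I hope you will treat it as a reference to come back to.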

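What is LlamaIndex?

LlamaIndex is a project that provides an interface for connecting large language models (LLMs) to external data (your own data).

There are two main paradigms for customizing an LLM.

Fine-tune the LLM
Embed context in the input prompt

LlamaIndex supports the latter, offering a variety of data ingestion and indexing methods so that it can be done with better performance, more efficiently, and at lower cost.

This article refers to the official documentation for v0.6.8, the latest version at the time of writing (the documentation is updated daily).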

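Environment setup

Execution environment

I use Google Colaboratory with no hardware accelerator and the standard runtime spec.

Running it in a local environment should work as well.

The Python version is as follows.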

!python --version

Python 3.10.11

llama-index is installed as follows.

!pip install llama-index

The main libraries that were installed are as follows.

!pip freeze | grep -e "openai" -e "llama-index" -e "langchain"

langchain==0.0.170
llama-index==0.6.8
openai==0.27.6

You don't have to do it this way, but I load environment variables with python-dotenv.

!pip install python-dotenv

Write OPENAI_API_KEY to .env.

!echo 'OPENAI_API_KEY="<your OpenAI API key>"' >.env

This assumes you have already obtained an OPENAI_API_KEY from OpenAI. (Note that API usage is billed on a pay-as-you-go basis.)

Load the environment variables.

from dotenv import load_dotenv
load_dotenv()

Preparing the data

Prepare some arbitrary text data. This time I used two text files of about 3,000 characters each.

./data/
  sample_001.txt
  sample_002.txt

Running it

A sample using GPTListIndex

As the most basic sample, I prepared code that uses GPTListIndex.

from llama_index import SimpleDirectoryReader
from llama_index import Document
from llama_index import GPTListIndex

documents = SimpleDirectoryReader(input_dir="./data").load_data()

list_index = GPTListIndex.from_documents(documents)

query_engine = list_index.as_query_engine()

response = query_engine.query("機械学習に関するアップデートについて300字前後で要約してください。")

for sentence in response.response.split("。"):
    if sentence:  # skip the empty string left after the final "。"
        print(sentence + "。")

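Processing roughly follows this flow.

Reader processing
- Converts a data source (here, a local folder) into a List[Document]
Index creation
- Creates an index (here, GPTListIndex) from the List[Document]
QueryEngine creation
- Creates a QueryEngine with the index's as_query_engine
- The API changed in 0.6, so from 0.6 onward you need to use as_query_engine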

About Readers

The LlamaIndex library ships with many Readers for different data sources.

Many more appear to be available through LlamaHub.

The sample uses SimpleDirectoryReader, which reads local text files, as follows.

documents = SimpleDirectoryReader(input_dir="./data").load_data()

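About Indexes

How indexes work has already been covered in many articles; it is also described in the official documentation.

The documentation introduces four index structures.

List Index (GPTListIndex)
- Simply holds a list of Nodes
- At query time, processes them in order from the first and synthesizes the outputs
Vector Store Index (GPTVectorStoreIndex)
- Holds the Nodes unordered, each with its corresponding embedding vector
- At query time, selects Nodes using the embedding vectors and synthesizes the outputs
Tree Index (GPTTreeIndex)
- Holds the nodes in a tree structure
- At query time, traverses from the root to decide which nodes to use and synthesizes their outputs
Table Index (GPTKeywordTableIndex, GPTRAKEKeywordTableIndex, GPTSimpleKeywordTableIndex)
- Extracts keywords from each Node and holds a mapping from each keyword to its Nodes
- At query time, selects Nodes using the keywords in the query and synthesizes the outputs of those nodes
- There are three Index classes, which differ in how keywords are extracted from nodes (there are three corresponding Retrievers as well, but they can be set independently)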
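To make the Table Index idea concrete, here is a minimal pure-Python sketch. It is not LlamaIndex code: the helper names `extract_keywords`, `build_keyword_table`, and `retrieve` are hypothetical, and the regex-based keyword extraction is only an assumption that loosely resembles the simple (regex) variant.

```python
import re
from collections import defaultdict


def extract_keywords(text):
    """Toy keyword extractor (assumption): lower-cased words of 4+ letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def build_keyword_table(nodes):
    """Map each keyword to the set of node ids whose text contains it."""
    table = defaultdict(set)
    for node_id, text in nodes.items():
        for keyword in extract_keywords(text):
            table[keyword].add(node_id)
    return table


def retrieve(table, query):
    """Return the sorted ids of nodes sharing at least one keyword with the query."""
    matched = set()
    for keyword in extract_keywords(query):
        matched |= table.get(keyword, set())
    return sorted(matched)


# Hypothetical node texts, used only for illustration.
nodes = {
    "node-1": "Amazon SageMaker added a new training feature.",
    "node-2": "The billing console was redesigned.",
    "node-3": "SageMaker endpoints now support autoscaling.",
}
table = build_keyword_table(nodes)
print(retrieve(table, "What changed in SageMaker?"))  # ['node-1', 'node-3']
```

Only the nodes that share a keyword with the query are handed to response synthesis, which is what distinguishes this structure from the List Index, where every node is used.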

About Retrievers

For some index types, you can change how Nodes are retrieved by choosing a RetrieverMode.

Index type	Retriever Mode	Description
List Index	ListRetrieverMode.DEFAULT	Retrieves all nodes
List Index	ListRetrieverMode.EMBEDDING	Retrieves using embedding vectors
Vector Store Index	(single mode)	Retrieves using embedding vectors
Tree Index	TreeRetrieverMode.SELECT_LEAF	Traverses to leaf nodes using prompts and retrieves them
Tree Index	TreeRetrieverMode.SELECT_LEAF_EMBEDDING	Traverses to leaf nodes using embedding vectors and retrieves them
Tree Index	TreeRetrieverMode.ALL_LEAF	Builds a query-specific tree from all leaf nodes and responds
Tree Index	TreeRetrieverMode.ROOT	Responds using only the root nodes
Table Index	KeywordTableRetrieverMode.DEFAULT	Extracts keywords from the query using GPT
Table Index	KeywordTableRetrieverMode.SIMPLE	Extracts keywords from the query using regular expressions
Table Index	KeywordTableRetrieverMode.RAKE	Extracts keywords from the query using the RAKE keyword extractor
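The difference between the two List Index modes in the table can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: the function names are hypothetical and the two-dimensional vectors are made up, whereas real embeddings would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_all(node_embeddings):
    """DEFAULT-style retrieval: every node id, in stored order."""
    return list(node_embeddings)


def retrieve_top_k(node_embeddings, query_embedding, top_k=1):
    """EMBEDDING-style retrieval: the top_k nodes most similar to the query."""
    ranked = sorted(
        node_embeddings,
        key=lambda node_id: cosine_similarity(node_embeddings[node_id], query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up vectors for illustration only.
node_embeddings = {
    "node-1": [1.0, 0.0],
    "node-2": [0.0, 1.0],
    "node-3": [0.7, 0.7],
}
query_embedding = [0.9, 0.1]

print(retrieve_all(node_embeddings))                              # ['node-1', 'node-2', 'node-3']
print(retrieve_top_k(node_embeddings, query_embedding, top_k=2))  # ['node-1', 'node-3']
```

Retrieving everything keeps all context but costs more LLM calls; retrieving by similarity trades completeness for fewer, more relevant nodes.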

The RetrieverMode can be passed when creating a QueryEngine with as_query_engine, described later.

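About Contexts

An Index and its Retriever are closely tied together; separately from them, the processing classes an Index depends on are supplied as Contexts.

Specifically, there are two kinds of Context: the Storage Context and the Service Context.

The sample at the beginning runs on defaults, so the Contexts are not visible, but written out explicitly it looks like this.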

from llama_index import StorageContext
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore

# Create the Storage Context
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore()
    , vector_store=SimpleVectorStore()
    , index_store=SimpleIndexStore()
)

from llama_index import ServiceContext
from llama_index.node_parser import SimpleNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index import LLMPredictor
from llama_index.indices.prompt_helper import PromptHelper
from llama_index.logger.base import LlamaLogger
from llama_index.callbacks.base import CallbackManager

# Create the Service Context
llm_predictor = LLMPredictor()
service_context = ServiceContext.from_defaults(
    node_parser=SimpleNodeParser()
    , embed_model=OpenAIEmbedding()
    , llm_predictor=llm_predictor
    , prompt_helper=PromptHelper.from_llm_predictor(llm_predictor)
    , llama_logger=LlamaLogger()
    , callback_manager=CallbackManager([])
)

# Pass the Contexts when creating the Index
list_index = GPTListIndex.from_documents(
    documents
    , storage_context=storage_context
    , service_context=service_context
)

# From here on, same as before
query_engine = list_index.as_query_engine()

response = query_engine.query("機械学習に関するアップデートについて300字前後で要約してください。")

for sentence in response.response.split("。"):
    if sentence:  # skip the empty string left after the final "。"
        print(sentence + "。")

As shown, an Index depends on two processing classes, StorageContext and ServiceContext.

These Contexts can be customized independently of the Index.

About the Storage Context

The Storage Context consists of three stores.

Vector Store
Document Store
Index Store

The whole Storage Context can be dumped to a JSON file as follows.

import json

with open("storage_context.json", "wt") as f:
    json.dump(list_index.storage_context.to_dict(), f, indent=4)

The structure is as follows.

{
    "vector_store": {
        // ...
    },
    "doc_store": {
        // ...
    },
    "index_store": {
        // ...
    }
}

About the Vector Store

Let's look at the Vector Store.

The Vector Store holds vector data and can be dumped to JSON as follows.

with open("vector_store.json", "wt") as f:
    json.dump(list_index.storage_context.vector_store.to_dict(), f, indent=4)

GPTListIndex does not use the vector_store by default, so it is empty.

{
    "embedding_dict": {},
    "text_id_to_doc_id": {}
}

In the default case, as in this sample, the Vector Store is a SimpleVectorStore.

list_index.storage_context.vector_store

<llama_index.vector_stores.simple.SimpleVectorStore at 0x7f269b12b130>

SimpleVectorStore is an in-memory store.

The Vector Store also supports many third-party stores; the official documentation lists them.

About the Document Store

Let's look at the Document Store as well.

The Document Store holds text data and can be dumped to JSON as follows.

with open("docstore.json", "wt") as f:
    json.dump(list_index.storage_context.docstore.to_dict(), f, indent=4)

{
    "docstore/metadata": {
        // ...
    },
    "docstore/data": {
        // ...
    }
}

The dump consists of docstore/metadata and docstore/data.

docstore/metadata contains each doc_id and its corresponding doc_hash.

A doc_id is assigned to each of the following.

Each source Document object (here, the two text files)
Each node produced by splitting those Documents

{
    "docstore/metadata": {
        // sample_001.txt
        "7070db93-05b4-452d-8d75-a12f98fd7f97": {
            "doc_hash": "827a50249e2443896189e334a5a8ca9d3bd161f5ed7c669f46f8674531132d75"
        },
        // sample_002.txt
        "3dbfae09-2fc4-48ec-95e7-3ff2aaeaffde": {
            "doc_hash": "fd69db6ca1ae8f867e9941159dd53f8e7930fa0ddecff7daa009918dfcb43cec"
        },
        // split node 1 of sample_001.txt
        "5d5d2978-f78e-4098-b0d2-78e131c46f40": {
            "doc_hash": "ef5d86fdbac91506fc425138d49777183064b6d2871232189483324dafee0479"
        },
        // split node 2 of sample_001.txt
        "2f00f1f7-464b-437f-a5b6-bd3cb1e92cef": {
            "doc_hash": "c77a1b8cc862bdadf5b7da48d0faefccaff1008c84ba234d739db84fdd518706"
        },

        // ... (omitted) ...
    }
}

docstore/data holds the nodes themselves, including the text content and other information.

{
    "docstore/data": {

        // ... (omitted) ...

        "2f00f1f7-464b-437f-a5b6-bd3cb1e92cef": {
            "__data__": {
                "text": "<text body>",
                "doc_id": "2f00f1f7-464b-437f-a5b6-bd3cb1e92cef",
                "embedding": null,
                "doc_hash": "c77a1b8cc862bdadf5b7da48d0faefccaff1008c84ba234d739db84fdd518706",
                "extra_info": null,
                "node_info": {
                    "start": 744,
                    "end": 1651
                },
                "relationships": {
                    "1": "7070db93-05b4-452d-8d75-a12f98fd7f97",
                    "2": "5d5d2978-f78e-4098-b0d2-78e131c46f40",
                    "3": "9a2caaab-ec8c-450a-9fdd-8f2e3487d044"
                }
            },
            "__type__": "1"
        },

        // ... (omitted) ...
    }
}

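extra_info can hold arbitrary key-value pairs, such as the source file name.

node_info records which span of the original file the node was split from (its start and end positions).

relationships describes how the node relates to other nodes: it stores the doc_id of the source document and the doc_ids of the nodes that come before and after it in the original text.

The keys are based on the following DocumentRelationship object.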

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/data_structs/node.py#L24

class DocumentRelationship(str, Enum):
    """Document relationships used in Node class.

    Attributes:
        SOURCE: The node is the source document.
        PREVIOUS: The node is the previous node in the document.
        NEXT: The node is the next node in the document.
        PARENT: The node is the parent node in the document.
        CHILD: The node is a child node in the document.

    """

    SOURCE = auto()
    PREVIOUS = auto()
    NEXT = auto()
    PARENT = auto()
    CHILD = auto()
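The keys "1", "2", and "3" in the relationships dump can be decoded with this enum. The sketch below redefines the enum locally (copied from the source above) so that it runs without LlamaIndex installed; `decode_relationships` is a hypothetical helper, and the doc_ids are the ones from the dump shown earlier. Because the enum mixes in str, each auto() value is stored as a string such as "1".

```python
from enum import Enum, auto


class DocumentRelationship(str, Enum):
    # Local copy of the enum above; with the str mixin, auto() yields "1", "2", ...
    SOURCE = auto()
    PREVIOUS = auto()
    NEXT = auto()
    PARENT = auto()
    CHILD = auto()


# The relationships of node 2f00f1f7-... as they appear in docstore/data.
relationships = {
    "1": "7070db93-05b4-452d-8d75-a12f98fd7f97",
    "2": "5d5d2978-f78e-4098-b0d2-78e131c46f40",
    "3": "9a2caaab-ec8c-450a-9fdd-8f2e3487d044",
}


def decode_relationships(raw):
    """Replace the numeric string keys with the enum member names."""
    return {DocumentRelationship(key).name: doc_id for key, doc_id in raw.items()}


for name, doc_id in decode_relationships(relationships).items():
    print(name, doc_id)
```

Read this way, the node's SOURCE is the sample_001.txt document, its PREVIOUS is split node 1, and its NEXT is the node that follows it in the same file.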

__type__ is the node type, which is normally TEXT; it is based on the following NodeType object.

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/data_structs/node.py#L43

class NodeType(str, Enum):
    TEXT = auto()
    IMAGE = auto()
    INDEX = auto()

In the default case, as in this sample, the Document Store is a SimpleDocumentStore.

list_index.storage_context.docstore

<llama_index.storage.docstore.simple_docstore.SimpleDocumentStore at 0x7f076a1f3f40>

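SimpleDocumentStore is an in-memory store.

The Document Store also supports various other stores; the official documentation lists them.

About the Index Store

Let's look at the Index Store as well.

The Index Store holds information about the index and can be dumped to JSON as follows.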

with open("index_store.json", "wt") as f:
    json.dump(list_index.storage_context.index_store.to_dict(), f, indent=4)

{
    "index_store/data": {
        "945952dc-00ae-4640-b7fd-ee8bad1f780a": {
            "__type__": "list",
            "__data__": {
                "index_id": "945952dc-00ae-4640-b7fd-ee8bad1f780a",
                "summary": null,
                "nodes": [
                    "5d5d2978-f78e-4098-b0d2-78e131c46f40",
                    "2f00f1f7-464b-437f-a5b6-bd3cb1e92cef",
                    "9a2caaab-ec8c-450a-9fdd-8f2e3487d044",
                    "84641f24-dbca-4959-832e-510200348bd1",
                    "c8f0a665-0de1-4830-b89e-0aca17798520",
                    "d6984263-93b0-40b3-ba75-88214cc67d02",
                    "05349902-840f-42e0-9e35-85bc453d956e",
                    "a8f570e2-bd9b-43f4-b113-e06eb4602e77",
                    "b23ba613-2d51-4ffc-b91e-433b001a6f7c"
                ]
            }
        }
    }
}

Because this is a GPTListIndex, you can see that the nodes are stored as a list.

In the default case, as in this sample, the Index Store is a SimpleIndexStore.

list_index.storage_context.index_store

<llama_index.storage.index_store.simple_index_store.SimpleIndexStore at 0x7f269b12b2b0>

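SimpleIndexStore is an in-memory store.

The Index Store also supports various other stores; the official documentation lists them.

Document Store details

Here we look at the information held for each node in the Document Store.

The following code shows each node's length and its start and end positions in the original text file.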

for doc_id, node in list_index.storage_context.docstore.docs.items():
    node_dict = node.to_dict()
    print(f'{doc_id=}, len={len(node_dict["text"])}, start={node_dict["node_info"]["start"]}, end={node_dict["node_info"]["end"]}')

doc_id='5d5d2978-f78e-4098-b0d2-78e131c46f40', len=903, start=0, end=903
doc_id='2f00f1f7-464b-437f-a5b6-bd3cb1e92cef', len=907, start=744, end=1651
doc_id='9a2caaab-ec8c-450a-9fdd-8f2e3487d044', len=869, start=1698, end=2567
doc_id='84641f24-dbca-4959-832e-510200348bd1', len=740, start=2532, end=3272
doc_id='c8f0a665-0de1-4830-b89e-0aca17798520', len=735, start=0, end=735
doc_id='d6984263-93b0-40b3-ba75-88214cc67d02', len=792, start=621, end=1413
doc_id='05349902-840f-42e0-9e35-85bc453d956e', len=834, start=1374, end=2208
doc_id='a8f570e2-bd9b-43f4-b113-e06eb4602e77', len=824, start=2208, end=3032
doc_id='b23ba613-2d51-4ffc-b91e-433b001a6f7c', len=361, start=3069, end=3430

The nodes are split at roughly similar sizes, but the sizes are not constant, and some overlap appears as well.
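The amount of overlap can be computed directly from the start and end values printed above. In this small sketch, `overlaps` is a hypothetical helper, and grouping the nodes into two documents is my inference from the point where start resets to 0.

```python
# (start, end) pairs copied from the output above.
first_document = [(0, 903), (744, 1651), (1698, 2567), (2532, 3272)]
second_document = [(0, 735), (621, 1413), (1374, 2208), (2208, 3032), (3069, 3430)]


def overlaps(spans):
    """For each consecutive pair: positive = shared characters, negative = gap."""
    return [prev_end - start for (_, prev_end), (start, _) in zip(spans, spans[1:])]


print(overlaps(first_document))   # [159, -47, 35]
print(overlaps(second_document))  # [114, 39, 0, -37]
```

So the overlap between neighbouring nodes varies, and some neighbours are even separated by a small gap rather than overlapping.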

To customize this splitting behavior, you need to customize the NodeParser in the Service Context described below, or more specifically the TextSplitter given to the NodeParser.

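About the Service Context

Next, let's look at the Service Context.

Its composition can be seen from the classes it accepts as arguments, which are listed in the official documentation.

Based on that, the Service Context appears to contain the following.

NodeParser : splits text into chunks and creates nodes
Embeddings : converts text into embedding vectors
LLMPredictor : the language model (LLM) processing class used to obtain text responses (completions)
PromptHelper : splits text to fit within the LLM's token limit
CallbackManager : sets callbacks at the start and end of various processing steps
LlamaLogger : used to capture logs of queries to the LLM

Listing the attributes as follows shows the concrete objects.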

list_index.service_context.__dict__

{'llm_predictor': <llama_index.llm_predictor.base.LLMPredictor at 0x7fa43c52e5f0>,
 'prompt_helper': <llama_index.indices.prompt_helper.PromptHelper at 0x7fa43c52e380>,
 'embed_model': <llama_index.embeddings.openai.OpenAIEmbedding at 0x7fa43c52d960>,
 'node_parser': <llama_index.node_parser.simple.SimpleNodeParser at 0x7fa43c52da50>,
 'llama_logger': <llama_index.logger.base.LlamaLogger at 0x7fa43c52fbb0>,
 'callback_manager': <llama_index.callbacks.base.CallbackManager at 0x7fa43c52dff0>,
 'chunk_size_limit': None}

NodeParser

The NodeParser is responsible for splitting text into chunks and creating nodes.

The following shows that, in the default case as in this sample, a SimpleNodeParser is set.

list_index.service_context.node_parser

<llama_index.node_parser.simple.SimpleNodeParser at 0x7f269b12a0e0>

According to the documentation, SimpleNodeParser currently appears to be the only NodeParser.

The attributes of SimpleNodeParser can be checked as follows.

list_index.service_context.node_parser.__dict__

{'callback_manager': <llama_index.callbacks.base.CallbackManager at 0x7fa43c52dff0>,
 '_text_splitter': <llama_index.langchain_helpers.text_splitter.TokenTextSplitter at 0x7fa43c52d7b0>,
 '_include_extra_info': True,
 '_include_prev_next_rel': True}

The actual splitting is done by the TextSplitter.

The TextSplitter has the following attributes, which hint at what can be customized.

list_index.service_context.node_parser._text_splitter.__dict__

{'_separator': ' ',
 '_chunk_size': 1024,
 '_chunk_overlap': 200,
 'tokenizer': <bound method Encoding.encode of <Encoding 'gpt2'>>,
 '_backup_separators': ['\n'],
 'callback_manager': <llama_index.callbacks.base.CallbackManager at 0x7fa43c52dff0>}
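To illustrate what _chunk_size and _chunk_overlap mean, here is a deliberately simplified sliding-window splitter. It is only a sketch under assumptions: `split_with_overlap` is a hypothetical function, tokens are just whitespace-separated words, and every window has a fixed length. The real TokenTextSplitter counts tokens with a tokenizer and splits on separators, which is why the node lengths observed earlier are not constant.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Cut tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = "a b c d e f g h i j".split(" ")
for chunk in split_with_overlap(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
# ['a', 'b', 'c', 'd']
# ['d', 'e', 'f', 'g']
# ['g', 'h', 'i', 'j']
```

With the default values shown above (1024 and 200), each chunk would share up to 200 tokens with its predecessor under this simplified model.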

Embeddings

Embeddings is responsible for converting text into embedding vectors.

The following shows that, in the default case as in this sample, OpenAIEmbedding is set.

list_index.service_context.embed_model

<llama_index.embeddings.openai.OpenAIEmbedding at 0x7f076a4e27d0>

According to the documentation, OpenAIEmbedding and LangchainEmbedding appear to be available as Embeddings.

Using LangchainEmbedding apparently lets you work with a wider range of models, such as Hugging Face models.

GPT Index でのHuggingFaceの埋め込みモデルの利用｜npaka

The attributes of OpenAIEmbedding can be checked as follows.

list_index.service_context.embed_model.__dict__

{'_total_tokens_used': 0,
 '_last_token_usage': 0,
 '_tokenizer': <bound method Encoding.encode of <Encoding 'gpt2'>>,
 'callback_manager': <llama_index.callbacks.base.CallbackManager at 0x7fa43c52dff0>,
 '_text_queue': [],
 '_embed_batch_size': 10,
 'deployment_name': None,
 'query_engine': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>,
 'text_engine': <OpenAIEmbeddingModeModel.TEXT_EMBED_ADA_002: 'text-embedding-ada-002'>,
 'openai_kwargs': {}}

This shows that text-embedding-ada-002 is used by default.

The GPTListIndex in this sample does not use embedding vectors, but we have confirmed what the default settings are.

LLMPredictor

LLMPredictor is responsible for the language model part that produces text responses (completions).

The following shows that a class named LLMPredictor is set.

list_index.service_context.llm_predictor

<llama_index.llm_predictor.base.LLMPredictor at 0x7f076a4e1db0>

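According to the documentation, LLMPredictor is treated as a wrapper around LangChain's LLMChain class.

There is also HuggingFaceLLMPredictor, an LLMPredictor for Hugging Face, which uses StabilityAI/stablelm-tuned-alpha-3b by default.

There is also MockLLMPredictor, which can apparently be used for things like cost estimation.

The attributes of LLMPredictor can be checked as follows.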

list_index.service_context.llm_predictor.__dict__

{'_llm': OpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-003', temperature=0.0, max_tokens=256, top_p=1, frequency_penalty=0, presence_penalty=0, n=1, best_of=1, model_kwargs={}, openai_api_key=None, openai_api_base=None, openai_organization=None, batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False, allowed_special=set(), disallowed_special='all'),
 'callback_manager': <llama_index.callbacks.base.CallbackManager at 0x7fa43c52dff0>,
 'retry_on_throttling': True,
 '_total_tokens_used': 0,
 'flag': True,
 '_last_token_usage': 0}

As its LLM it holds an instance of the OpenAI class, a kind of langchain.llms.base.LLM, which does the actual work.

type(list_index.service_context.llm_predictor.llm)

langchain.llms.openai.OpenAI

The attributes of the OpenAI class can be checked as follows.

list_index.service_context.llm_predictor.llm.__dict__

{'cache': None,
 'verbose': False,
 'callbacks': None,
 'callback_manager': None,
 'client': openai.api_resources.completion.Completion,
 'model_name': 'text-davinci-003',
 'temperature': 0.0,
 'max_tokens': 256,
 'top_p': 1,
 'frequency_penalty': 0,
 'presence_penalty': 0,
 'n': 1,
 'best_of': 1,
 'model_kwargs': {},
 'openai_api_key': None,
 'openai_api_base': None,
 'openai_organization': None,
 'batch_size': 20,
 'request_timeout': None,
 'logit_bias': {},
 'max_retries': 6,
 'streaming': False,
 'allowed_special': set(),
 'disallowed_special': 'all'}

From these, we can see that the text-davinci-003 model is used by default.

PromptHelper

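PromptHelper is responsible for tasks such as splitting text with token limits in mind.

It is similar in spirit to NodeParser, but its main purpose is to make the input fit within the token limit on the LLM side.

The following shows that a PromptHelper is set.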

list_index.service_context.prompt_helper

<llama_index.indices.prompt_helper.PromptHelper at 0x7f076a356200>

It is described in the official documentation.

The attributes of PromptHelper can be checked as follows.

list_index.service_context.prompt_helper.__dict__

{'max_input_size': 4097,
 'num_output': 256,
 'max_chunk_overlap': 200,
 'embedding_limit': None,
 'chunk_size_limit': None,
 '_tokenizer': <bound method Encoding.encode of <Encoding 'gpt2'>>,
 '_separator': ' ',
 'use_chunk_size_limit': False}
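As a rough worked example of how max_input_size and num_output constrain the context, the arithmetic below subtracts the reserved output tokens and the prompt's own tokens from the input window. This is a simplified assumption, not PromptHelper's actual logic, which may differ in detail; the function names and the prompt length of 50 tokens are hypothetical.

```python
def available_context_tokens(max_input_size, num_output, prompt_tokens):
    """Tokens left for context after reserving the output and the prompt itself."""
    remaining = max_input_size - num_output - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt and reserved output already exceed the input window")
    return remaining


def per_chunk_budget(max_input_size, num_output, prompt_tokens, num_chunks):
    """Evenly divide the available context tokens among num_chunks chunks."""
    return available_context_tokens(max_input_size, num_output, prompt_tokens) // num_chunks


# Defaults shown above (4097 and 256) plus a hypothetical 50-token prompt.
print(available_context_tokens(4097, 256, 50))        # 3791
print(per_chunk_budget(4097, 256, 50, num_chunks=2))  # 1895
```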

CallbackManager

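Callbacks can be set at the start and end of various LlamaIndex processing steps.

When CallbackHandlers are registered with the CallbackManager, each handler's on_event_start and on_event_end are fired.

on_event_start and on_event_end each receive a CBEventType, which indicates the type of processing, and a payload.

The CBEventType values are as follows.

CBEventType.CHUNKING : before and after text splitting
CBEventType.NODE_PARSING : before and after the NodeParser
CBEventType.EMBEDDING : before and after embedding vector creation
CBEventType.LLM : before and after an LLM call
CBEventType.QUERY : start and end of a query
CBEventType.RETRIEVE : before and after node retrieval
CBEventType.SYNTHESIZE : before and after response synthesis
CBEventType.TREE : before and after summary processing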
The only CallbackHandler provided is LlamaDebugHandler; a usage example is given in the official documentation.

The attributes of CallbackManager can be checked as follows.

list_index.service_context.callback_manager.__dict__

{'handlers': []}

In the default case, as in this sample, no handlers are set.

LlamaLogger

LlamaLogger is not covered much in the documentation, but it appears to be used mainly to capture logs of queries to the LLM.

With a logger set, you can retrieve the logs after running a query.

list_index.service_context.llama_logger.get_logs()

About the Query Engine

The Query Engine is the class created by as_query_engine in the sample; settings other than the Storage Context and Service Context are made here.

In the sample, it corresponds to the following.

query_engine = list_index.as_query_engine()

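The Query Engine that gets instantiated depends on the Index; the official documentation lists the Query Engines.

The List Index, Vector Store Index, Tree Index, and Keyword Table Index covered in this article all use the Retriever Query Engine.

The class of the Query Engine can be checked as follows.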

query_engine

<llama_index.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x7fb0b041b220>

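It is not spelled out clearly in the official documentation, but examples of what can be set through as_query_engine include the following.

Retriever Mode : retriever_mode
- Switches the Retriever, as described in the Retriever section
Node Postprocessor : node_postprocessors
- Post-processing after Nodes are retrieved
- Things like keyword filters can presumably be implemented here
Optimizer : optimizer
- Post-processing to apply to the text of each Node
Response Mode : response_mode
- The response synthesis mode
Prompt Templates : text_qa_template, refine_template, simple_template
- The prompt templates needed for response synthesis
- Which Prompt Template is used depends on the Response Mode

The attributes of RetrieverQueryEngine can be checked as follows.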

query_engine.__dict__

{'_retriever': <llama_index.indices.list.retrievers.ListIndexRetriever at 0x7fb0e091a650>,
 '_response_synthesizer': <llama_index.indices.query.response_synthesis.ResponseSynthesizer at 0x7fb0b05297e0>,
 'callback_manager': <llama_index.callbacks.base.CallbackManager at 0x7fb0b0528dc0>}

Apart from the Retriever, the actual processing is handled by the ResponseSynthesizer.

The attributes of the ResponseSynthesizer are as follows.

query_engine._response_synthesizer.__dict__

{'_response_builder': <llama_index.indices.response.response_builder.CompactAndRefine at 0x7fb0dbc35f00>,
 '_response_mode': <ResponseMode.COMPACT: 'compact'>,
 '_response_kwargs': {},
 '_optimizer': None,
 '_node_postprocessors': [],
 '_verbose': False}

We can see that the Response Mode is ResponseMode.COMPACT and that no Optimizer or Node Postprocessor is set by default.

The ResponseBuilder used appears to depend on the Response Mode; here the CompactAndRefine ResponseBuilder is used.

The attributes of CompactAndRefine are as follows.

query_engine._response_synthesizer._response_builder.__dict__

{'_service_context': ServiceContext(llm_predictor=<llama_index.llm_predictor.base.LLMPredictor object at 0x7fb0b052ac50>, prompt_helper=<llama_index.indices.prompt_helper.PromptHelper object at 0x7fb0b052a860>, embed_model=<llama_index.embeddings.openai.OpenAIEmbedding object at 0x7fb0b052a110>, node_parser=<llama_index.node_parser.simple.SimpleNodeParser object at 0x7fb0b052a6e0>, llama_logger=<llama_index.logger.base.LlamaLogger object at 0x7fb0b0528580>, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x7fb0b0528dc0>, chunk_size_limit=None),
 '_streaming': False,
 'text_qa_template': <llama_index.prompts.prompts.QuestionAnswerPrompt at 0x7fb0b1792b60>,
 '_refine_template': <llama_index.prompts.prompts.RefinePrompt at 0x7fb0b15ca050>}

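Here we can see that text_qa_template is set to a QuestionAnswerPrompt and refine_template to a RefinePrompt. simple_template is not used with ResponseMode.COMPACT, but by default it appears to be set to a SimpleInputPrompt.

Retriever Mode

This was covered earlier in the Retriever section, so it is omitted here.

Node Postprocessor

Performs post-processing on the nodes retrieved by the Retriever.

The official documentation lists the available Postprocessors.

Optimizer

Performs post-processing to be applied to the text of each Node.

The official documentation lists the available Optimizers.

Response Mode

Sets the mode used to synthesize the response from the LLM outputs.

It is not spelled out in the official documentation, but the available values are listed in the following source.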

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/indices/response/type.py#L4

class ResponseMode(str, Enum):
    """Response modes."""

    REFINE = "refine"
    COMPACT = "compact"
    SIMPLE_SUMMARIZE = "simple_summarize"
    TREE_SUMMARIZE = "tree_summarize"
    GENERATION = "generation"
    NO_TEXT = "no_text"

The Response Builder used differs for each ResponseMode, and so do the Prompt Templates it requires.

Keep this in mind if you want to tune the prompts.

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/indices/response/response_builder.py#L579

def get_response_builder(
    service_context: ServiceContext,
    text_qa_template: Optional[QuestionAnswerPrompt] = None,
    refine_template: Optional[RefinePrompt] = None,
    simple_template: Optional[SimpleInputPrompt] = None,
    mode: ResponseMode = ResponseMode.COMPACT,
    use_async: bool = False,
    streaming: bool = False,
) -> BaseResponseBuilder:
    text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT
    refine_template = refine_template or DEFAULT_REFINE_PROMPT_SEL
    simple_template = simple_template or DEFAULT_SIMPLE_INPUT_PROMPT
    if mode == ResponseMode.REFINE:
        return Refine(
            service_context=service_context,
            text_qa_template=text_qa_template,
            refine_template=refine_template,
            streaming=streaming,
        )
    elif mode == ResponseMode.COMPACT:
        return CompactAndRefine(
            service_context=service_context,
            text_qa_template=text_qa_template,
            refine_template=refine_template,
            streaming=streaming,
        )
    elif mode == ResponseMode.TREE_SUMMARIZE:
        return TreeSummarize(
            service_context=service_context,
            text_qa_template=text_qa_template,
            refine_template=refine_template,
            streaming=streaming,
            use_async=use_async,
        )
    elif mode == ResponseMode.SIMPLE_SUMMARIZE:
        return SimpleSummarize(
            service_context=service_context,
            text_qa_template=text_qa_template,
            streaming=streaming,
        )
    elif mode == ResponseMode.GENERATION:
        return Generation(
            service_context=service_context,
            simple_template=simple_template,
            streaming=streaming,
        )
    else:
        raise ValueError(f"Unknown mode: {mode}")

The default is ResponseMode.COMPACT, which runs text_qa_template on the first node and then runs refine_template on the rest, carrying the previous result forward.
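The create-then-refine flow can be sketched as a simple loop. This is an illustration under assumptions, not the CompactAndRefine implementation: `fake_llm` is a stand-in that makes no API call, the two templates are shortened hypothetical versions, and any packing of node texts into larger chunks before the loop is not modeled.

```python
# Shortened, hypothetical templates (the real defaults appear later in this article).
QA_TMPL = "Context:\n{context_str}\nAnswer the question: {query_str}\n"
REFINE_TMPL = (
    "Question: {query_str}\n"
    "Existing answer: {existing_answer}\n"
    "New context:\n{context_msg}\n"
    "Refine the existing answer if needed.\n"
)


def fake_llm(prompt):
    """Stand-in for a real LLM call: reports the prompt length instead of answering."""
    return f"answer based on {len(prompt)} prompt characters"


def create_and_refine(query_str, chunks, llm=fake_llm):
    """Answer from the first chunk, then refine that answer with each later chunk."""
    prompts = []
    answer = None
    for chunk in chunks:
        if answer is None:
            prompt = QA_TMPL.format(context_str=chunk, query_str=query_str)
        else:
            prompt = REFINE_TMPL.format(
                query_str=query_str, existing_answer=answer, context_msg=chunk
            )
        prompts.append(prompt)
        answer = llm(prompt)
    return answer, prompts


answer, prompts = create_and_refine("What changed?", ["chunk one", "chunk two", "chunk three"])
print(len(prompts))  # 3: one QA prompt followed by two refine prompts
print(answer)
```

Because each refine prompt embeds the previous answer, the number of LLM calls grows with the number of chunks, which is why retrieval and chunk size settings affect cost directly.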

Prompt Templates

Three templates can be set, as follows.

text_qa_template : QuestionAnswerPrompt class (default: DEFAULT_TEXT_QA_PROMPT)
refine_template : RefinePrompt class (default: DEFAULT_REFINE_PROMPT_SEL)
simple_template : SimpleInputPrompt class (default: DEFAULT_SIMPLE_INPUT_PROMPT)

The default prompts can be found at the following URL.

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/prompts/default_prompts.py

Let's look at the prompts relevant to this article.

QuestionAnswerPrompt

This is a simple prompt that asks for an answer given a context, as follows.

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/prompts/default_prompts.py#L105

DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
DEFAULT_TEXT_QA_PROMPT = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
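To see what the LLM actually receives, the template string can be filled in with plain str.format. This is only an illustration: LlamaIndex fills the template through its own prompt classes, and the context and query strings below are made up.

```python
# Template text copied from the source shown above.
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

# Hypothetical values for the two template variables.
prompt = DEFAULT_TEXT_QA_PROMPT_TMPL.format(
    context_str="LlamaIndex v0.6.8 was used in this article.",
    query_str="Which version was used?",
)
print(prompt)
```

The two placeholders, context_str and query_str, are the variables a custom text_qa_template also needs to contain.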

RefinePrompt

This one is somewhat more complex: because its behavior changes depending on whether the LLM is a chat model, it is wrapped in a ConditionalPromptSelector.

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/prompts/default_prompt_selectors.py#L14

DEFAULT_REFINE_PROMPT_SEL_LC = ConditionalPromptSelector(
    default_prompt=DEFAULT_REFINE_PROMPT.get_langchain_prompt(),
    conditionals=[(is_chat_model, CHAT_REFINE_PROMPT.get_langchain_prompt())],
)
DEFAULT_REFINE_PROMPT_SEL = RefinePrompt(
    langchain_prompt_selector=DEFAULT_REFINE_PROMPT_SEL_LC
)
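The selection logic itself is simple and can be sketched as follows. This is a conceptual stand-in, not LangChain's ConditionalPromptSelector: `select_prompt` is a hypothetical function, the LLM is represented as a plain dict, and the `is_chat_model` check here is a made-up simplification.

```python
def is_chat_model(llm):
    """Hypothetical check: this sketch represents an LLM as a plain dict."""
    return llm.get("chat", False)


def select_prompt(default_prompt, conditionals, llm):
    """Return the prompt of the first matching condition, otherwise the default."""
    for condition, prompt in conditionals:
        if condition(llm):
            return prompt
    return default_prompt


conditionals = [(is_chat_model, "chat refine prompt")]

print(select_prompt("default refine prompt", conditionals, {"name": "text-davinci-003"}))
# default refine prompt
print(select_prompt("default refine prompt", conditionals, {"name": "gpt-3.5-turbo", "chat": True}))
# chat refine prompt
```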

In the default case (is_chat_model is False), the following is used.

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/prompts/default_prompts.py#L90

DEFAULT_REFINE_PROMPT_TMPL = (
    "The original question is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question. "
    "If the context isn't useful, return the original answer."
)
DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)

For chat models, the following is used.

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/prompts/chat_prompts.py#L12

# Refine Prompt
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "We have the opportunity to refine the above answer "
        "(only if needed) with some more context below.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, refine the original answer to better "
        "answer the question. "
        "If the context isn't useful, output the original answer again.",
    ),
]


CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)

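Because the LLM itself uses LangChain classes, LangChain shows through in the prompts as well.

The OpenAI API differs slightly from one LLM to another, and the effort spent absorbing those differences shows just how hard OSS development is.

SimpleInputPrompt

This is a simple prompt that sends the query as-is, with no context.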

https://github.com/jerryjliu/llama_index/blob/v0.6.8/llama_index/prompts/default_prompts.py#L296

DEFAULT_SIMPLE_INPUT_TMPL = "{query_str}"
DEFAULT_SIMPLE_INPUT_PROMPT = SimpleInputPrompt(DEFAULT_SIMPLE_INPUT_TMPL)

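Sample code for customization

Rewriting the sample so that the Query Engine settings are also explicit gives the following.

(Only the ConditionalPromptSelector part for chat models is omitted, because it is cumbersome.)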

from llama_index import StorageContext
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore

# Create the Storage Context
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore()
    , vector_store=SimpleVectorStore()
    , index_store=SimpleIndexStore()
)

from llama_index import ServiceContext
from llama_index.node_parser import SimpleNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index import LLMPredictor
from llama_index.indices.prompt_helper import PromptHelper
from llama_index.logger.base import LlamaLogger
from llama_index.callbacks.base import CallbackManager

# Create the Service Context
llm_predictor = LLMPredictor()
service_context = ServiceContext.from_defaults(
    node_parser=SimpleNodeParser()
    , embed_model=OpenAIEmbedding()
    , llm_predictor=llm_predictor
    , prompt_helper=PromptHelper.from_llm_predictor(llm_predictor)
    , llama_logger=LlamaLogger()
    , callback_manager=CallbackManager([])
)

# Pass the Contexts when creating the Index
list_index = GPTListIndex.from_documents(
    documents
    , storage_context=storage_context
    , service_context=service_context
)

from llama_index.indices.list.base import ListRetrieverMode
from llama_index.indices.response import ResponseMode
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt, SimpleInputPrompt
from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL, DEFAULT_SIMPLE_INPUT_TMPL, DEFAULT_REFINE_PROMPT_TMPL

query_engine = list_index.as_query_engine(
    retriever_mode=ListRetrieverMode.DEFAULT
    , node_postprocessors=[]
    , optimizer=None
    , response_mode=ResponseMode.COMPACT
    , text_qa_template=QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
    , refine_template=RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
    , simple_template=SimpleInputPrompt(DEFAULT_SIMPLE_INPUT_TMPL)
)

# From here on, same as before
response = query_engine.query("機械学習に関するアップデートについて300字前後で要約してください。")

for sentence in response.response.split("。"):
    if sentence:  # skip the empty string left after the final "。"
        print(sentence + "。")

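Summary

How did you find it?

This first installment turned out to be heavy, but I think it managed to lay out the basic terminology and the processing that tends to become a black box.

Later articles will build on these basics and look at possible use cases and customizations.

Please use this article as a starting point and try customizing LlamaIndex yourself. (I would be delighted if you shared the results.)

I hope this article is useful to anyone who goes on to use LlamaIndex.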