Using Elasticsearch with LangChain


I am using the Python library langchain_elasticsearch to implement a MultiQueryRetriever, with the code shown below.

from langchain_elasticsearch import ElasticsearchStore
from langchain.retrievers.multi_query import MultiQueryRetriever

vectorstore = ElasticsearchStore(
    embedding=embeddings,
    vector_query_field="query_embeddings",
    query_field="query_field",
    index_name="index_with_embeddings",
    es_connection=es_client,
)
retriever = MultiQueryRetriever(
    retriever=vectorstore.as_retriever(), llm_chain=llm_chain, parser_key="lines"
)
unique_docs = retriever.get_relevant_documents(query="here is the question?")

The mapping of the index is the following:

 "index_with_embeddings": {
        "mappings": {
            "dynamic": "strict",
            "properties": {
                "CreateTimeStamp": {
                    "type": "date"
                },
                "query_embeddings": {
                    "type": "dense_vector",
                    "dims": 1536,
                    "index": true,
                    "similarity": "cosine"
                },
                "query_field": {
                    "type": "text"
                },
                "Sequence": {
                    "type": "integer"
                },
                "Field_add1": {
                    "type": "text"
                },
                "Field_add2": {
                    "type": "text"
                }
            }
        }
    }

The error I got in python is the following:

Lib\site-packages\langchain\retrievers\multi_query.py:175, in MultiQueryRetriever._get_relevant_documents(self, query, run_manager)
    173 if self.include_original:

.venv/Lib/site-packages/langchain_elasticsearch/vectorstores.py
    807 page_content=hit["_source"].get(self.query_field, ""),
    808 metadata=hit["_source"]["metadata"],

Does anybody know how this problem can be fixed to use an existing index with the definition mentioned above?

The current notebooks on GitHub are sparse and don't give details about working with existing indexes. Here is the link: elasticsearch-labs/notebooks/langchain at main · elastic/elasticsearch-labs · GitHub

Must the indexes follow the LangChain document schema? Is there any way to use an existing index that has a different structure?

This line in the error makes me think this might be the same error reported here: langchain-elastic #7 where the default_doc_builder is expecting there to be a metadata key on the document. If it's missing there is a KeyError.

I'm looking into resolving this, but for now you could try to add an empty metadata: {} field to your documents and see if it resolves the error.
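A rough sketch of that workaround, written as request bodies only. It assumes the index name and mapping from the question. Because that mapping is "dynamic": "strict", the metadata field would presumably need to be added to the mapping before any document can carry it. The helper names are hypothetical, and the commented-out calls assume an elasticsearch-py 8.x client named es_client.

```python
INDEX_NAME = "index_with_embeddings"  # index name taken from the question


def metadata_mapping_body() -> dict:
    """Mapping update that declares an object field named 'metadata'."""
    return {"properties": {"metadata": {"type": "object"}}}


def add_empty_metadata_body() -> dict:
    """update_by_query body that sets metadata to {} only where it is missing."""
    return {
        "script": {
            "lang": "painless",
            # [:] is the Painless literal for an empty map;
            # documents that already have metadata are left untouched.
            "source": (
                "if (ctx._source.metadata == null) "
                "{ ctx._source.metadata = [:]; } "
                "else { ctx.op = 'noop'; }"
            ),
        },
        "query": {"match_all": {}},
    }


# Assumed usage with an elasticsearch-py 8.x client (not executed here):
# es_client.indices.put_mapping(index=INDEX_NAME, **metadata_mapping_body())
# es_client.update_by_query(index=INDEX_NAME, **add_empty_metadata_body())
```

It would be worth trying this on a copy of the index first, since update_by_query rewrites every document that lacks the field.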

Also worth mentioning: the vector store sets up the indices for you when you ingest through LangChain's own loaders (Document loaders | 🦜️🔗 LangChain).

For those who have ingested data externally and simply want to query, we advise using the ElasticsearchRetriever which might fit your needs better.

Joe

Hi Joseph,

Thanks for your suggestion of using ElasticsearchRetriever. It seems a better solution for existing indexes; I tested it with several queries and it worked.

However, I tried to run an llm_chain with the MultiQueryRetriever, and when I pass the ElasticsearchRetriever to the MultiQueryRetriever I get an error. Here is the setup:

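from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_elasticsearch import ElasticsearchRetriever

def vector_query(search_query: str) -> dict: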
    vector = embeddings.embed_query(search_query)  # same embeddings as for indexing
    return {
        "knn": {
            "field": "Embeddings",
            "query_vector": vector,
            "k": 5,
            "num_candidates": 10,
        }
    }

index_name = "index_with_embeddings"
vector_retriever = ElasticsearchRetriever.from_es_params(
    index_name=index_name,
    body_func=vector_query,
    content_field="Sentence",
    url=f"https://{ELASTIC_URL}:443",
    username=ELASTIC_USER,
    password=ELASTIC_PASSWORD,
)
question = "Who are the people who have knowledge in Elasticsearch"

output_parser = LineListOutputParser()
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five 
    different versions of the given user question to retrieve relevant documents from a vector 
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search. 
    Provide these alternative questions separated by newlines, and do not number the response, or add any character at the beginning.
    Original question: {question}""",
)

llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)

Once I create the MultiQueryRetriever and invoke it, I get the error below.

retriever = MultiQueryRetriever(retriever=vector_retriever, llm_chain=llm_chain, parser_key="lines")  
unique_docs = retriever.invoke(query=question)
len(unique_docs)

Error:

TypeError                                 Traceback (most recent call last)
Cell In[51], line 2
      1 retriever = MultiQueryRetriever(retriever=vector_retriever, llm_chain=llm_chain, parser_key="lines")  
----> 2 unique_docs = retriever.invoke(query=question)
      3 len(unique_docs)

TypeError: BaseRetriever.invoke() missing 1 required positional argument: 'input'

My question is whether there is any way to pass an ElasticsearchRetriever as the retriever to the MultiQueryRetriever.

Thanks in advance,
David

Hi Rodney,

I will try the solution you proposed later. For now it seems more appropriate to test the one suggested by Joseph: that retriever allows me to implement a hybrid search over several fields, so it would be better to use ElasticsearchRetriever.

Cheers,
David

Hey David,

So here is an example of the multi-query using the Vector store in a notebook. elasticsearch-labs/notebooks/langchain/multi-query-retriever-examples/langchain-multi-query-retriever.ipynb at main · elastic/elasticsearch-labs · GitHub

My next task is to try this out with the retriever, to verify whether there's any issue here. I will do that and report back!

Joe

So it should work. I tested it by updating the chatbot notebook and using the following code for the retriever:

from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_elasticsearch import ElasticsearchRetriever

def vector_query(search_query: str):
    vector = embeddings.embed_query(search_query)  # same embeddings as for indexing
    return {
        "knn": {
            "field": "vector",
            "query_vector": vector,
            "k": 3,
            "num_candidates": 10,
        }
    }


vector_retriever = ElasticsearchRetriever.from_es_params(
    index_name="chatbot-multi-query-demo",
    body_func=vector_query,
    content_field="text",
    # url="http://localhost:9200",
)

retriever = MultiQueryRetriever.from_llm(vector_retriever, llm)
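
On the TypeError itself: as far as I can tell it comes from how invoke is called rather than from the retriever type. BaseRetriever.invoke names its first parameter input, so invoke(query=question) leaves that parameter unfilled. The sketch below reproduces this with a stand-in class. StandInRetriever is hypothetical and only mirrors the signature (it is not the LangChain class), so it runs without a cluster or an LLM.

```python
class StandInRetriever:
    """Hypothetical stand-in that mirrors invoke(input, config=None, **kwargs)."""

    def invoke(self, input, config=None, **kwargs):
        # A real retriever would query Elasticsearch here.
        return [f"document for: {input}"]


retriever = StandInRetriever()
question = "Who are the people who have knowledge in Elasticsearch"

# Passing the question as query= is swallowed by **kwargs,
# so the required 'input' parameter is never filled.
try:
    retriever.invoke(query=question)
    keyword_call_error = None
except TypeError as exc:
    keyword_call_error = str(exc)

# Passing the question positionally fills 'input'.
unique_docs = retriever.invoke(question)

print(keyword_call_error)
print(len(unique_docs))
```

With the real objects, the equivalent call would be retriever.invoke(question) or retriever.invoke(input=question).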