Hi, I am very interested in combining GPT with Elasticsearch for enterprise data (link: ChatGPT and Elasticsearch: OpenAI meets private data — Elastic Search Labs). We have used Azure Cognitive Search + OpenAI GPT but faced many implementation issues. For instance, all our documents in Microsoft Word have to be converted to PDF before chunking can take place, and we have to use chunking overlap to ensure context linkage between pages.
Before we dive deeper into the Elasticsearch solution, may I ask whether Elasticsearch has similar limitations?
For instance, all our documents in Microsoft Words have to be converted into PDF before chunking
That's odd. In the Elastic ecosystem, it's more common to convert documents to plain text first. The attachment processor can do that for you in an ingest pipeline, or you can use tools like Pandoc or Apache Tika outside the stack.
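For reference, here is a minimal sketch of such a pipeline. The pipeline name `word-to-text` and the source field `data` are illustrative choices, not required names; the attachment processor itself is real and is backed by Apache Tika, so it can read Word files directly without a PDF conversion step. Documents are sent with the binary content base64-encoded in the `data` field:

```
PUT _ingest/pipeline/word-to-text
{
  "description": "Extract plain text from base64-encoded office documents",
  "processors": [
    {
      "attachment": {
        "field": "data",
        "target_field": "attachment"
      }
    }
  ]
}
```

After ingesting through this pipeline, the extracted text lands in `attachment.content`, which you can then chunk or feed to a semantic text model.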
we have to use chunking overlap to ensure context linkage between pages.
This is more of a problem-space pattern than a stack-enforced requirement. If your chunks don't overlap but a query needs context that spans two adjacent chunks, you won't get a hit. That's a tradeoff you'll have to weigh regardless of the technology you use: whether the cost of the extra inference is worth the improved relevance. You have lots of knobs to turn here: how big to make the chunks, how much they should overlap, how many chunks to produce at most, and even whether to chunk large text content at all, or to first summarize it with an LLM and apply the semantic text model to just the summary.
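To make those knobs concrete, here is a minimal sliding-window chunker sketch. The function name and the character-based sizes are illustrative, not an Elasticsearch API; in practice you'd likely chunk by tokens or sentences, but the overlap logic is the same:

```python
def chunk_text(text, chunk_size=200, overlap=50, max_chunks=None):
    """Split text into fixed-size chunks with overlap (sizes in characters).

    chunk_size, overlap, and max_chunks are the tuning knobs discussed
    above: larger overlap preserves more cross-chunk context at the cost
    of more inference calls on redundant text.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if max_chunks is not None and len(chunks) >= max_chunks:
            break
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks
```

Note that the tail of each chunk repeats as the head of the next, so a query whose answer straddles a chunk boundary still has at least one chunk containing both halves, provided the answer span is shorter than the overlap.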