I am implementing a "chat with Elasticsearch index" feature, where a large language model (LLM), such as one from OpenAI, converts natural language queries (NLQs) into Elasticsearch queries. For accurate query conversion, the LLM needs to understand the Elasticsearch index mappings, including the structure and constraints of the fields. However, Elasticsearch mappings typically expose only field types (e.g., keyword, text, date) with no further context.
For example, consider an index with a "status" field of type keyword that should only ever contain the values "Approved", "Denied", and "Draft". These valid values cannot be expressed in the standard mapping JSON, so the LLM cannot infer them. For a user query like "get approved documents," how can the LLM know to generate an Elasticsearch query that matches the case-sensitive value "Approved"?
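For concreteness, here is a minimal sketch of such a mapping (the documents index name and the other fields are made up for illustration):

```json
PUT /documents
{
  "mappings": {
    "properties": {
      "status":     { "type": "keyword" },
      "title":      { "type": "text" },
      "created_at": { "type": "date" }
    }
  }
}
```

Nothing here tells the LLM that status accepts only "Approved", "Denied", and "Draft", or that matching is case-sensitive, so a term query for "approved" would silently return no hits.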
One potential solution is to add detailed metadata to the _meta field in the Elasticsearch mappings. However, including such detailed metadata for all properties could significantly increase the size of the mappings JSON, potentially doubling it, which may impact performance or maintainability.
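The _meta field does accept arbitrary JSON, so something like the following is possible (the description/allowed_values schema here is my own invention, not an Elasticsearch convention):

```json
PUT /documents/_mapping
{
  "_meta": {
    "fields": {
      "status": {
        "description": "Review state of the document; matching is case-sensitive",
        "allowed_values": ["Approved", "Denied", "Draft"]
      },
      "created_at": {
        "description": "UTC timestamp when the document was created"
      }
    }
  }
}
```

Repeating this for every property in a large index is exactly what I expect to bloat the mappings.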
My questions are:

- Is updating the _meta field in the mappings the only standard way to provide the LLM with sufficient context for accurate NLQ-to-query conversion?
- Alternatively, should field descriptions and constraints (e.g., valid values) be provided as separate context outside the mappings (see the sketch after this list)? Is that considered good practice?
- If using GitHub - elastic/mcp-server-elasticsearch (which provides a get_mappings tool for retrieving mappings), how can additional field descriptions be incorporated? Should they be included in the LLM prompt separately, or should the _meta of the mappings be updated? Which approach is better?
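As a sketch of the second option, the constraints could live in a separate context document that is concatenated into the system prompt alongside the get_mappings output; the file layout below is hypothetical:

```json
{
  "index": "documents",
  "fields": {
    "status": {
      "type": "keyword",
      "description": "Review state of the document; exact, case-sensitive match required",
      "allowed_values": ["Approved", "Denied", "Draft"]
    }
  }
}
```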
I’d appreciate insights or best practices for providing LLMs with the necessary context to generate accurate Elasticsearch queries.