Standardized Way to Provide Additional Mapping Context for Natural Language to Elasticsearch Query Conversion Using LLM

I am implementing a "chat with Elasticsearch index" feature, where a large language model (LLM), such as one from OpenAI, converts natural language queries (NLQs) into Elasticsearch queries. For accurate query conversion, the LLM needs to understand the Elasticsearch index mappings, including the structure and constraints of the fields. However, Elasticsearch mappings typically only provide field types (e.g., keyword, text, date) without additional context.

For example, consider an index with a "status" field of type keyword that accepts only specific values: "Approved", "Denied", and "Draft". These valid values cannot be specified in the standard mapping JSON, so the LLM cannot infer them. For a user query like "get approved documents," how can the LLM know to generate an Elasticsearch query that matches the case-sensitive value "Approved"?

One potential solution is to add detailed metadata to the _meta field in the Elasticsearch mappings. However, including such detailed metadata for all properties could significantly increase the size of the mappings JSON, potentially doubling it, which may impact performance or maintainability.
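For illustration, the _meta approach might look like the sketch below. Elasticsearch stores and returns _meta as-is without interpreting it, so the layout of the hints (the "fields", "description", and "allowed_values" keys) is a convention I'm inventing here, not anything Elasticsearch defines:

```json
{
  "mappings": {
    "_meta": {
      "fields": {
        "status": {
          "description": "Review state of the document; values are case-sensitive",
          "allowed_values": ["Approved", "Denied", "Draft"]
        }
      }
    },
    "properties": {
      "status": { "type": "keyword" }
    }
  }
}
```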

My questions are:

  1. Is updating the _meta field in the mappings the only standard way to provide the LLM with sufficient context for accurate NLQ-to-query conversion?

  2. Alternatively, should field descriptions and constraints (e.g., valid values) be provided as separate context outside the mappings? Is this a good practice?

  3. If using GitHub - elastic/mcp-server-elasticsearch (which provides a get_mappings tool for retrieving mappings), how can additional field descriptions be incorporated? Would they need to be included in the LLM prompt separately, or would the _meta of the mappings need to be updated? Which approach is better?

I’d appreciate insights or best practices for providing LLMs with the necessary context to generate accurate Elasticsearch queries.

@NavaneethaKannan

_meta is the only standard place inside the mapping itself to add extra hints (e.g., enums, descriptions). To avoid bloating the mapping, many teams instead keep a separate "data dictionary" (a JSON file or auxiliary index) and pass that as context to the LLM. In MCP, get_mappings already returns _meta; if you keep the info external, you need to inject it into the prompt yourself or expose it via another tool.
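As a minimal sketch of the data-dictionary approach: keep the field descriptions in a structure of your own design and render it into the prompt before asking the LLM to generate the query. Everything here (the dictionary layout, field names, and helper) is hypothetical, not an Elasticsearch or MCP API:

```python
import json

# Hypothetical data dictionary maintained alongside the index
# (could live in a JSON file or an auxiliary index).
DATA_DICTIONARY = {
    "status": {
        "type": "keyword",
        "description": "Review state of the document; values are case-sensitive.",
        "allowed_values": ["Approved", "Denied", "Draft"],
    },
    "created_at": {
        "type": "date",
        "description": "UTC timestamp when the document was created.",
    },
}

def build_field_context(dictionary: dict) -> str:
    """Render the dictionary as a compact prompt section for the LLM."""
    lines = ["Index fields:"]
    for name, info in dictionary.items():
        line = f"- {name} ({info['type']}): {info['description']}"
        if "allowed_values" in info:
            line += f" Allowed values: {json.dumps(info['allowed_values'])}."
        lines.append(line)
    return "\n".join(lines)

# Prepend this to the NLQ-to-query prompt so the LLM knows, e.g., to
# match the exact keyword "Approved" rather than "approved".
prompt_context = build_field_context(DATA_DICTIONARY)
print(prompt_context)
```

This keeps the mapping JSON small while still giving the LLM the constraints it needs; the trade-off is that the dictionary must be kept in sync with the mapping manually.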

Thanks