Can ELSER generate Elasticsearch aggregation queries from natural-language input?

Hi team,

I’m exploring a hybrid setup where an LLM (such as GPT-4 or GPT-5) converts natural-language questions into Elasticsearch Query DSL, including aggregations such as avg, terms, and date_histogram.

However, I’m facing a key issue:

The LLMs often generate inaccurate or incomplete queries: wrong field names, missing .keyword suffixes on text fields, or numeric fields confused with keyword fields. The result is either a failed query or incorrect results.
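
To make these failure modes concrete, here is a minimal Python sketch of a pre-flight check that compares the fields in a generated "aggs" block against the index mapping before the query is sent. The mapping, the field names, and the helpers field_type and check_aggs are all hypothetical and for illustration only; they are not Elasticsearch APIs. The type rules are deliberately simplified (for example, only avg and sum are treated as numeric-only), so treat this as a sketch rather than a complete validator.

```python
# Hypothetical mapping, shaped like the "mappings" section of a get-mapping response.
MAPPING = {
    "properties": {
        "category": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "price": {"type": "double"},
        "status": {"type": "keyword"},
        "created_at": {"type": "date"},
    }
}

# Simplified type rules (assumptions for this sketch, not an exhaustive list).
NUMERIC_TYPES = {"long", "integer", "short", "byte", "double", "float",
                 "half_float", "scaled_float", "unsigned_long"}
DATE_TYPES = {"date", "date_nanos"}
NUMERIC_ONLY_AGGS = {"avg", "sum"}


def field_type(mapping, path):
    """Resolve a dotted path through object properties and multi-fields.

    Returns the mapped type, or None if the path does not exist.
    """
    node = mapping
    for part in path.split("."):
        children = {**node.get("properties", {}), **node.get("fields", {})}
        if part not in children:
            return None
        node = children[part]
    return node.get("type", "object")


def check_aggs(mapping, aggs):
    """Return a list of human-readable problems found in an "aggs" block."""
    problems = []
    for name, body in aggs.items():
        for agg_type, params in body.items():
            if agg_type in ("aggs", "aggregations"):
                # Sub-aggregations: check them recursively.
                problems += check_aggs(mapping, params)
                continue
            field = params.get("field") if isinstance(params, dict) else None
            if field is None:
                continue
            ftype = field_type(mapping, field)
            if ftype is None:
                problems.append(f"{name}: unknown field '{field}'")
            elif agg_type == "terms" and ftype == "text":
                has_kw = field_type(mapping, field + ".keyword") == "keyword"
                hint = f"; try '{field}.keyword'" if has_kw else ""
                problems.append(f"{name}: terms on text field '{field}'{hint}")
            elif agg_type in NUMERIC_ONLY_AGGS and ftype not in NUMERIC_TYPES:
                problems.append(
                    f"{name}: {agg_type} needs a numeric field, '{field}' is {ftype}")
            elif agg_type == "date_histogram" and ftype not in DATE_TYPES:
                problems.append(
                    f"{name}: date_histogram needs a date field, '{field}' is {ftype}")
    return problems


# Example of an LLM-generated "aggs" block containing all three mistakes.
generated = {
    "by_category": {
        "terms": {"field": "category"},
        "aggs": {"avg_price": {"avg": {"field": "status"}}},
    },
    "over_time": {
        "date_histogram": {"field": "createdAt", "calendar_interval": "day"}
    },
}
problems = check_aggs(MAPPING, generated)
for p in problems:
    print(p)
```

The list of problems could be fed back to the LLM as a correction prompt, or used to reject the query before it reaches the cluster.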

I’m already using ELSER for semantic retrieval (text_expansion queries), and I’m wondering if ELSER could help bridge the gap for natural-language analytics, e.g., by:

  • Providing better context or embeddings for LLMs to ground queries on actual index fields

  • Helping interpret the semantic meaning of text fields before forming aggregations

  • Or being integrated into the LLM prompt to guide query formation more accurately

So my questions are:

  1. Can ELSER itself help generate or validate aggregation queries based on natural language?

  2. Is there a recommended approach to combine ELSER with LLMs to improve the accuracy of NLP-to-Elasticsearch query generation (especially for analytics use cases)?

  3. Are there best practices for grounding LLMs on the actual index mapping so that generated DSL aligns with field types and naming conventions?
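
For question 3, the kind of grounding I mean could look roughly like the following Python sketch: flatten the index mapping into an explicit list of field names and types, and put that list into the LLM prompt. The index name, the mapping, and the helpers flatten_fields and mapping_prompt are hypothetical; in practice the mapping would come from the get-mapping API rather than being hard-coded.

```python
def flatten_fields(mapping, prefix=""):
    """Yield (dotted_path, type) for every leaf field and multi-field."""
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object or nested field: descend into its sub-fields.
            yield from flatten_fields(spec, prefix=path + ".")
        else:
            yield path, spec.get("type", "object")
        # Multi-fields such as "name.keyword" are listed explicitly.
        for sub, sub_spec in spec.get("fields", {}).items():
            yield f"{path}.{sub}", sub_spec.get("type", "unknown")


def mapping_prompt(index, mapping):
    """Build a plain-text schema description to prepend to the LLM prompt."""
    lines = [f"Index: {index}", "Fields (use these exact names and types):"]
    lines += [f"- {path}: {ftype}" for path, ftype in sorted(flatten_fields(mapping))]
    return "\n".join(lines)


# Hypothetical mapping used only for this example.
example_mapping = {
    "properties": {
        "customer": {
            "properties": {
                "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
            }
        },
        "price": {"type": "double"},
        "created_at": {"type": "date"},
    }
}

prompt_header = mapping_prompt("orders", example_mapping)
print(prompt_header)
```

Listing multi-fields such as customer.name.keyword explicitly gives the model the exact field name to use for terms aggregations, instead of leaving it to guess the suffix.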

Any guidance or examples would be greatly appreciated.

Thanks!