I was wondering what the community's opinion is on best practices for generating queries in a complex application. For our date histogram queries we have tried both:
1. Store DSL queries as text files and use string manipulation to substitute in some variables
2. Generate DSL queries at runtime using the SDK
I found pros and cons of both approaches: #1 was easier to develop and debug since queries can easily be pasted into Postman or ElasticVue, but #2 was easier to test and re-use code across indexes.
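To make approach #1 concrete, here is a minimal Python sketch. It assumes the stored file holds a date histogram request body with `$`-style placeholders filled in by the standard library's `string.Template`. The placeholder names, the field name and the inline string standing in for the file are all hypothetical. Parsing the result with `json.loads` catches malformed JSON, but plain substitution does no escaping, so a value containing a double quote would break the query.

```python
import json
from string import Template

# Hypothetical contents of a stored query file such as "date_histogram.json".
# The $-placeholders are the values substituted at runtime.
DATE_HISTOGRAM_TEMPLATE = Template("""
{
  "size": 0,
  "query": {
    "range": {
      "$time_field": {"gte": "$start", "lt": "$end"}
    }
  },
  "aggs": {
    "per_interval": {
      "date_histogram": {
        "field": "$time_field",
        "calendar_interval": "$interval"
      }
    }
  }
}
""")


def render_query(**params: str) -> dict:
    """Substitute parameters into the stored text and parse the result.

    substitute() raises KeyError if a placeholder has no value, and
    json.loads() raises ValueError if the resulting text is not valid JSON.
    """
    text = DATE_HISTOGRAM_TEMPLATE.substitute(**params)
    return json.loads(text)


query = render_query(
    time_field="@timestamp",
    start="now-7d/d",
    end="now/d",
    interval="day",
)
print(json.dumps(query["aggs"], indent=2))
```

The substituted text is still plain JSON, so it can be pasted straight into Postman or ElasticVue, which is where the debugging advantage of this approach comes from.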
That's a great question, and I tend to go back and forth between the two. There is no definitive answer; to me it boils down to how much experience with the Elasticsearch DSL the developers maintaining that application have.
The more experienced they are, the easier it is for them to read the query DSL, to write queries in a programming language, and to read those queries back when working on the code.
In other cases, where I wanted to be sure that others feel at home with the code and the query won't change very often, it can make sense to offload it into a file.
There is a caveat with the file-based approach: you need to ensure that the resulting query is not only valid JSON but also a valid Elasticsearch query. So it makes sense to check all stored queries for valid JSON on startup and exit early if any of them fails, and also to make sure your tests include integration tests.
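A startup check along those lines could look like the Python sketch below. It assumes hypothetical `$`-placeholder query files (inlined here as `string.Template` objects) and a set of representative sample parameters. It only proves that each stored query renders to valid JSON; whether Elasticsearch accepts it as a query still has to be covered by integration tests against a real cluster.

```python
import json
import sys
from string import Template

# Hypothetical stored queries; a real app would read every *.json file
# from its queries directory into this mapping.
STORED_QUERIES = {
    "daily_histogram": Template(
        '{"size": 0, "aggs": {"h": {"date_histogram": '
        '{"field": "$time_field", "calendar_interval": "day"}}}}'
    ),
    "hourly_histogram": Template(
        '{"size": 0, "aggs": {"h": {"date_histogram": '
        '{"field": "$time_field", "calendar_interval": "hour"}}}}'
    ),
}

# Representative values used only for the startup check.
SAMPLE_PARAMS = {"time_field": "@timestamp"}


def find_invalid_queries(queries: dict, params: dict) -> list:
    """Return a 'name: error' entry for every query that does not render to JSON."""
    problems = []
    for name, template in queries.items():
        try:
            # KeyError: a placeholder has no value; ValueError: not valid JSON.
            json.loads(template.substitute(params))
        except (KeyError, ValueError) as exc:
            problems.append(f"{name}: {exc!r}")
    return problems


def check_on_startup() -> None:
    """Exit early if any stored query is broken (JSON level only)."""
    problems = find_invalid_queries(STORED_QUERIES, SAMPLE_PARAMS)
    if problems:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Invalid stored queries:\n" + "\n".join(problems))


check_on_startup()
```

Running the check once at start-up means a broken file stops the process with a list of offenders, instead of surfacing as a failed request later.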
There is a third option: search templates, where the query itself is stored in Elasticsearch and you only pass the parameters that change with each search request. This can make sense for big queries, and it also allows you to replace the underlying implementation without making changes to your app, which is useful if you test out different searches a lot or run A/B experiments. See Search templates | Elasticsearch Guide [8.1] | Elastic
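As a rough sketch of how a search template splits the work, the Python below only builds the request bodies and does not send them. It assumes a hypothetical template id `date-histogram-template` and index `my-index`. The first body would be stored once with `PUT _scripts/<id>`, and each search afterwards sends only the id and the changing params to `<index>/_search/template`.

```python
import json

# Hypothetical template id and index name, for illustration only.
TEMPLATE_ID = "date-histogram-template"
INDEX = "my-index"

# Body for: PUT _scripts/date-histogram-template
# The query lives in Elasticsearch; {{...}} are Mustache placeholders.
store_template_body = {
    "script": {
        "lang": "mustache",
        "source": {
            "size": 0,
            "query": {"range": {"@timestamp": {"gte": "{{start}}", "lt": "{{end}}"}}},
            "aggs": {
                "per_interval": {
                    "date_histogram": {
                        "field": "@timestamp",
                        "calendar_interval": "{{interval}}",
                    }
                }
            },
        },
    }
}


def search_template_request(start: str, end: str, interval: str) -> tuple:
    """Build (method, path, body) for running the stored template.

    Only the parameters that change travel with each request.
    """
    body = {
        "id": TEMPLATE_ID,
        "params": {"start": start, "end": end, "interval": interval},
    }
    return "GET", f"/{INDEX}/_search/template", body


method, path, body = search_template_request("now-7d/d", "now/d", "day")
print(method, path)
print(json.dumps(body))
```

Before wiring this into an app, the `_render/template` endpoint can be used to preview the query that a given id and set of params expands to.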
In a statically typed language I would always go with building the query in code, since that way I know it produces a valid query, and I feel comfortable with the DSL myself.
Thank you for your thoughts, Alex; they align closely with how I was framing this in my head. We started off using query files when we were prototyping and not that familiar with the DSL, but now that those queries are stable and in production (and we want to run more variants of them), it made sense to us to migrate to the query builders available in the SDK. We had as many as 10 or 12 files that all used similar syntax to generate histograms from various indexes, with only minor differences.
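Collapsing near-identical query files into one builder might look like the plain-Python sketch below. It uses dicts rather than any particular SDK's builder classes (the official clients and libraries such as elasticsearch-dsl have their own), and the index names, field names and the `HistogramSpec` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistogramSpec:
    """What used to differ between the hypothetical per-index query files."""
    index: str
    time_field: str = "@timestamp"
    interval: str = "day"
    filters: tuple = ()  # extra term filters as (field, value) pairs


def date_histogram_query(spec: HistogramSpec, start: str, end: str) -> dict:
    """Build one date histogram request body from a shared skeleton."""
    clauses = [{"range": {spec.time_field: {"gte": start, "lt": end}}}]
    clauses += [{"term": {f: v}} for f, v in spec.filters]
    return {
        "size": 0,
        "query": {"bool": {"filter": clauses}},
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": spec.time_field,
                    "calendar_interval": spec.interval,
                }
            }
        },
    }


# Each former file becomes one small spec instead of a copy of the whole query.
SPECS = [
    HistogramSpec(index="orders"),
    HistogramSpec(index="logs", time_field="event.created", interval="hour"),
    HistogramSpec(index="orders", filters=(("status", "shipped"),)),
]

for spec in SPECS:
    body = date_histogram_query(spec, "now-7d/d", "now/d")
    print(spec.index, body["query"]["bool"]["filter"])
```

Because the request body is just data, unit tests can assert on its structure without a running cluster, which matches the testability benefit of approach #2 described in the question.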
We weren't aware of search templates so I'll definitely give that a try. Are there any performance benefits to gain from those relative to using the SDK method?