Feedback needed on my query

amanda-lamancha · July 18, 2017, 5:56pm

Hi Elasticsearch community,

I'm trying to write some straightforward documentation about Elasticsearch for my team, and I'm very new to using it myself. I'm using this query as an example. I'm trying to filter in a few different ways (term, wildcard, range). I want to be in filter context, not query context, because scoring doesn't matter. The first date range clause is there to take advantage of caching, because this would (theoretically) run every two minutes for monitoring purposes. Any feedback on anything weird I am doing would be much appreciated; I don't want to lead anyone astray. Thanks!

GET /env_cc-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "dateTime": { "gte": "now-1h/d" }}},
        { "wildcard": { "sensor_type": { "value": "?_AIR_TEMP?" }}},
        { "range": { "dateTime": { "gte": "now-2m" }}},
        { "term": { "cname": { "value": "a1" }}}
      ]
    }
  }
}
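
As a rough illustration of why the two range clauses behave differently across repeated runs, here is a minimal Python sketch. It assumes UTC and that `/d` on a `gte` bound rounds down to the start of the day. The function names are hypothetical, and this only mimics the date math rather than showing how Elasticsearch evaluates it.

```python
from datetime import datetime, timedelta, timezone


def coarse_bound(now):
    """Mimic "now-1h/d" as a gte bound: subtract one hour, then round down to the start of the day."""
    shifted = now - timedelta(hours=1)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


def fine_bound(now):
    """Mimic "now-2m": subtract two minutes, with no rounding."""
    return now - timedelta(minutes=2)


# Two monitoring runs, two minutes apart (timestamps chosen for illustration).
first_run = datetime(2017, 7, 18, 17, 56, tzinfo=timezone.utc)
second_run = first_run + timedelta(minutes=2)

# The rounded bound is identical for both runs; the unrounded one moves every time.
same_coarse = coarse_bound(first_run) == coarse_bound(second_run)
same_fine = fine_bound(first_run) == fine_bound(second_run)
```

The rounded bound resolves to the same value for every run within a day, which is the property the caching idea relies on; the `now-2m` bound changes on every run.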

polyfractal · July 21, 2017, 1:50pm

It looks reasonable to me.

The only potentially hairy bit is that wildcard. The leading ? will force the query to do essentially a table-scan over all available characters. It's not as bad as a leading *, which expands out to essentially every possible term in the index, but it will still be relatively expensive.
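
To make the cost difference concrete, here is a deliberately simplified Python model of a sorted term dictionary. It is only a sketch: Lucene's real wildcard handling is automaton-based and smarter than this, and the helper names and sample terms are made up for illustration. Note that `fnmatchcase` also treats `[` specially, which does not matter for these patterns.

```python
import bisect
from fnmatch import fnmatchcase


def literal_prefix(pattern):
    """Return the fixed characters that precede the first wildcard in the pattern."""
    prefix = []
    for ch in pattern:
        if ch in "?*":
            break
        prefix.append(ch)
    return "".join(prefix)


def candidates(sorted_terms, pattern):
    """Terms that must be tested: a slice if there is a literal prefix, otherwise everything."""
    prefix = literal_prefix(pattern)
    if not prefix:
        return sorted_terms  # leading wildcard: nothing to narrow the search with
    lo = bisect.bisect_left(sorted_terms, prefix)
    hi = bisect.bisect_left(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:hi]


def wildcard_match(sorted_terms, pattern):
    """Test each candidate term against the pattern."""
    return [t for t in candidates(sorted_terms, pattern) if fnmatchcase(t, pattern)]


# Hypothetical sensor_type values, already sorted.
terms = ["A_AIR_TEMP1", "B_AIR_TEMP2", "B_HUMIDITY", "C_AIR_TEMP3", "C_PRESSURE"]

scanned_leading = len(candidates(terms, "?_AIR_TEMP?"))  # every term is a candidate
scanned_prefixed = len(candidates(terms, "B_AIR_TEMP?"))  # only the matching slice
```

In this model the pattern with a leading `?` has to consider all five terms, while the pattern with a literal prefix narrows to a single candidate before any matching is done.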

Is the leading/trailing ? really needed? How is sensor_type analyzed?

If the leading ? is needed, there's a trick you can do to help speed up the query (if it proves too slow). Add a multi-field to the mapping whose analyzer uses a reverse token filter, then an edge n-gram token filter. This will index _AIR_TEMP_ as prefixes of the reversed string: ["_", "_P", "_PM", "_PME", "_PMET", ...]. Then when you search, include a query against both the forward and reverse field, which gives you essentially prefix and suffix search.
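
The token list in that example can be reproduced with a few lines of Python. This is only a sketch of the idea, assuming the n-grams are taken from the front of the reversed string (that is, edge n-grams); the function name and the `min_gram`/`max_gram` parameters are hypothetical stand-ins for the filter settings.

```python
def reversed_edge_ngrams(term, min_gram=1, max_gram=None):
    """Reverse the term, then emit its prefixes from min_gram up to max_gram characters."""
    reversed_term = term[::-1]
    limit = len(reversed_term) if max_gram is None else min(max_gram, len(reversed_term))
    return [reversed_term[:n] for n in range(min_gram, limit + 1)]


# "_AIR_TEMP_" reversed is "_PMET_RIA_"; its first five prefixes match the list in the post.
tokens = reversed_edge_ngrams("_AIR_TEMP_", max_gram=5)
# tokens == ["_", "_P", "_PM", "_PME", "_PMET"]
```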

It's faster because it is indexing the prefix fragments directly into the data structure, rather than doing the same thing at query-time. And because the reversed prefix search is indexed, it doesn't have to do a full table scan to find matching characters.
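
A toy inverted index shows the trade-off: more work and storage at index time, then a single lookup at query time. This is a minimal sketch with hypothetical names and sample documents, not a description of Elasticsearch internals.

```python
from collections import defaultdict


def reversed_edge_ngrams(term):
    """All prefixes of the reversed term (hypothetical stand-in for reverse + edge n-gram filters)."""
    reversed_term = term[::-1]
    return [reversed_term[:n] for n in range(1, len(reversed_term) + 1)]


def build_suffix_index(docs):
    """Index time: map every reversed prefix to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, term in docs.items():
        for token in reversed_edge_ngrams(term):
            index[token].add(doc_id)
    return index


def search_suffix(index, suffix):
    """Query time: reverse the suffix and do one dictionary lookup, no scan over terms."""
    return index.get(suffix[::-1], set())


# Hypothetical documents keyed by id, holding a sensor_type value each.
docs = {1: "A_AIR_TEMP1", 2: "B_HUMIDITY", 3: "C_AIR_TEMP1"}
index = build_suffix_index(docs)
hits = search_suffix(index, "_AIR_TEMP1")
# hits == {1, 3}
```

The cost moves to index time: each value produces one token per character, so the index grows, which is why this is usually worth doing only once the wildcard query proves too slow.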

Feel free to ignore that tip if performance is fine. It may be something to file away for later when your data volume grows and you need to squeeze a bit more performance out of things.

amanda-lamancha · July 21, 2017, 9:08pm

Thank you so much for the feedback! That is really good to know.

system · August 18, 2017, 9:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
