Hi Elastic folks,
We are seeing significant performance hit stemming from the usage of global ordinals (CPU, heap, GC). All background on our specific situation is below, points in question are:
- Loading of global ordinals seem to sometimes run forever, with no circuit breakers triggered (see numbers below). Is that intentional?
- The docs state "The loading time of global ordinals depends on the number of terms in a field, but in general it is low, since it source field data has already been loaded. The memory overhead of global ordinals is a small because it is very efficiently compressed." (see https://www.elastic.co/guide/en/elasticsearch/reference/current/eager-global-ordinals.html). This doesn't seem to be the case in our use-case - loading times are high and significant memory is consumed. Are the docs wrong, or is this a corner case worth checking in the ES codebase?
- Seems like fields defined with IP datatype in the mapping take even longer to load (10-20% slower) - is there a reason for that? should we use multi-field (IP datatype for search, keyword for aggs)? Can we avoid multi-field?
- Other aggregations, most notably the date histogram aggregation, run perfectly well and very fast also on fields with very high cardinality. Is there a way to use the same technique for terms aggregation as well? in our case approximate counts will be ok too.
- The docs for the
map
execution hint state (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html): "Please note that Elasticsearch will ignore this execution hint if it is not applicable and that there is no backward compatibility guarantee on these hints." . Since the map execution hint is (currently) required for such queries to run, can the docs state those "not applicable" scenarios explicitly? - Can global ordinals be monitored somehow? I couldn't find any obvious metric that would return the amount of memory it consumes, and in our case where significant amount of heap space is consumed, we just attribute it to global ordinals, true or not.
Background:
We have an index with a source_ip
field. This field has a very high cardinality (about 90% of documents count). Tested with several versions of 6.x, latest 6.4.3.
When executing a terms aggregation on this field (mapped as keyword), initial query takes a lot of time to run also on a decent cluster with decent nodes, for a large corpus. A terms agg query on a 400m docs corpus ran for an hour before we killed it; on 70m docs it returned in 15-20 minutes. Subsequent queries ran fast, hence the attribution of latency to loading of global ordinals. Also, using the map execution hint helps.
This is how the search task looks like (got up to 55 minutes, then we killed the search task):
GET _cat/tasks?v
action task_id parent_task_id type start_time timestamp running_time ip node
indices:data/read/search IJJVLCApT_KsA283KwS_oA:481102 - transport 1542704145263 08:55:45 4.7m 10.0.0.11 ip-10-0-0-11
indices:data/read/search[phase/query] RoXZonzjQs-IaFdaIRORww:9599 IJJVLCApT_KsA283KwS_oA:481102 netty 1542704145263 08:55:45 4.7m 10.0.0.33 ip-10-0-0-33
...
...
We observe about 4-5GB of unexplained heap usage - there is no evident monitoring for the space global ordinals occupy and as such we attributes this amount of memory to them.
The same happens when the field is defined as IP datatype, but seems to run even longer (for the 70m case, where it completes). Cardinality aggregation returns exactly the same cardinality value for both keyword and ip variants (on a small dataset with precision_threshold
set to 40k), so doc_values for the IP field seems to be persisted correctly. Also, docvalue_fields
returns the correct expected values.
Machines are i3 instances on AWS, so CPU, memory and especially disk IO are not a concern.
Looking forward for helpful advice,
Itamar