Terms aggregation by long faster than by string?

dorr · January 6, 2016, 5:27pm

Hello,
I have an ID field with very high cardinality, currently implemented as a string, containing content similar to a GUID.
I want to run terms aggregations on a large data set, and I'd like to optimize this.

I read this article that discusses ordinals and was wondering:
If I change the field implementation to a long, would that help in terms of query speed / memory usage / anything?

Thanks.

bleskes · January 6, 2016, 7:34pm

Internally, strings and numbers are both treated as bytes. What matters is how the bytes are distributed. Numbers also have the downside of being chopped into multiple terms to speed up range searches (see https://www.elastic.co/guide/en/elasticsearch/reference/2.1/precision-step.html ). In general GUIDs are fine, but check this blog for advice on how to optimize them: http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
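To make the "chopped into multiple terms" point concrete, here is a rough Python sketch of the idea behind precision_step. It is a simplification, not Lucene's actual encoding (which also handles the sign bit and packs the shift into the term bytes). The function name trie_terms and the step value of 16 are only illustrative assumptions.

```python
from typing import List, Tuple


def trie_terms(value: int, precision_step: int = 16, bits: int = 64) -> List[Tuple[int, int]]:
    """Return the (shift, prefix) pairs indexed for one non-negative value.

    Simplified model: one term per shift, each dropping `shift` low-order bits.
    """
    terms = []
    for shift in range(0, bits, precision_step):
        # Each extra term is a coarser version of the same number.
        terms.append((shift, value >> shift))
    return terms


# One 64-bit value with a step of 16 yields 4 terms instead of 1.
for shift, prefix in trie_terms(1_452_101_220_000):
    print(shift, prefix)
```

In this model a smaller step means more terms per value (faster range queries, larger index), while a step at least as large as the bit width leaves a single term per value.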

dorr · January 7, 2016, 12:05pm

Hi Boaz, thanks for the info.
I will look into the formatting of whatever type I choose. I see precision_step is only for Elasticsearch 2.0+. Are there any recommendations for v1.7?

Also, I'm still wondering about this (from the link I posted):

Can switching to a numeric type help the performance of my query as well?

P.S. It's important to note that I'm doing a terms aggregation on a contextual ID field that is shared between multiple records (i.e. "session_id"), not on the unique document ID itself, if that matters.
Thanks.
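
For reference, the kind of request I mean looks roughly like the sketch below, written as a Python dict that serializes to the search body. The aggregation name "by_session" and the bucket size are placeholders, not my real values.

```python
import json


def session_terms_agg(field: str = "session_id", size: int = 10) -> dict:
    """Build a search body that runs a terms aggregation on `field`."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "by_session": {  # placeholder aggregation name
                "terms": {"field": field, "size": size}
            }
        },
    }


body = session_terms_agg()
print(json.dumps(body, indent=2))
```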
