Hello,
I have an ID field with very high cardinality. It is currently mapped as a string and holds values similar to a GUID.
I want to run terms aggregations over a large dataset and would like to optimize them.
I read this article that discusses ordinals and was wondering:
If I change the field to a long, would that help with query speed, memory usage, or anything else?
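One thing to check before switching: a standard GUID is a 128-bit value, while a long is a signed 64-bit integer, so a typical GUID cannot be stored losslessly in a single long. It would have to be split, truncated, or hashed, and the last two risk collisions. The small Python sketch below, assuming the IDs are standard hyphenated GUIDs, just checks whether a given GUID's value fits in a long. The helper name `guid_fits_in_long` is hypothetical.

```python
import uuid

# Largest value a signed 64-bit "long" can hold.
LONG_MAX = 2**63 - 1

def guid_fits_in_long(guid_str):
    """Return True if the GUID's full 128-bit value fits in a signed 64-bit long."""
    value = uuid.UUID(guid_str).int  # the GUID as a 128-bit non-negative integer
    return value <= LONG_MAX
```

For random (version 4) GUIDs the version bits alone sit above bit 63, so this always returns False. In that case, moving to a numeric type would mean changing how the IDs are generated, not just the mapping.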
Hi Boaz, thanks for the info.
I will look into the format of whichever type I choose. I see that precision_step is only for Elasticsearch 2.0+. Are there any recommendations for v1.7?
Also, I'm still wondering about this (from the link I posted):
Can switching to a numeric type help the performance of my query as well?
P.S. Note that I'm running the terms aggregation on a contextual ID field that is shared between multiple records (e.g. "session_id"), not on the unique document ID itself, in case that matters.
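To make the ordinals idea concrete for a shared ID like session_id, here is a minimal Python sketch of a terms aggregation that counts by ordinal. It is only a conceptual model under my own assumptions, loosely based on how sorted doc values assign ordinals, and not Elasticsearch's actual implementation. The function name `terms_agg_via_ordinals` is hypothetical.

```python
from collections import Counter

def terms_agg_via_ordinals(values, size):
    """Conceptual terms aggregation: count buckets by integer ordinal.

    Assumption: illustrates the ordinals idea only, not Elasticsearch internals.
    """
    # Term dictionary: each distinct term gets an integer ordinal, in sorted order.
    terms = sorted(set(values))
    ordinal_of = {term: i for i, term in enumerate(terms)}

    # Per-record values are kept as small integers instead of strings.
    record_ordinals = [ordinal_of[v] for v in values]

    # Bucket counting works purely on integers; no string comparisons here.
    counts = Counter(record_ordinals)

    # Only the top-N ordinals are translated back into their string terms.
    return [(terms[ordinal], count) for ordinal, count in counts.most_common(size)]
```

For example, `terms_agg_via_ordinals(["s-b", "s-a", "s-b", "s-c", "s-b", "s-a"], 2)` returns `[("s-b", 3), ("s-a", 2)]`. With very high cardinality the term dictionary itself becomes large, which is presumably where most of the memory cost comes from.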
Thanks.