_id is consuming a lot of the fielddata memory

aryon · October 30, 2019, 6:25pm

Hello,

I have a 3-node cluster (16 vCPU, 64 GB of RAM, 3 TB of data per node, JVM heap at 30 GB) with 450 indices (1 primary shard and 1 replica per index).

After upgrading from 6.7 to 6.8, enabling TLS on the transport and HTTP layers, and enabling security (native authentication), we started seeing circuit breaking exceptions in the Elasticsearch logs.
After some investigation I found that the JVM heap is mainly used by fielddata, and most of the fielddata memory is used by the "_id" field:

GET _cat/fielddata?v&fields=*&s=size

-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA type.raw                                5kb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC type.raw                              5.2kb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB type.raw                              6.7kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA shard.state                             7kb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB shard.state                           8.1kb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC shard.index                          21.9kb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB src_ip                               41.5kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA shard.index                          41.7kb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB shard.index                          42.5kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA src_ip                               97.8kb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC src_ip                              103.2kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA _id                                  23.9gb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB _id                                    24gb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC _id                                    24gb
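
For readers who want to total these numbers per field, here is a minimal Python sketch. It assumes rows in the six-column layout shown above (id, host, ip, node, field, size); the function names and the `SAMPLE` excerpt are illustrative and not part of any Elasticsearch client.

```python
import re
from collections import defaultdict

# Multipliers for the size suffixes that appear in the _cat output above.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

# A few rows copied from the output above (hypothetical variable name).
SAMPLE = """
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB src_ip   41.5kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA src_ip   97.8kb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC src_ip  103.2kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA _id      23.9gb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB _id        24gb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC _id        24gb
"""


def parse_size(text):
    """Convert a _cat size such as '23.9gb' or '5kb' to a byte count."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def fielddata_by_field(cat_output):
    """Sum fielddata bytes per field across all nodes."""
    totals = defaultdict(int)
    for line in cat_output.strip().splitlines():
        columns = line.split()
        # Skip blank lines, malformed rows, and the header printed by ?v.
        if len(columns) != 6 or columns[-1] == "size":
            continue
        field, size = columns[-2], columns[-1]
        totals[field] += parse_size(size)
    return dict(totals)


totals = fielddata_by_field(SAMPLE)
for field, size in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{field:10s} {size / 1024**3:8.2f} GiB")
```

Sorting by total size puts the heaviest field last, matching the `s=size` ordering of the original request.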

Is this normal behavior? How can I decrease the memory used?

The third node was added to the cluster recently to try to spread the load, but it didn't change anything.
I have a lot of fields in my indices. Would decreasing the number of fields change that?

Thanks

Antoine

DavidTurner · October 30, 2019, 7:00pm

Have you used the _id field for sorting or aggregations? If so, it's recommended not to do that.

aryon · October 30, 2019, 7:30pm

Hi David,

We checked our searches, visualizations and dashboards and didn't find any sorting or aggregations using the _id field in them.
We are using ElastAlert https://github.com/Yelp/elastalert to query the logs we are ingesting into Elasticsearch, and none of our ElastAlert rules use it either.

aryon · October 30, 2019, 7:37pm

Also, I don't know if this information will be of any use, but when I restart a node it takes a while for the _id fielddata memory to build up.
I'm restarting each node twice a day to free the memory.

DavidTurner · October 30, 2019, 7:48pm

Does the buildup correspond with shards being allocated to the node after its restart, or does it take longer than the shard allocation?

aryon · October 30, 2019, 7:56pm

It takes longer: almost an hour and a half, from what I can tell from the monitoring graphs.

DavidTurner · October 31, 2019, 5:42am

This is consistent with using the _id field in sorting or aggregations.

I don't have any great ideas for tracking down the source of those searches. Maybe a good start would be to use the slow log to log all searches.

Does it only happen with certain indices?
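
One way to act on the slow log suggestion is to drop the thresholds to zero for a while so that every search is logged. The Python sketch below only builds the settings request. The base URL, the `logstash-*` pattern and the helper names are assumptions, it assumes a `0s` threshold logs every search, and authentication and TLS handling are left out.

```python
import json
import urllib.request


def slowlog_all_searches_body(level="warn"):
    """Settings that lower the query and fetch thresholds for `level` to zero."""
    return {
        f"index.search.slowlog.threshold.query.{level}": "0s",
        f"index.search.slowlog.threshold.fetch.{level}": "0s",
    }


def build_settings_request(base_url, index_pattern, body):
    """Build (but do not send) a PUT <index>/_settings request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index_pattern}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical endpoint and index pattern, for illustration only.
request = build_settings_request(
    "https://localhost:9200", "logstash-*", slowlog_all_searches_body()
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would apply it. Setting the same keys back to `-1` afterwards should disable the slow log again, which matters on a busy cluster because logging every search is expensive.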

aryon · October 31, 2019, 3:54pm

We found what causes the issue: when an ElastAlert rule matches, we add a link to Kibana in the alert with the _id of the log that matched the rule. When someone clicks on the link, _id values are loaded into the JVM heap.
I don't think that copying the _id value into another field would change that: we have a lot of logs, so at some point Elasticsearch will have to load these values to search them. And doc_values would require reading this information from disk, so I guess performance would not be great either.

DavidTurner · October 31, 2019, 4:01pm

What happens from Elasticsearch's point of view between "someone clicks on the link" and "_id values are loaded in the JVM heap"? A search?

I recommend validating guesses of that nature with a proper experiment.

DavidTurner · November 1, 2019, 8:28am

TIL we have an API for clearing caches which includes field data. It isn't a good long-term fix but it is a lot less disruptive than restarting nodes to clear this memory usage.
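
The clear cache API mentioned here can be limited to fielddata, and to a single field. A minimal Python sketch that builds such a request follows; the base URL and helper name are assumptions, and credentials are omitted.

```python
import urllib.parse
import urllib.request


def build_clear_fielddata_request(base_url, index=None, fields=None):
    """Build (but do not send) a POST to the clear cache API for fielddata.

    `index` restricts the call to one index or pattern; `fields` restricts
    it to the named fields. Both are optional.
    """
    params = {"fielddata": "true"}
    if fields:
        params["fields"] = ",".join(fields)
    path = f"/{index}/_cache/clear" if index else "/_cache/clear"
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{path}?{query}", method="POST"
    )


# Hypothetical endpoint, for illustration only.
request = build_clear_fielddata_request("https://localhost:9200", fields=["_id"])
print(request.get_method(), request.full_url)
```

Clearing only the `_id` fielddata leaves the other caches warm, which keeps the call less disruptive than a node restart.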

aryon · November 5, 2019, 12:26pm

Hi David,

The link goes to the Discover tab in Kibana, so yes, a search is performed at that time.

I configured the "Logs" app in Kibana to display the logs that are in our logstash-* indices and performed a few searches on _id. I do not see the issue that way, so we will modify the links in ElastAlert to use this app.

aryon · November 5, 2019, 12:28pm

Thanks David, I used the clear cache API a few times and it worked great! It is indeed a lot less disruptive, and also a lot quicker, than restarting the nodes.

