ILM, Fielddata memory error on Warm nodes

Hello

I'm trying to understand something in the logic of Elasticsearch.

Here is my setup:
Graylog 4.2 cluster for ingesting logs
Connected to an ES 7.10 cluster
The Graylog cluster knows the ES cluster through the master nodes, which are set in Graylog's config files (as "discovery.seed_hosts", for those who know Graylog).

[Diagram: Graylog to ES architecture]

I have an index rotation based on time, one index per week.
With ILM I keep 4 weeks of logs on the Hot data nodes.
After 4 weeks the indices are moved to the Warm nodes.
On the Warm nodes, 22 weeks are kept; older indices are dropped.

In the index template, the "source" field, which contains the hostname that sent the log, has "fielddata" enabled:

"source": {
  "fielddata": true,
  "analyzer": "analyzer_keyword",
  "type": "text"
},

Some examples of the hostnames I have in the "source" field:

pirv-siem-relay-01
dirv-monitoring-centreon-02
eidv-vrouter-unitary11-ma-02

My question is the following:
When I run a dashboard in Graylog on the LAST 5 MINUTES of logs, which does an aggregation on the "source" field, I get a fielddata memory error on the Warm nodes:

[2022-05-03T18:17:17,946][WARN ][o.e.i.b.fielddata ] [eirv-siem-es-07] [fielddata] New used memory 7165570335 [6.6gb] for data of [source] would be larger than configured breaker: 6871947673 [6.3gb], breaking

If I enlarge the HEAP size only on the Warm nodes, the error disappears.

I understand the error, but what I don't understand is why it happens on the Warm nodes, for a request that only searches the last 5 minutes of logs.
Isn't the request supposed to run only on the Hot nodes?

Thanks for reading.

Hi @benoitp

Just a couple thoughts ...

If you have fields like the source field you describe that you want to aggregate on, they should be of type keyword, not text. That would probably solve this problem.

Think of keywords as fields you want to aggregate on or filter on or both: exact matches, categories, hostname, account id, host type, os type etc. Keywords are extremely efficient for these types of data and operations.

Think of text as a free text search like on a description or a message etc.

Using a text data type to aggregate on is not a best practice for Elasticsearch.
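As a minimal sketch of what that could look like (the index name source-keyword-test is made up for illustration, and in your setup the mapping would live in the index template Graylog uses rather than in a hand-created index), a keyword mapping for source is just:

```console
PUT source-keyword-test
{
  "mappings": {
    "properties": {
      "source": {
        "type": "keyword"
      }
    }
  }
}
```

Keep in mind that an existing field's type cannot be changed in place, so a template change only takes effect on indices created after it, i.e. from the next weekly rotation. The older indices keep the text mapping unless you reindex them.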

2nd... Sending all your queries and / or ingest through your master nodes, if that is really what the picture is showing, is also a bit of an anti-pattern. The ingest / queries should be pointing at your hot data nodes, or better yet you could create ingest / coordinator nodes and point all your ingest / queries to them.
Going through your masters will work, but all they do is turn around and forward those requests to a data node, and you are using some valuable resources on the master nodes.
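If it helps to check which roles each node currently has before changing where Graylog points, something like this should list them (I am assuming the default column names of the cat nodes API here):

```console
GET _cat/nodes?v&h=name,node.role,master
```

A coordinating-only node shows up with - in the node.role column, and the elected master is marked with * in the master column.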

Hi @stephenb

When using a field of type keyword, can I search or aggregate on only a part of the string?
Like "eidv-vrouter" or "eidv-vrouter*" to get all the messages from a group of sources?
As you are talking about "exact matches" I'm wondering about this.
I read the doc but I didn't find a precise answer. :worried:

About the master nodes, indeed I have a lot of traffic on them and I thought it was the normal behavior. I will dig into the idea of the ingest / coordinator nodes.

Thank you for sharing your thoughts.

Yes you can... There are a couple of ways. Here is one, but it also depends on what you are trying to do overall. You can use fielddata, it is just going to require a lot of memory and not be that efficient.

Here is one way: this searches for sources that begin with eidv and then aggregates the results

GET filebeat-*/_search
{
  "size": 0, 
  "query": {
    "prefix": {
      "source": {
        "value": "eidv"
      }
    }
  },
  "aggs": {
    "apps": {
      "terms": {
        "field": "source",
        "size": 10
      }
    }
  }
}

You could also use a wildcard query

GET filebeat-*/_search
{
  "size": 0, 
  "query": {
    "wildcard": {
      "source": {
        "value": "eidv*"
      }
    }
  },
  "aggs": {
    "apps": {
      "terms": {
        "field": "source",
        "size": 10
      }
    }
  }
}
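If you do stay on fielddata for the time being, the cat fielddata API should show how much heap the source field is taking on each node, which may help confirm which nodes are actually loading it (a quick check, nothing more):

```console
GET _cat/fielddata?v&fields=source
```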