New elasticsearch node does not run enrichments

Hello, as I mentioned in enrich processor missing documents, I am facing some issues in my Elasticsearch cluster related to document enrichment. I'm opening a new thread as I suspect these are different problems.

As suggested in the comments, I upgraded my Elasticsearch node to 7.17.9 and then created a new Elasticsearch node to form a cluster. The server on which the nodes run has 48 GB of RAM and 48 CPUs.

Here is the docker-compose configuration:

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
    ports:
      - "9200:9200"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /etc/my-folder/pm-monitor/elasticsearch/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt
      - /etc/my-folder/pm-monitor/elasticsearch/certs/instance/elasticsearch.crt:/usr/share/elasticsearch/config/elasticsearch.crt
      - /etc/my-folder/pm-monitor/elasticsearch/certs/instance/elasticsearch.key:/usr/share/elasticsearch/config/elasticsearch.key
      - /data/docker/volumes/elasticsearch_data:/usr/share/elasticsearch/data
      - /data/docker/volumes/elasticsearch_snapshots:/snapshots
      - /etc/my-folder/pm-monitor/elasticsearch/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    labels:
      co.elastic.metrics/module: elasticsearch
      co.elastic.metrics/hosts: "elasticsearch:9200"
      co.elastic.metrics/metricsets: enrich,index,index_summary,node_stats,pending_tasks
    environment:
      #- "discovery.type=single-node"
      - node.name=elasticsearch
      - cluster.initial_master_nodes=elasticsearch
      - xpack.security.enabled=true
      #- xpack.license.self_generated.type=basic
      - "http.cors.allow-origin=http://localhost:1358,http://127.0.0.1:1358,http://some-url:1358"
      - "http.cors.enabled=true"
      - "http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization"
      - "http.cors.allow-credentials=true"
      - "ES_JAVA_OPTS=-Xms14g -Xmx14g"
      - bootstrap.memory_lock=true
    restart: always

  
  elasticsearch02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /etc/my-folder/pm-monitor/elasticsearch/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt
      - /etc/my-folder/pm-monitor/elasticsearch/certs/instance/elasticsearch.crt:/usr/share/elasticsearch/config/elasticsearch.crt
      - /etc/my-folder/pm-monitor/elasticsearch/certs/instance/elasticsearch.key:/usr/share/elasticsearch/config/elasticsearch.key
      - /etc/my-folder/pm-monitor/elasticsearch/elasticsearch02.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    labels:
      co.elastic.metrics/module: elasticsearch
      co.elastic.metrics/hosts: "elasticsearch:9200"
      co.elastic.metrics/metricsets: enrich,index,index_summary,node_stats,pending_tasks
    environment:
      - node.name=elasticsearch02
      - discovery.seed_hosts=elasticsearch
      - "ES_JAVA_OPTS=-Xms14g -Xmx14g"
      - xpack.security.enabled=true
      #- xpack.license.self_generated.type=basic
      - bootstrap.memory_lock=true
      - "http.cors.enabled=true"
      - "http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization"
      - "http.cors.allow-credentials=true"

Here are the elasticsearch.yml configuration files:

Node elasticsearch:

cluster.name: "docker-cluster"
network.host: 0.0.0.0
xpack.security.enabled: true
path.repo: /snapshots
enrich.cache_size: 1500
node.roles: ["data", "data_cold", "data_content", "data_frozen" ,"data_hot", "data_warm", "ingest", "master", "ml", "remote_cluster_client", "transform"]
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/elasticsearch.key
xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/elasticsearch.crt
xpack.security.transport.ssl.certificate_authorities: [ "/usr/share/elasticsearch/config/ca.crt"]

Node elasticsearch02:

cluster.name: "docker-cluster"
network.host: 0.0.0.0
xpack.security.enabled: true
enrich.cache_size: 1500
node.roles: ["data", "data_cold", "data_content", "data_frozen" ,"data_hot", "data_warm", "ingest", "master", "ml", "remote_cluster_client", "transform"]
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: /usr/share/elasticsearch/config/elasticsearch.key
xpack.security.transport.ssl.certificate: /usr/share/elasticsearch/config/elasticsearch.crt
xpack.security.transport.ssl.certificate_authorities: [ "/usr/share/elasticsearch/config/ca.crt"]

The cluster is working well, but it is still having trouble with enrichment. Indeed, if I query the enrich stats API:

GET _enrich/_stats

{
  "executing_policies" : [ ],
  "coordinator_stats" : [
    {
      "node_id" : "CQzSUS3cRdyI0MC_i-1LnQ",
      "queue_size" : 0,
      "remote_requests_current" : 0,
      "remote_requests_total" : 0,
      "executed_searches_total" : 0
    },
    {
      "node_id" : "mcEo_QXMTbiYporFgWgYgQ",
      "queue_size" : 7,
      "remote_requests_current" : 8,
      "remote_requests_total" : 93557921,
      "executed_searches_total" : 184340292
    }
  ],
  "cache_stats" : [
    {
      "node_id" : "CQzSUS3cRdyI0MC_i-1LnQ",
      "count" : 0,
      "hits" : 0,
      "misses" : 0,
      "evictions" : 0
    },
    {
      "node_id" : "mcEo_QXMTbiYporFgWgYgQ",
      "count" : 1500,
      "hits" : 531829397,
      "misses" : 187088591,
      "evictions" : 124235272
    }
  ]
}

It seems that the new node (CQzSUS3cRdyI0MC_i-1LnQ) is not performing any enrich tasks.
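As an aside, the imbalance is easy to quantify from the cache_stats above. A minimal sketch (the numbers are copied from the stats response; the helper name is my own):

```python
# Per-node enrich cache stats, copied from the _enrich/_stats response above.
cache_stats = [
    {"node_id": "CQzSUS3cRdyI0MC_i-1LnQ", "hits": 0, "misses": 0},
    {"node_id": "mcEo_QXMTbiYporFgWgYgQ", "hits": 531829397, "misses": 187088591},
]

def hit_ratio(stats):
    """Fraction of enrich lookups served from the cache (0.0 if unused)."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0

for s in cache_stats:
    print(s["node_id"], round(hit_ratio(s), 2))
```

The old node shows a cache hit ratio of roughly 0.74, while the new node has served no enrich lookups at all.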

I don't know whether it is related, but could the fact that I did not create a volume mapping for the elasticsearch02 node have something to do with it?

Could someone help me with this issue? Thank you in advance.

In your docker compose you didn't expose any port for the second node.

To which node are you directing your indexing requests?

If your indexing requests are being sent only to the elasticsearch node, it is expected that this node will run the ingest pipeline and your enrich processors.
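For reference, publishing the second node's HTTP port in the compose file could look like the following fragment (host port 9203 is just an example choice):

```yaml
  elasticsearch02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.9
    ports:
      - "9203:9200"   # host port 9203 -> container HTTP port 9200
```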

That's right, indexing requests are sent to http://elasticsearch:9200.

If I expose port 9200 on the new node, to which address should I send the requests so they are processed by the corresponding node?

There is something that perhaps I don't quite understand. Regardless of whether the request goes to http://elasticsearch:9200, if the corresponding shard is allocated on node02 (elasticsearch02), shouldn't node01 forward the indexing request to node02?

And one more question: suppose I restart the node02 container exposing port 9200. In the elasticsearch02 configuration I do not map the data volumes to the host. If I restart the service, the data of the shards allocated on node02 should not be lost, right?
(All this in a scenario where, due to limited resources, I only have primary shards configured.)

I'm not sure about the internals, but the indexing request that writes to the shard happens after the ingest process: the data is processed by the ingest pipeline, which runs the enrich processors, and only after that is it written to the specific shards.

No, if you do not have a persistent volume for node02, the data will be lost if you restart it; every data node needs to have a persistent volume.

Do not restart your node02 if you have data on it and no persistent volume mounted. You first need to move the data off the node, remove the node, and then add a persistent volume for it to work.

Hello, I have now created the volume and the information is persistent. I have restarted the container exposing port 9200 (mapped to host port 9203).

Shards are already located in node elasticsearch02 (for example):

GET _cat/shards

.ds-raw_kpi_4g_4-2023.04.24-000075                                 0 p STARTED  13123792    7.6gb 172.18.0.18 elasticsearch02

The index shown above corresponds to the current write index of the raw_kpi_4g_4 data stream. So I assume that if I create a new document in that data stream, the elasticsearch02 node should perform the ingest task. But that is not what is happening:

The following command was executed from Postman (my-sensitive-url corresponds to the server's hostname):

Request:
http://my-sensitive-url:9200/raw_kpi_4g_3/_bulk
{"create":{}}
{my-sensitive-document}

Response:
{
    "took": 501,
    "ingest_took": 345,
    "errors": false,
    "items": [
        {
            "create": {
                "_index": ".ds-raw_kpi_4g_3-2023.04.23-000092",
                "_type": "_doc",
                "_id": "zw3luYcBVuxEiSCwnukK",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 33796405,
                "_primary_term": 2,
                "status": 201
            }
        }
    ]
}

If I execute the following command, I can see that no enrich tasks were executed by node elasticsearch02:

GET _enrich/_stats

{
  "executing_policies" : [ ],
  "coordinator_stats" : [
    {
      "node_id" : "QKDOhr5LQ2q3LuUr_3pGDQ",
      "queue_size" : 0,
      "remote_requests_current" : 0,
      "remote_requests_total" : 0,
      "executed_searches_total" : 0
    },
    {
      "node_id" : "mcEo_QXMTbiYporFgWgYgQ",
      "queue_size" : 0,
      "remote_requests_current" : 0,
      "remote_requests_total" : 33199750,
      "executed_searches_total" : 71439959
    }
  ],
  "cache_stats" : [
    {
      "node_id" : "QKDOhr5LQ2q3LuUr_3pGDQ",
      "count" : 0,
      "hits" : 0,
      "misses" : 0,
      "evictions" : 0
    },
    {
      "node_id" : "mcEo_QXMTbiYporFgWgYgQ",
      "count" : 1500,
      "hits" : 201684717,
      "misses" : 72268496,
      "evictions" : 47322471
    }
  ]
}

Thanks in advance.

To which node does my-sensitive-url point?

Have you tried to send an indexing request directly to the elasticsearch02 node?

I executed the following command and finally node elasticsearch02 ingested the document (port 9203 is the host port mapped to node elasticsearch02):

Request:
http://my-sensitive-url:9203/raw_kpi_4g_3/_bulk
{"create":{}}
{my-sensitive-document}

Response:
{
    "took": 13,
    "ingest_took": 140,
    "errors": false,
    "items": [
        {
            "create": {
                "_index": ".ds-raw_kpi_4g_3-2023.04.23-000092",
                "_type": "_doc",
                "_id": "Aig2uocBXgfGcqxzAbxb",
                "_version": 1,
                "result": "created",
                "_shards": {
                    "total": 1,
                    "successful": 1,
                    "failed": 0
                },
                "_seq_no": 34502802,
                "_primary_term": 2,
                "status": 201
            }
        }
    ]
}

And,

GET _enrich/_stats

{
  "executing_policies" : [ ],
  "coordinator_stats" : [
    {
      "node_id" : "QKDOhr5LQ2q3LuUr_3pGDQ",
      "queue_size" : 0,
      "remote_requests_current" : 0,
      "remote_requests_total" : 8,
      "executed_searches_total" : 8
    },
    {
      "node_id" : "mcEo_QXMTbiYporFgWgYgQ",
      "queue_size" : 0,
      "remote_requests_current" : 0,
      "remote_requests_total" : 37854355,
      "executed_searches_total" : 81101195
    }
  ],
  "cache_stats" : [
    {
      "node_id" : "QKDOhr5LQ2q3LuUr_3pGDQ",
      "count" : 8,
      "hits" : 0,
      "misses" : 8,
      "evictions" : 0
    },
    {
      "node_id" : "mcEo_QXMTbiYporFgWgYgQ",
      "count" : 1500,
      "hits" : 228851590,
      "misses" : 82027734,
      "evictions" : 53853368
    }
  ]
}

The question is: how should I call Elasticsearch so that the node where the shard is allocated executes the indexing tasks?

I don't think you can and I'm not sure why you would want that.

The data will be processed by the node that received the request, but it will be stored without any problem in the node that actually has the shard.

If you want to distribute the processing load between all your nodes, you would need to use a load balancer or, depending on what you are using to send the data, simply configure both nodes in the Elasticsearch hosts setting.

Hmm, sorry, but I don't buy it. So what is the point of having more than one ingest node?

I'm not quite sure what you mean by "just configure the two nodes in the elasticsearch hosts".

Thanks.

I think you are mixing up how the node roles work in Elasticsearch.

Every node in the cluster can have one or more roles. The only required roles are data_content, data_hot (or just data to get all the data roles) and master; all other roles are optional and required only for specific features.

The ingest role is required to run ingest pipelines, but depending on the use case you can even have nodes that only have the ingest role, so they would not store data, just process it.
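For example, a dedicated ingest-only node would declare just that role in its elasticsearch.yml (hypothetical fragment):

```yaml
# elasticsearch.yml of a hypothetical dedicated ingest node:
# it runs ingest pipelines (and enrich processors) but holds no shards.
node.roles: [ "ingest" ]
```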

Executing an ingest pipeline and indexing the data are different things that can happen in different nodes.

In your case both of your nodes have all the roles, so both of your nodes can store data and execute ingest pipelines.

If the node that received the indexing request has the ingest role, it will execute the ingest pipeline and then index the data; if the shard is on a different node, it will route the indexing request to that node, as explained in a previous answer.

If you are making requests to only one node, then only that node will be running your ingest pipeline. If you want both nodes to execute the ingest pipeline to balance the load, you need to balance the requests between them; this is what I meant when I said to configure the two nodes when making the requests.

For example, if you are using logstash you would put both nodes in the hosts setting of the logstash pipeline, same thing applies to filebeat.

If you are manually making the requests, then you will need to manually change the endpoint to balance the requests.
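A minimal sketch of that manual balancing, using the two endpoints from this thread (the helper is my own invention; in practice an official client or a load balancer would do this for you):

```python
from itertools import cycle

# The two HTTP endpoints used in this thread (9203 is the host port
# mapped to node elasticsearch02).
_hosts = cycle([
    "http://my-sensitive-url:9200",  # node elasticsearch
    "http://my-sensitive-url:9203",  # node elasticsearch02
])

def next_bulk_endpoint(path="/raw_kpi_4g_3/_bulk"):
    """Return the next endpoint in round-robin order for a _bulk request."""
    return next(_hosts) + path

# Alternating requests spreads the ingest-pipeline (and enrich) work
# across both nodes.
print(next_bulk_endpoint())
print(next_bulk_endpoint())
```

Each call picks the other node, so half of the bulk requests (and therefore half of the pipeline executions) land on elasticsearch02.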

The reason to have multiple ingest nodes depends on each use case; some people even need dedicated ingest nodes because of processing load.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.