ESRally issue when running EventData track

There were some network changes, the Rally process stopped partway through, and the state of ES was "yellow/red". Hence I deleted the index from the ES cluster.

Now, when I run the below Rally command

esrally race --kill-running-processes --track=eventdata --target-hosts=172.29.183.11:9200,172.29.183.12:9200,172.29.183.13:9200 --pipeline=benchmark-only --track-repository=eventdata --track-params="number_of_shards:3,number_of_replicas:1,bulk_size:10000,bulk_indexing_clients:15,daily_logging_volume:100GB,starting_point:2022-01-01,number_of_days:3,bulk_indexing_reqs_per_sec:15,search_clients:10" --challenge=index-fixed-load-and-query

I get the error

Running check-cluster-health                                                   [100% done]
Running fieldstats                                                             [  0% done]
[ERROR] Cannot race. Error in load generator [0]
        Cannot run task [fieldstats]: No matching data found for field '@timestamp' in pattern 'elasticlogs-*'.

Is there a way in Rally to force the data to be downloaded and re-indexed so that the test runs successfully?

Thanks
Kailas

Hi @Kailas,

Thanks for using Rally! Please note what the eventdata repository (here) says about the challenge you're executing:

Requires executing index-logs-fixed-daily-volume first.

Please give that a shot to initialize the index templates and index data prior to the execution of the index-fixed-load-and-query challenge.
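The ordering described above can be sketched as a small Python script that builds both esrally invocations. This is only an illustration: the flags and values are copied from the command in the original post, the helper names (`race_command`, `build_commands`) are made up here, and reusing the same track parameters for both challenges is an assumption that should be checked against the eventdata README.

```python
import shlex

# Values copied from the command in the original post.
TARGET_HOSTS = "172.29.183.11:9200,172.29.183.12:9200,172.29.183.13:9200"
TRACK_PARAMS = (
    "number_of_shards:3,number_of_replicas:1,bulk_size:10000,"
    "bulk_indexing_clients:15,daily_logging_volume:100GB,"
    "starting_point:2022-01-01,number_of_days:3,"
    "bulk_indexing_reqs_per_sec:15,search_clients:10"
)

# Order matters: the first challenge creates the index templates and data
# that the second challenge expects to find.
CHALLENGES = ["index-logs-fixed-daily-volume", "index-fixed-load-and-query"]


def race_command(challenge):
    """Return the esrally argument list for one challenge (hypothetical helper)."""
    return [
        "esrally", "race",
        "--kill-running-processes",
        "--track=eventdata",
        "--track-repository=eventdata",
        f"--target-hosts={TARGET_HOSTS}",
        "--pipeline=benchmark-only",
        f"--track-params={TRACK_PARAMS}",
        f"--challenge={challenge}",
    ]


def build_commands():
    """Return both command lines in the order they should be run."""
    return [race_command(challenge) for challenge in CHALLENGES]


if __name__ == "__main__":
    for argv in build_commands():
        # Print a copy-pasteable command line; to execute it instead,
        # subprocess.run(argv, check=True) could be used.
        print(shlex.join(argv))
```

Running the script only prints the two command lines, so they can be reviewed before anything is sent to the cluster.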

Thanks for the solution. However, a quick question: when the load-and-query challenge reaches 89-90%, I always get

Running index-fixed-throughput,content_issues-dashboard-25%,discover-30...     [ 89% done]
[WARNING] Could not terminate all internal processes within timeout. Please check and force-terminate all Rally processes.
[ERROR] Cannot race. Error in load generator [0]
       Cannot run task [index-fixed-throughput]: Request returned an error. Error type: transport, Description: Cannot connect to host 172.29.183.13:9200 ssl:default [Connect call failed ('172.29.183.13', 9200)] (Cannot connect to host 172.29.183.13:9200 ssl:default [Connect call failed ('172.29.183.13', 9200)])


Is there any solution for this one?

Thanks
Kailas

Please take a look at ~/.rally/logs/rally.log and at your Elasticsearch logs, and check whether Elasticsearch is still reachable. I suspect that the high load is affecting Elasticsearch's performance.

Note that the default timeout value for the client that Rally uses is 60s. See the docs on how to increase it via --client-options.

If you haven't already read it, please read this section of the docs about latency vs service_time that should help you analyze results.
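For the reachability check mentioned above, a minimal sketch is to try opening a TCP connection to each entry of --target-hosts. The function names are hypothetical, and a successful TCP connect only shows that the port accepts connections, not that the cluster is healthy.

```python
import socket


def parse_host(entry, default_port=9200):
    """Split a 'host:port' entry as used in --target-hosts."""
    host, _, port = entry.partition(":")
    return host, int(port) if port else default_port


def reachable(host, port, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False
    conn.close()
    return True


def check_hosts(target_hosts, connect=socket.create_connection):
    """Map each comma-separated 'host:port' entry to its reachability."""
    result = {}
    for entry in target_hosts.split(","):
        entry = entry.strip()
        host, port = parse_host(entry)
        result[entry] = reachable(host, port, connect=connect)
    return result


# Example call (performs real network connections):
# check_hosts("172.29.183.11:9200,172.29.183.12:9200,172.29.183.13:9200")
```

The `connect` parameter exists only so the check can be exercised without a network. The client timeout itself is raised on the Rally command line, for example with --client-options="timeout:120".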

Hi,
Thanks for the suggestion. I increased the timeout to 120, but the same issue persists.
From the Rally logs:

 2022-03-29 06:59:35,339 -not-actor-/PID:18593 elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:1.234s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ClientOSError: [Errno 104] Connection reset by peer
2022-03-29 06:59:35,341 -not-actor-/PID:18592 elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:2.862s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
2022-03-29 06:59:35,341 -not-actor-/PID:18596 elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:2.944s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
2022-03-29 06:59:35,341 -not-actor-/PID:18595 elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:2.869s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
2022-03-29 06:59:35,341 -not-actor-/PID:18594 elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:3.001s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
2022-03-29 06:59:35,345 -not-actor-/PID:18593 elasticsearch WARNING Connection <AIOHttpConnection: http://172.29.183.13:9200> has failed for 1 times in a row, putting on 60 second timeout.
2022-03-29 06:59:35,347 -not-actor-/PID:18592 elasticsearch WARNING Connection <AIOHttpConnection: http://172.29.183.13:9200> has failed for 1 times in a row, putting on 60 second timeout.
2022-03-29 06:59:35,349 -not-actor-/PID:18594 elasticsearch WARNING Connection <AIOHttpConnection: http://172.29.183.13:9200> has failed for 1 times in a row, putting on 60 second timeout.
2022-03-29 06:59:35,350 -not-actor-/PID:18596 elasticsearch WARNING Connection <AIOHttpConnection: http://172.29.183.13:9200> has failed for 1 times in a row, putting on 60 second timeout.
2022-03-29 06:59:35,352 -not-actor-/PID:18595 elasticsearch WARNING Connection <AIOHttpConnection: http://172.29.183.13:9200> has failed for 1 times in a row, putting on 60 second timeout.
2022-03-29 06:59:35,352 -not-actor-/PID:18593 elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:2.650s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
2022-03-29 06:59:35,352 -not-actor-/PID:18592 elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:1.735s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ClientOSError: [Errno 104] Connection reset by peer
2022-03-29 06:59:35,354 -not-actor-/PID:18594 elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:2.491s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
2022-03-29 07:00:01,100 -not-actor-/PID:18546 elasticsearch WARNING GET http://172.29.183.13:9200/_cluster/state/master_node [status:N/A request:0.001s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/connection/http_urllib3.py", line 251, in perform_request
    response = self.pool.urlopen(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/util/retry.py", line 525, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/http/client.py", line 1007, in _send_output
    self.send(msg)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/http/client.py", line 947, in send
    self.connect()
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fae1fde6af0>: Failed to establish a new connection: [Errno 111] Connection refused
2022-03-29 07:00:01,100 -not-actor-/PID:18546 elasticsearch WARNING Connection <Urllib3HttpConnection: http://172.29.183.13:9200> has failed for 1 times in a row, putting on 60 second timeout.
2022-03-29 07:00:02,914 -not-actor-/PID:18600 elasticsearch WARNING POST http://172.29.183.11:9200/_msearch?pre_filter_shard_size=1 [status:N/A request:0.001s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 986, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 1025, in create_connection
    raise exceptions[0]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 1010, in create_connection
    sock = await self._connect_sock(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 924, in _connect_sock
    await self.sock_connect(sock, address)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/selector_events.py", line 496, in sock_connect
    return await fut
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/selector_events.py", line 528, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.29.183.11', 9200)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 535, in _request
    conn = await self._connector.connect(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 907, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
    raise last_exc
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 172.29.183.11:9200 ssl:default [Connect call failed ('172.29.183.11', 9200)]
2022-03-29 07:00:02,916 -not-actor-/PID:18600 elasticsearch WARNING Connection <AIOHttpConnection: http://172.29.183.11:9200> has failed for 1 times in a row, putting on 60 second timeout.
2022-03-29 07:00:14,752 -not-actor-/PID:18603 elasticsearch WARNING POST http://172.29.183.11:9200/_msearch?pre_filter_shard_size=1 [status:N/A request:0.001s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 986, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 1025, in create_connection
    raise exceptions[0]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 1010, in create_connection
    sock = await self._connect_sock(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 924, in _connect_sock
    await self.sock_connect(sock, address)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/selector_events.py", line 496, in sock_connect
    return await fut
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/selector_events.py", line 528, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.29.183.11', 9200)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/elasticsearch/_async/http_aiohttp.py", line 291, in perform_request
    async with self.session.request(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 1138, in __aenter__
    self._resp = await self._coro
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/client.py", line 535, in _request
    conn = await self._connector.connect(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 542, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 907, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
    raise last_exc
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 992, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 172.29.183.11:9200 ssl:default [Connect call failed ('172.29.183.11', 9200)]
2022-03-29 07:00:14,754 -not-actor-/PID:18603 elasticsearch WARNING Connection <AIOHttpConnection: http://172.29.183.11:9200> has failed for 1 times in a row, putting on 60 second timeout.
2022-03-29 07:00:14,755 -not-actor-/PID:18603 elasticsearch WARNING POST http://172.29.183.13:9200/_msearch?pre_filter_shard_size=1 [status:N/A request:0.001s]
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/site-packages/aiohttp/connector.py", line 986, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 1025, in create_connection
    raise exceptions[0]
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 1010, in create_connection
    sock = await self._connect_sock(
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/base_events.py", line 924, in _connect_sock
    await self.sock_connect(sock, address)
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/selector_events.py", line 496, in sock_connect
    return await fut
  File "/root/.pyenv/versions/3.8.10/lib/python3.8/asyncio/selector_events.py", line 528, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.29.183.13', 9200)

Thanks
Kailas
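A long excerpt like the one above can be condensed by counting failed requests per host and the exception types that ended each traceback. The following is a rough sketch: the regular expressions are written against the log lines shown in this thread, not against a documented log format, so they may need adjusting for other Rally or client versions.

```python
import re
from collections import Counter

# Matches client warnings for requests that never got an HTTP status, e.g.
# "... elasticsearch WARNING POST http://172.29.183.13:9200/_bulk [status:N/A request:1.234s]"
FAILED_REQUEST = re.compile(
    r"elasticsearch WARNING (?P<method>[A-Z]+) "
    r"https?://(?P<host>[^/\s]+)(?P<path>/\S*) \[status:N/A"
)

# Matches the last line of a traceback, e.g.
# "aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected"
EXCEPTION = re.compile(r"^(?P<name>[A-Za-z_][\w.]*(?:Error|Exception)):")


def summarize(lines):
    """Count failed requests per host and exception types in a log."""
    failures = Counter()
    exceptions = Counter()
    for line in lines:
        match = FAILED_REQUEST.search(line)
        if match:
            failures[match.group("host")] += 1
            continue
        match = EXCEPTION.match(line)
        if match:
            # Keep only the class name, without the module path.
            exceptions[match.group("name").rsplit(".", 1)[-1]] += 1
    return failures, exceptions


# Example use against the default log location:
# import os
# with open(os.path.expanduser("~/.rally/logs/rally.log")) as log:
#     failures, exceptions = summarize(log)
#     print(failures.most_common(), exceptions.most_common())
```

If the failures cluster on one host first and then spread to the others, that points at nodes going down one after another rather than at a client-side timeout.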

OK, after looking into it in more detail: the entire Elasticsearch cluster fails because it runs into an "OutOfMemoryError".

Mar 29 22:31:24 e1-node1 systemd[1]: Starting Elasticsearch...
Mar 29 22:31:42 e1-node1 systemd[1]: Started Elasticsearch.
Mar 30 02:23:57 e1-node1 systemd-entrypoint[1353]: [13951.559s][warning][gc,alloc] elasticsearch[node-1][[elasticlogs-2022-01-03][1]: Lucene Merge Thread #809]: Retried w...61459 words
Mar 30 02:23:57 e1-node1 systemd-entrypoint[1353]: java.lang.OutOfMemoryError: Java heap space
Mar 30 02:23:57 e1-node1 systemd-entrypoint[1353]: Dumping heap to /var/lib/elasticsearch/java_pid1353.hprof ...
Mar 30 02:23:57 e1-node1 systemd-entrypoint[1353]: Unable to create /var/lib/elasticsearch/java_pid1353.hprof: File exists
Mar 30 02:23:57 e1-node1 systemd-entrypoint[1353]: Terminating due to java.lang.OutOfMemoryError: Java heap space
Mar 30 02:23:57 e1-node1 systemd[1]: elasticsearch.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Mar 30 02:23:57 e1-node1 systemd[1]: Unit elasticsearch.service entered failed state.
Mar 30 02:23:57 e1-node1 systemd[1]: elasticsearch.service failed.
Hint: Some lines were ellipsized, use -l to show in full.

[root@e1-node1 elasticsearch]# tail -f es_cluster1.log
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulk [default_local]
        ... 39 more
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of coordinating operation [coordinating_and_primary_bytes=67823416, replica_bytes=40280217, all_bytes=108103633, coordinating_operation_bytes=101757, max_coordinating_and_primary_bytes=107374182]
        at org.elasticsearch.index.IndexingPressure.markCoordinatingOperationStarted(IndexingPressure.java:76) ~[elasticsearch-7.16.1.jar:7.16.1]
        at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:186) ~[elasticsearch-7.16.1.jar:7.16.1]
        at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:91) ~[elasticsearch-7.16.1.jar:7.16.1]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:179) ~[elasticsearch-7.16.1.jar:7.16.1]
        ... 34 more

My data nodes have 64 GB of RAM each. I haven't changed any settings. What should I do in this case?

Thanks
Kailas

Is this the first time you are running this workload? (It seems so, based on the first comment.)

If yes, and given that Elasticsearch runs out of heap memory, you need to reduce the load that the benchmark generates and/or validate that your heap settings are correct.

But more importantly, I think you should consider what you are actually trying to achieve: what is the purpose of your benchmark? This will guide you in selecting the right parameters. I strongly recommend that you watch Daniel Mitterdorfer's presentation on the 7 deadly sins of benchmarking (slides here).
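As a rough way to validate the heap settings, the rejection message quoted earlier in the thread reports max_coordinating_and_primary_bytes=107374182. Assuming indexing_pressure.memory.limit was left at its default of 10% of the heap (the poster reports not changing any settings), the configured heap can be estimated from that number. This is a back-of-the-envelope sketch, not a substitute for checking the actual JVM settings on the nodes.

```python
# Value copied from the EsRejectedExecutionException in the Elasticsearch log.
MAX_COORDINATING_AND_PRIMARY_BYTES = 107_374_182

# Assumption: indexing_pressure.memory.limit is at its default of 10% of
# the JVM heap.
INDEXING_PRESSURE_FRACTION = 0.10


def estimate_heap_gib(limit_bytes, fraction=INDEXING_PRESSURE_FRACTION):
    """Estimate the JVM heap size in GiB from the indexing-pressure limit."""
    return limit_bytes / fraction / 2**30


heap_gib = estimate_heap_gib(MAX_COORDINATING_AND_PRIMARY_BYTES)
print(f"Estimated heap: {heap_gib:.2f} GiB")  # prints "Estimated heap: 1.00 GiB"
```

Under that assumption the node was running with roughly 1 GiB of heap, which is small next to 64 GB of RAM and would be consistent with the OutOfMemoryError during merges.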

Many thanks, changing the heap to 30g worked.

Regards
Kailas
