Rally esrally.mechanic.mechanic.NodeMechanicActor transport run exception

ramzis · June 29, 2022, 3:12pm

Hi, I'm trying to run

esrally race --distribution-version=8.3.0 --kill-running-processes --track=geonames

but it hangs at the following point.

~/.rally/logs/rally.log:
ActorAddr-(T|:50500)/PID:8353 esrally.actor INFO Starting node(s) [0] on [127.0.0.1].

~/.rally/logs/actor-system-internal.log:

2022-06-29 16:59:42.803152 p8313 I    ++++ Admin started @ ActorAddr-(T|:1900) / gen (3, 10)
2022-06-29 16:59:42.823157 p8313 I    Pending Actor request received for esrally.racecontrol.BenchmarkActor reqs {'coordinator': True} from ActorAddr-(T|:50462)
2022-06-29 16:59:42.830290 p8315 I    Starting Actor esrally.racecontrol.BenchmarkActor at ActorAddr-(T|:50464) (parent ActorAddr-(T|:1900), admin ActorAddr-(T|:1900), srcHash None)
2022-06-29 16:59:44.649925 p8334 I    Starting Actor esrally.mechanic.mechanic.MechanicActor at ActorAddr-(T|:50481) (parent ActorAddr-(T|:50464), admin ActorAddr-(T|:1900), srcHash None)
2022-06-29 16:59:46.541367 p8352 I    Starting Actor esrally.mechanic.mechanic.Dispatcher at ActorAddr-(T|:50497) (parent ActorAddr-(T|:50481), admin ActorAddr-(T|:1900), srcHash None)
2022-06-29 16:59:46.555366 p8353 I    Starting Actor esrally.mechanic.mechanic.NodeMechanicActor at ActorAddr-(T|:50500) (parent ActorAddr-(T|:50497), admin ActorAddr-(T|:1900), srcHash None)
2022-06-29 17:01:05.726480 p8353 ERR  Actor esrally.mechanic.mechanic.NodeMechanicActor @ ActorAddr-(T|:50500) transport run exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/thespian/system/actorManager.py", line 87, in run
    r = self.transport.run(self.handleMessages)
  File "/usr/local/lib/python3.9/site-packages/thespian/system/transport/wakeupTransportBase.py", line 71, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "/usr/local/lib/python3.9/site-packages/thespian/system/transport/wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "/usr/local/lib/python3.9/site-packages/thespian/system/transport/TCPTransport.py", line 1219, in _runWithExpiry
    self._acceptNewIncoming()
  File "/usr/local/lib/python3.9/site-packages/thespian/system/transport/TCPTransport.py", line 1342, in _acceptNewIncoming
    lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
OSError: [Errno 22] Invalid argument

michaelbaamonde · June 29, 2022, 3:57pm

Hello and thanks for your interest in Rally!

I have two questions that will help us debug this:

1. What OS and version are you running?
2. What version of Rally are you using? Could you please run esrally --version and reply with the output?

Thanks!

ramzis · June 30, 2022, 10:06am

macOS Monterey 12.4
esrally 2.5.0

Quentin_Pradet · June 30, 2022, 12:47pm

Does this happen every time or is it sporadic?

ramzis · June 30, 2022, 1:17pm

Happens every time I run it. Just tried it again and the same error is shown.

RickBoyd · June 30, 2022, 2:12pm

I was able to reproduce, sort of.

When I created a fresh environment on my machine (also Monterey), it hung as you described, though I didn't get the actor system error you have. After I killed it and re-ran, it hung again, and I realized that the Elasticsearch node started by the prior race was still running.

This time I killed the relevant java process, then re-ran the command and it worked.

Does this match what you're seeing?
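
To check for an Elasticsearch process left over from an earlier race, something like the following sketch may help. It only parses process-listing text and reports candidate PIDs; it does not kill anything. The function names and the matching heuristic (a command line containing both "java" and "elasticsearch") are assumptions for illustration, not part of Rally.

```python
import subprocess


def current_ps_output() -> str:
    """Return 'PID COMMAND' lines for all processes, without a header row."""
    # Repeating -o with an empty header text suppresses the header line.
    result = subprocess.run(
        ["ps", "-ax", "-o", "pid=", "-o", "command="],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_elasticsearch_pids(ps_output: str) -> list:
    """Pick out PIDs whose command line looks like an Elasticsearch JVM."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, command = parts
        # Heuristic (assumption): the node runs under java and mentions
        # "elasticsearch" somewhere on its command line.
        if "java" in command and "elasticsearch" in command.lower():
            pids.append(int(pid))
    return pids
```

Calling find_elasticsearch_pids(current_ps_output()) returns the PIDs that match. Inspect each one before stopping it, since the heuristic can also match unrelated Java processes that merely mention Elasticsearch.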

ramzis · June 30, 2022, 2:37pm

I killed all Java processes, and running it again now gives me

objc[60830]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[60830]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

followed by

Exception Type:        EXC_CRASH (SIGKILL)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Termination Reason:    Namespace OBJC, Code 1 

Application Specific Information:
crashed on child side of fork pre-exec


Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	    0x7ff813f9d6da __terminate_with_payload + 10
1   libsystem_kernel.dylib        	    0x7ff813fba89e abort_with_payload_wrapper_internal + 119
2   libsystem_kernel.dylib        	    0x7ff813fba827 abort_with_reason + 19
3   libobjc.A.dylib               	    0x7ff813e85aee _objc_fatalv(unsigned long long, unsigned long long, char const*, __va_list_tag*) + 114
4   libobjc.A.dylib               	    0x7ff813e85a7c _objc_fatal(char const*, ...) + 138
5   libobjc.A.dylib               	    0x7ff813e7bfa0 performForkChildInitialize(objc_class*, objc_class*) + 299
6   libobjc.A.dylib               	    0x7ff813e67177 initializeNonMetaClass + 617
7   libobjc.A.dylib               	    0x7ff813e66c18 initializeAndMaybeRelock(objc_class*, objc_object*, mutex_tt<false>&, bool) + 232
8   libobjc.A.dylib               	    0x7ff813e66995 lookUpImpOrForward + 1087
9   libobjc.A.dylib               	    0x7ff813e65f9b _objc_msgSend_uncached + 75
10  CoreFoundation                	    0x7ff81403a68f -[__NSDictionaryM __setObject:forKey:] + 497
11  IOKit                         	    0x7ff816984284 MakeOneStringProp + 99
12  _psutil_osx.cpython-39-darwin.so	       0x10a641b14 psutil_disk_io_counters + 68
13  Python                        	       0x1098cdad3 cfunction_call + 90

<and so on>

RickBoyd · June 30, 2022, 3:53pm

Yeah... I've seen that before. Unfortunately, this is a problem with Python, not Rally. See this article: "Multiprocessing causes Python to crash and gives an error may have been in progress in another thread when fork() was called".

ramzis · July 1, 2022, 3:34pm

Thanks, the link helped! Ran my first race successfully after exporting

OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
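
For anyone who prefers not to export the variable shell-wide, a minimal sketch of a wrapper that sets it only for the esrally child process is shown below. The helper names are hypothetical, and it assumes esrally is on the PATH. Note that the variable turns off the Objective-C runtime's fork-safety check, so this is a workaround rather than a fix.

```python
import os
import subprocess

FORK_SAFETY_VAR = "OBJC_DISABLE_INITIALIZE_FORK_SAFETY"


def env_with_fork_safety_disabled(base_env=None):
    """Return a copy of the environment with the workaround variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env[FORK_SAFETY_VAR] = "YES"
    return env


def run_race(args, base_env=None):
    """Run `esrally race` with extra arguments and return its exit code."""
    cmd = ["esrally", "race", *args]
    # The variable is passed only to this child process; the parent
    # shell's environment is left untouched.
    completed = subprocess.run(cmd, env=env_with_fork_safety_disabled(base_env))
    return completed.returncode
```

With the command from the first post, this would be called as run_race(["--distribution-version=8.3.0", "--kill-running-processes", "--track=geonames"]).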

