I am following the cluster benchmarking instructions in the Rally 1.4 documentation. I have three virtual machines:
Coordinator: 10.156.129.118 (vm-1)
Targets: 10.156.129.121 (vm-2) and 10.156.129.169 (vm-4)
On the coordinator:
[elastic@elastic-vm-1 tmp]$ esrallyd start --node-ip=10.156.129.118 --coordinator-ip=10.156.129.118
[INFO] Successfully started actor system on node [10.156.129.118] with coordinator node IP [10.156.129.118]
On the targets:
[elastic@elastic-vm-4 ~]$ esrallyd start --node-ip=10.156.129.169 --coordinator-ip=10.156.129.118
[INFO] Successfully started actor system on node [10.156.129.169] with coordinator node IP [10.156.129.118].
[elastic@elastic-vm-2 ~]$ esrallyd start --node-ip=10.156.129.121 --coordinator-ip=10.156.129.118
[INFO] Successfully started actor system on node [10.156.129.121] with coordinator node IP [10.156.129.118].
Then I create the cluster and run the benchmark. It immediately reaches "Preparing for race" but stays stuck there indefinitely:
[elastic@elastic-vm-1 tmp]$ esrally --distribution-version=7.5.2 --target-hosts=10.40.195.121:9200,10.156.129.169:9200
[INFO] Preparing for race ...
rally.log:
2020-03-12 00:27:57,440 ActorAddr-(T|:1900)/PID:1697 esrally.actor INFO Capabilities [{'Convention Address.IPv4': '10.156.129.118:1900', 'Thespian Generation': (3, 9), 'coordinator': True, 'Python Version': (3, 5, 1, 'final', 0), 'Thespian ActorSystem Version': 2, 'Thespian Watch Supported': True, 'Thespian Version': '1583972540563', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'ip': '10.156.129.118'}] match requirements [{'coordinator': True}].
2020-03-12 00:27:58,249 ActorAddr-(T|:35236)/PID:1752 esrally.utils.repo INFO Checking out [7] in [/home/elastic/.rally/benchmarks/tracks/default] for distribution version [7.5.2].
2020-03-12 00:27:58,260 ActorAddr-(T|:35236)/PID:1752 esrally.utils.process INFO Already on '7'
Your branch is up-to-date with 'origin/7'.
2020-03-12 00:27:58,260 ActorAddr-(T|:35236)/PID:1752 esrally.utils.repo INFO Rebasing on [7] in [/home/elastic/.rally/benchmarks/tracks/default] for distribution version [7.5.2].
2020-03-12 00:27:58,269 ActorAddr-(T|:35236)/PID:1752 esrally.utils.process INFO Already on '7'
Your branch is up-to-date with 'origin/7'.
2020-03-12 00:27:58,314 ActorAddr-(T|:35236)/PID:1752 esrally.utils.process INFO Current branch 7 is up to date.
2020-03-12 00:27:58,319 ActorAddr-(T|:35236)/PID:1752 esrally.track.loader INFO Reading track specification file [/home/elastic/.rally/benchmarks/tracks/default/geonames/track.json].
2020-03-12 00:27:58,353 ActorAddr-(T|:35236)/PID:1752 esrally.track.loader INFO Final rendered track for '/home/elastic/.rally/benchmarks/tracks/default/geonames/track.json' has been written to '/tmp/tmpl3yggaos.json'.
2020-03-12 00:27:58,361 ActorAddr-(T|:35236)/PID:1752 esrally.track.loader INFO Loading template [definition for index geonames in index.json].
2020-03-12 00:27:58,367 ActorAddr-(T|:35236)/PID:1752 esrally.metrics INFO Creating in-memory metrics store
2020-03-12 00:27:58,367 ActorAddr-(T|:35236)/PID:1752 esrally.metrics INFO Opening metrics store for race timestamp=[20200312T002756Z], track=[geonames], challenge=[append-no-conflicts], car=[['defaults']]
2020-03-12 00:27:58,368 ActorAddr-(T|:35236)/PID:1752 esrally.metrics INFO Creating file race store
2020-03-12 00:27:58,368 ActorAddr-(T|:35236)/PID:1752 esrally.actor INFO Asking mechanic to start the engine.
2020-03-12 00:27:58,368 ActorAddr-(T|:35236)/PID:1752 esrally.actor INFO Capabilities [{'Convention Address.IPv4': '10.156.129.118:1900', 'Thespian Generation': (3, 9), 'coordinator': True, 'Python Version': (3, 5, 1, 'final', 0), 'Thespian ActorSystem Version': 2, 'Thespian Watch Supported': True, 'Thespian Version': '1583972540563', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'ip': '10.156.129.118'}] match requirements [{'coordinator': True}].
2020-03-12 00:27:58,378 ActorAddr-(T|:32810)/PID:1816 esrally.actor INFO Received signal from race control to start engine.
2020-03-12 00:27:58,962 ActorAddr-(T|:32810)/PID:1816 esrally.utils.repo INFO Checking out [7] in [/home/elastic/.rally/benchmarks/teams/default] for distribution version [7.5.2].
2020-03-12 00:27:58,973 ActorAddr-(T|:32810)/PID:1816 esrally.utils.process INFO Already on '7'
Your branch is up-to-date with 'origin/7'.
2020-03-12 00:27:58,973 ActorAddr-(T|:32810)/PID:1816 esrally.utils.repo INFO Rebasing on [7] in [/home/elastic/.rally/benchmarks/teams/default] for distribution version [7.5.2].
2020-03-12 00:27:58,984 ActorAddr-(T|:32810)/PID:1816 esrally.utils.process INFO Already on '7'
Your branch is up-to-date with 'origin/7'.
2020-03-12 00:27:59,21 ActorAddr-(T|:32810)/PID:1816 esrally.utils.process INFO Current branch 7 is up to date.
2020-03-12 00:27:59,30 ActorAddr-(T|:32810)/PID:1816 esrally.actor INFO Cluster consisting of [{'host': '10.40.195.121', 'port': 9200}, {'host': '10.156.129.169', 'port': 9200}] will be provisioned by Rally.
2020-03-12 00:27:59,31 ActorAddr-(T|:32810)/PID:1816 esrally.actor INFO Capabilities [{'Convention Address.IPv4': '10.156.129.118:1900', 'Thespian Generation': (3, 9), 'coordinator': True, 'Python Version': (3, 5, 1, 'final', 0), 'Thespian ActorSystem Version': 2, 'Thespian Watch Supported': True, 'Thespian Version': '1583972540563', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'ip': '10.156.129.118'}] match requirements [{}].
2020-03-12 00:27:59,41 ActorAddr-(T|:41796)/PID:1880 esrally.actor INFO Remote Rally node [10.156.129.169] has started.
2020-03-12 00:27:59,43 ActorAddr-(T|:1900)/PID:1697 esrally.actor INFO Checking capabilities [{'Convention Address.IPv4': '10.156.129.118:1900', 'Thespian Generation': (3, 9), 'coordinator': True, 'Python Version': (3, 5, 1, 'final', 0), 'Thespian ActorSystem Version': 2, 'Thespian Watch Supported': True, 'Thespian Version': '1583972540563', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'ip': '10.156.129.118'}] against requirements [{'ip': '10.156.129.169'}] failed.
2020-03-12 00:27:59,41 ActorAddr-(T|:41796)/PID:1880 esrally.actor INFO Checking capabilities [{'Convention Address.IPv4': '10.156.129.118:1900', 'Thespian Generation': (3, 9), 'coordinator': True, 'Python Version': (3, 5, 1, 'final', 0), 'Thespian ActorSystem Version': 2, 'Thespian Watch Supported': True, 'Thespian Version': '1583972540563', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'ip': '10.156.129.118'}] against requirements [{'ip': '10.156.129.169'}] failed.
2020-03-12 00:27:59,43 ActorAddr-(T|:1900)/PID:1697 esrally.actor INFO Capabilities [{'Convention Address.IPv4': '10.156.129.118:1900', 'Thespian Generation': (3, 9), 'ip': '10.156.129.169', 'coordinator': False, 'Thespian Version': '1583972574587', 'Thespian ActorSystem Version': 2, 'Thespian Watch Supported': True, 'Thespian ActorSystem Name': 'multiprocTCPBase', 'Python Version': (3, 5, 1, 'final', 0)}] match requirements [{'ip': '10.156.129.169'}].
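One thing worth cross-checking in the log above: the cluster is listed as 10.40.195.121 and 10.156.129.169, and Rally then checks actor systems against an `ip` requirement (`requirements [{'ip': '10.156.129.169'}]`). Only 10.156.129.169 is matched, and 10.40.195.121 is not one of the three VMs listed at the top. The following is a minimal sketch, a hypothetical bash helper rather than anything shipped with Rally, that prints each host in a `--target-hosts` value that is not among the `--node-ip` values used with `esrallyd start`:

```bash
#!/usr/bin/env bash
# Sketch: print every host in a --target-hosts value that is not among the
# IPs passed to `esrallyd start --node-ip=...`, and return 1 if any is missing.
# The function name is hypothetical; it is not part of Rally.
# Usage: missing_target_hosts "HOST:PORT,HOST:PORT" NODE_IP...
missing_target_hosts() {
  local target_hosts=$1
  shift
  local status=0 entry host node found
  local IFS=,                       # split the host list on commas
  for entry in $target_hosts; do
    host=${entry%:*}                # strip the ":9200" port suffix
    found=no
    for node in "$@"; do
      if [ "$host" = "$node" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$host"                  # target host not in the --node-ip list
      status=1
    fi
  done
  return "$status"
}
```

For example, `missing_target_hosts "10.40.195.121:9200,10.156.129.169:9200" 10.156.129.118 10.156.129.121 10.156.129.169` prints `10.40.195.121` and returns a non-zero status.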
-
This was my second attempt at starting esrallyd and running the benchmark. This time one of the targets, 10.156.129.121, does not appear under "|Convention Attendees [1]:" even though the esrallyd command on it reported success. See the Thespian shell status output below.
I have tried stopping and restarting esrallyd on all of these servers, but I still see the same issue. Why is this happening?
In my first attempt, both targets (10.156.129.121 and 10.156.129.169) were listed under "Convention Attendees", yet the benchmark still stalled, with a similar last message in rally.log ("esrally.actor INFO Remote Rally node [10.156.129.121] has started"). What could cause this?
The firewall is disabled on these Linux servers, so ports should not be blocked, and all of the VMs have internet access.
Any pointers for troubleshooting these two issues would be appreciated.
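Even with the firewall disabled, it can help to confirm TCP reachability between the VMs directly, for example to rule out filtering elsewhere on the network. The coordinator's convention address in rally.log is `10.156.129.118:1900`, so this minimal sketch tries to open a TCP connection to port 1900 on each VM. It assumes bash (for `/dev/tcp`) and coreutils `timeout` are available; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: check TCP reachability of port 1900 (the convention port seen in
# rally.log, e.g. 10.156.129.118:1900) on a list of VMs.
# Assumes bash (for /dev/tcp) and coreutils `timeout`; function names are hypothetical.

# Return 0 if a TCP connection to HOST:PORT opens within 3 seconds.
tcp_port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage: check_convention_ports IP...
check_convention_ports() {
  local ip
  for ip in "$@"; do
    if tcp_port_open "$ip" 1900; then
      echo "$ip:1900 reachable"
    else
      echo "$ip:1900 NOT reachable"
    fi
  done
}
```

Running `check_convention_ports 10.156.129.118 10.156.129.121 10.156.129.169` on each VM shows whether port 1900 can be reached from that VM. It only covers port 1900; the actors also listen on other ports, such as `ActorAddr-(T|:35236)` in rally.log.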
[root@elastic-vm-1 ~]# tail -f /tmp/thespian.log
2020-03-11 17:22:36.816157 p1697 Warn Convention registration from ActorAddr-(T|:1900) is an invalid address; ignoring.
2020-03-11 17:32:43.176405 p1697 Warn Convention registration from ActorAddr-(T|:1900) is an invalid address; ignoring.
Thespian status
**********************
thespian> status
Requesting status from Actor (or Admin) @ ActorAddr-(T|:1900) (#0)
Status of ActorSystem @ ActorAddr-(T|:1900) [#0]:
|Capabilities[9]:
Thespian ActorSystem Version: 2
Python Version: (3, 5, 1, 'final', 0)
Thespian Generation: (3, 9)
ip: 10.156.129.118
Thespian Watch Supported: True
Thespian ActorSystem Name: multiprocTCPBase
Convention Address.IPv4: 10.156.129.118:1900
Thespian Version: 1583972540563
coordinator: True
|Convention Leader: ActorAddr-(T|:1900) [#0]
|Convention Notifications [1]:
ActorAddr-(T|:41796) [#2]
|Convention Attendees [1]:
@ ActorAddr-(T|10.156.129.169:1900) [#1]: Expires_in_0:20:21.828543
|Primary Actors [1]:
@ ActorAddr-(T|:35236) [#3]
|Rate Governer: Rate limit: 4480 messages/sec (currently low with 9 ticks)
|Pending Messages [0]:
|Received Messages [0]:
|Pending Wakeups [0]:
|Pending Address Resolution [0]:
|> 9 - Actor.Message Send.Transmit Started
|> 4 - Admin Handle Convention Registration
|> 15 - Admin Message Received.Total
|> 1 - Admin Message Received.Type.Pending Actor Request
|> 2 - Admin Message Received.Type.QueryExists
|> 3 - Admin Message Received.Type.StatusReq
|> sock#2-fd10 - Idle-socket <socket.socket fd=10, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('10.156.129.118', 1900), raddr=('10.156.129.118', 35194)>->ActorAddr-(T|:35878) (Expires_in_0:12:36.158960)
|> sock#3-fd11 - Idle-socket <socket.socket fd=11, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('10.156.129.118', 1900), raddr=('10.156.129.121', 52192)>->ActorAddr-(T|:1900) (Expires_in_0:17:20.289688)
|> sock#1-fd12 - Idle-socket <socket.socket fd=12, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('10.156.129.118', 1900), raddr=('10.156.129.169', 47848)>->ActorAddr-(T|10.156.129.169:1900) (Expires_in_0:18:15.830708)
|> sock#6-fd13 - Idle-socket <socket.socket fd=13, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('10.156.129.118', 1900), raddr=('10.156.129.118', 35208)>->ActorAddr-(T|:41962) (Expires_in_0:12:34.468386)
|> sock#4-fd15 - Idle-socket <socket.socket fd=15, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('10.156.129.118', 1900), raddr=('10.156.129.118', 35282)>->ActorAddr-(T|:41796) (Expires_in_0:12:36.156473)
|> sock#0-fd17 - Idle-socket <socket.socket fd=17, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('10.156.129.118', 1900), raddr=('10.156.129.118', 35218)>->ActorAddr-(T|:35236) (Expires_in_0:12:34.560533)
|> sock#5-fd18 - Idle-socket <socket.socket fd=18, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('127.0.0.1', 1900), raddr=('127.0.0.1', 39644)>->ActorAddr-(T|:37030) (Expires_in_0:19:59.999589)
|DeadLetter Addresses [0]:
|Source Authority: None
|Loaded Sources [0]:
|Global Actors [0]:
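To compare these entries with the nodes where esrallyd was started, a saved copy of the status output can be filtered with two small helpers. This is a sketch that assumes the text format shown above; the function names are hypothetical and not part of Thespian or Rally. The first lists the attendee IPs; the second pairs each socket's peer IP with the IP inside the ActorAddr it maps to, which makes entries such as the socket from 10.156.129.121 mapped to `ActorAddr-(T|:1900)` (matching the "invalid address" warnings in thespian.log) easy to spot:

```bash
#!/usr/bin/env bash
# Sketch: helpers for reading a saved copy of Thespian's `status` output
# (passed on stdin). Function names are hypothetical.

# Print the IP of each entry listed under "|Convention Attendees".
convention_attendee_ips() {
  awk '
    /^[|]Convention Attendees/ { in_section = 1; next }
    /^[|]/                     { in_section = 0 }
    in_section && match($0, /[(]T[|][0-9.]+:/) {
      print substr($0, RSTART + 3, RLENGTH - 4)
    }
  '
}

# For each open socket, print the peer IP and the IP inside the ActorAddr it
# is mapped to ("-" when the ActorAddr carries no IP, e.g. ActorAddr-(T|:1900)).
socket_peers() {
  sed -n "s/.*raddr=('\([0-9.]*\)'.*->ActorAddr-(T|\([0-9.]*\):[0-9]*).*/\1 \2/p" |
    awk '{ print $1, ($2 == "" ? "-" : $2) }'
}
```

With the status output above saved to a file (hypothetically `status.txt`), `convention_attendee_ips < status.txt` prints `10.156.129.169`, and `socket_peers < status.txt` includes the line `10.156.129.121 -` for the socket from vm-2.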