Elastic Search Random Node High Load

djovic · June 20, 2013, 11:35pm

Issue previously opened here:

Re-posted on the mailing list, as requested:

We have a problem with our Elastic Search cluster in every environment. The
cluster will sometimes get into an unstable state, wherein certain ES nodes
will have load that is several times greater than the load on other nodes.

This can be reproduced every time very quickly by hitting the cluster with
about 40 concurrent threads while running data indexing, and much less
quickly by simply running constant searches.

Our cluster setup:

4 no-data client nodes that service search requests
1 no-data client node that sends data indexing requests.
10 data nodes that are called by the 5 for indexing and searching.

We have verified that all the boxes have:

the same hardware
CPU Intel Xeon X5550 (2.67GHz, 64-bit, 16 core)
96GB high-end physical RAM (64GB JVM heap, 20 GB mem baseline).
the same OS
Red Hat Enterprise Linux Server release 5.9 (Tikanga)
Linux version 2.6.18-348.1.1.el5 x86_64
the same JVM
Sun/Oracle 1.7.0_09 (64-bit)
the same ES version
Currently on 0.20.4
Problem existed since 0.19.8, when we started with ES
no other software running
the same number of shards
1 product shard per node
13M product documents
(product schema has thousands of fields).
1 entity shard per node
135M entity documents
(dozens of entity data types, each with several fields).
approximately the same amount of data
product index is 25GB on disk.
entity index is 12GB on disk.
All data loads into about 20GB baseline RAM.
the exact same logical configuration
{
product: {
settings: {
index.translog.flush_threshold_size: 500mb
index.refresh_interval: 30s
index.number_of_replicas: 1
index.translog.disable_flush: false
index.version.created: 190999
index.number_of_shards: 5
index.routing.allocation.total_shards_per_node: 1
index.translog.flush_threshold_period: 60m
index.translog.flush_threshold_ops: 5000
}
}
entity: {
settings: {
index.translog.flush_threshold_size: 500mb
index.refresh_interval: 30s
index.number_of_replicas: 1
index.translog.disable_flush: false
index.version.created: 190999
index.number_of_shards: 5
index.routing.allocation.total_shards_per_node: 1
index.translog.flush_threshold_period: 60m
index.translog.flush_threshold_ops: 5000
}
}
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Topic		Replies	Views
Elastic Search Random Node High Load Elasticsearch	2	491	September 16, 2013
Uneven search load on data nodes Elasticsearch	0	552	September 18, 2014
Elasticsearch (6.8) is causing server load Elasticsearch	4	842	May 23, 2019
ElasticSearch nodes fail at random Elasticsearch	0	316	March 21, 2014
Elasticsearch high load on some nodes Elasticsearch	1	370	July 3, 2021

Elastic Search Random Node High Load

Related topics