Last year paramedic reported thousands of searches per second (whereas our
regular load is in the hundreds), which eventually led to excessive CPU
load across the cluster (4 machines). Less than a month later the same
thing happened. We updated to ES 0.90.12 at the time, thinking it might be
related to a "forever looping query" bug that had been fixed.
Since then (around 6 months) everything was fine. Yesterday the same thing
happened again (we're still on the same Elasticsearch version). One
interesting thing we noticed is that the gradual increase in searches over
time, which we had attributed to wider adoption of the cluster within the
company, was actually a product of that odd behavior. We were at 1000
searches per second; yesterday it suddenly spiked to 2000 and required a
cluster restart. After the restart it dropped to 600 and stayed there.
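For reference, the numbers above come straight from the monitoring UI. A rough way to double-check them against the cluster's own counters is to sample query_total from the nodes stats API twice and compute the delta. The sketch below makes some assumptions: host/port, and the exact stats path, which on 0.90 may be /_cluster/nodes/stats instead.

```python
# Rough sketch, not battle-tested: sample the cluster-wide
# indices.search.query_total counter twice and derive queries/sec.
# Assumptions: ES reachable on localhost:9200, and this stats path;
# on 0.90 the endpoint may be /_cluster/nodes/stats instead.
import json
import time
import urllib.request

STATS_URL = "http://localhost:9200/_nodes/stats"

def total_queries():
    """Sum search.query_total across all nodes that report indices stats."""
    with urllib.request.urlopen(STATS_URL) as resp:
        stats = json.load(resp)
    return sum(
        node["indices"]["search"]["query_total"]
        for node in stats["nodes"].values()
        if "indices" in node
    )

first = total_queries()
time.sleep(60)  # sample over one minute
second = total_queries()
print("cluster-wide search rate: %.1f queries/s" % ((second - first) / 60))
```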
Is there some recommendation for restarting machines in the cluster from
time to time? Has anyone seen anything like this?
At first I was using elasticsearch-paramedic
(https://github.com/karmi/elasticsearch-paramedic, a simple tool to
inspect the state and statistics of Elasticsearch clusters); more recently
I've been using Marvel. Marvel was reporting the 2000 searches/s mark
while the cluster was acting up. After the restart, it now reports 600
searches/s. Looking at the nginx logs I see no change in rate before or
after the restart. Maybe something other than Elasticsearch is acting up,
but I have no clue what else it could be.
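For concreteness, the nginx-side check was along these lines; a rough sketch, where the log path and the default combined-format timestamp are assumptions:

```python
# Rough sketch: count nginx access-log lines per second to get the
# request rate hitting the proxy in front of Elasticsearch.
# Assumptions: log at /var/log/nginx/access.log, default combined
# format with a [dd/Mon/yyyy:HH:MM:SS ...] timestamp.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")

per_second = Counter()
with open(LOG_PATH) as log:
    for line in log:
        match = TS_RE.search(line)
        if match:
            per_second[match.group(1)] += 1

if per_second:
    peak_ts, peak = per_second.most_common(1)[0]
    print("peak: %d requests/s at %s" % (peak, peak_ts))
    print("mean: %.1f requests/s (over seconds with traffic)"
          % (sum(per_second.values()) / len(per_second)))
```

If the nginx rate stays flat while Marvel's number doubles, that points at something inside the cluster (or the stats themselves) rather than at client traffic.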
I do need to upgrade, but the breaking changes are making it hard for me
to keep moving forward.