On 11 May 2012 10:15, Scott Decker scott@publishthis.com wrote:
So, some fun reports.
After optimizing the first index, the speed... well, not much improvement.
The index was now one big segment and one small one.
However, that index was no longer being updated; it was for April.
It is very odd that this index is not faster now.
So I tried optimizing the May one. It took a while, but it finished and is
now trying to get the segments over to the replicas; we're about 5 hours
into that. Short aside on that: it looks like when we do an optimize in the
future, we should pull the replicas in so we have only the shard and 1
replica, do the optimize, and then move the replicas back out.
It's taking forever to get to the replicas.
Yes, if you can get away with setting the replicas to 0, even better, though
obviously not ideal. You may also want to suspend the gateway snapshots
during this time, because there's a lot of IO going on and the replication &
snapshotting will add additional IO when you probably just want the final
result snapshotted.
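For reference, a minimal sketch of that "drop replicas, optimize, restore" flow, assuming a node on localhost:9200, the 0.19-era REST endpoints (_optimize was later renamed _forcemerge), and a made-up index name:

    import requests

    ES = "http://localhost:9200"
    INDEX = "content-2012-05"  # hypothetical monthly index name

    # 1. Temporarily drop the replicas so the big merge only happens on the primaries.
    requests.put(f"{ES}/{INDEX}/_settings",
                 json={"index": {"number_of_replicas": 0}})

    # 2. Merge each shard down to a single segment; this is the expensive part.
    requests.post(f"{ES}/{INDEX}/_optimize", params={"max_num_segments": 1})

    # 3. Bring the replicas back; they copy the already-merged segments in one go.
    requests.put(f"{ES}/{INDEX}/_settings",
                 json={"index": {"number_of_replicas": 1}})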
However, the search speeds are much faster! We're down to 14-15 milliseconds
returning 50 results, even sorting by pubdate. Every once in a while it will
hit 200-300 milliseconds.
Now, the fun part. I do the segments call and we are at around 23-24 segments
on the May index. One segment is around 13 gigs, then a 5 gig one, and then
a bunch of smaller ones.
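For anyone following along, the segments call is just a GET on the index's _segments endpoint; a rough sketch of pulling per-segment sizes out of the response, assuming the usual indices -> shards -> segments layout and a stand-in index name:

    import requests

    ES = "http://localhost:9200"
    INDEX = "content-2012-05"  # hypothetical index name

    resp = requests.get(f"{ES}/{INDEX}/_segments").json()
    for shard_id, copies in resp["indices"][INDEX]["shards"].items():
        for copy in copies:                      # one entry per shard copy (primary/replica)
            for name, seg in copy["segments"].items():
                print(shard_id, name, round(seg["size_in_bytes"] / 2**30, 2), "GB")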
Seems like it is time to play with the merge policy. (We seem to be moving
out of the ES realm and into the Lucene realm!)
The merge policy documentation will be your friend here. Basically, the
default tiered policy gives you good, even indexing/searching performance
without the heavy, full-GC-like hit of a periodic large merge that the
log_doc/log_byte_size policies incur (those are nice and cheap for a long
time and then you get a very heavy hit).
With the tiered policy, you can schedule when you want to trim things
down.
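A sketch of what nudging the tiered policy might look like, using the index.merge.policy.* settings and a hypothetical index name; whether these can be changed on a live index (rather than at creation time) depends on the ES version, so treat it as illustrative only:

    import requests

    ES = "http://localhost:9200"
    INDEX = "content-2012-05"  # hypothetical index name

    requests.put(f"{ES}/{INDEX}/_settings", json={
        "index": {
            "merge": {
                "policy": {
                    "max_merged_segment": "30gb",  # cap on how large a merged segment can grow
                    "segments_per_tier": 10        # allowed segments per tier before a merge kicks in
                }
            }
        }
    })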
It would be nice to have a scheduler-merge plugin in ES that could monitor
each index's segments API info and automatically trim them down, only during
non-peak hours.
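In the meantime, a cron job could approximate that idea; a very rough sketch, with made-up index names and an arbitrary threshold:

    import requests

    ES = "http://localhost:9200"
    INDICES = ["content-2012-04", "content-2012-05"]  # hypothetical monthly indices
    MAX_SEGMENTS_PER_SHARD = 10                       # arbitrary trigger threshold

    # Run this from cron during off-peak hours.
    for index in INDICES:
        info = requests.get(f"{ES}/{index}/_segments").json()
        shards = info["indices"][index]["shards"]
        worst = max(len(copy["segments"]) for copies in shards.values() for copy in copies)
        if worst > MAX_SEGMENTS_PER_SHARD:
            # Trim down, but not all the way to 1 segment, to keep each run's cost bounded.
            requests.post(f"{ES}/{index}/_optimize", params={"max_num_segments": 5})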
Maybe max merge segment would do well at around 30-40 gigs, since that would
cover a good portion of time.
Indexing is still fast, even at a 13 gig segment size.
Maybe the way we have broken up the shards is off as well. We want to have
a year's worth of content to search against, and we were planning on doing
an index a month.
How does that sound? With that strategy, we'd do the max merge segment of
30-40 gigs, and we should have large indexes to search against.
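As an illustration only (names and numbers are made up, not recommendations), a monthly index could be created with those settings baked in, something like:

    import requests

    ES = "http://localhost:9200"

    # Hypothetical next-month index with the larger merge ceiling set up front.
    requests.put(f"{ES}/content-2012-06", json={
        "settings": {
            "index": {
                "number_of_shards": 4,                     # e.g. one primary per node on a 4-node cluster
                "number_of_replicas": 1,
                "merge.policy.max_merged_segment": "40gb"  # the 30-40 gig ceiling discussed above
            }
        }
    })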
Can you repost a summary of your cluster and index setup? I've reviewed
your earlier messages, and IIUC you have 4 nodes with an index per month;
how many shards does each index have? (There's some confusion for me in your
first post, because you say 'shard per month', which I interpret as 'index
per month'.) You have 4 replicas configured (or do you mean 1 primary and 3
replicas, to balance over the 4 nodes?).
I wonder if you're over-allocating on replicas here; maybe having just 2 or
even only 1 replica would reduce the burden on the overall cluster. If you
used the default 5-shards-per-index configuration, each of the nodes will be
a primary shard holder, with one node taking 2 primaries. With so many
replicas, the cluster may be overloaded distributing updates, cancelling out
any search/read load balancing completely.
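A quick back-of-the-envelope way to see that write amplification, assuming the default 5 primaries on a 4-node cluster:

    # Every indexed document is written once per shard copy (primary + each replica),
    # so the replica count directly multiplies indexing and merge IO across the cluster.
    nodes = 4
    primary_shards = 5

    for replicas in (3, 2, 1):
        copies = primary_shards * (1 + replicas)
        print(f"{replicas} replica(s): {copies} shard copies total, "
              f"~{copies / nodes:.1f} per node, each doc written {1 + replicas}x")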
I'm by no means an expert on this, but I suspect having extra replicas is
designed for when you have more nodes than primary shards: something like,
say, 10 ES nodes and a 5-shard index with 1 replica. Then 5 nodes each hold
a primary shard and 5 other nodes hold a replica, allowing those
replica-shard-holding nodes to be used as the search load balancers.
Something else to consider, anyway.
Paul