Question: How to gauge/improve performance

Hey All,
We now have our indexes up and running and collecting our content, and now it is time to start tuning performance. We collect articles from around the web and store them in ES. We probably do around 200-300K documents a day right now for testing. We use a simple standard analyzer for title/content.
We shard per month, and we have 1 replica for that shard. For testing, we have 4 machines; each has 8 gigs of RAM, and ES has 4 gigs allotted to it.

When I do a simple content search, it takes around 3-4 seconds to come back. If you then do another search, and another, etc., it still takes around 2-3 seconds. Of course, we want this in the milliseconds.
We still have the default refresh rate, which is 1 second (is this what is affecting our speed?), because we want to see the content as fast as possible.

I have included a basic query at the bottom of this post. What I am looking for is:
1 - How do we start debugging the performance bottleneck? I have bigdesk, and the CPUs do not seem active at all really. Everything seems very minimal, like 2-3% CPU usage.
2 - Is the refresh rate the cause of this? What may be more optimal?
3 - We are not even sorting by date yet, but I would assume that as soon as we do, performance will get even worse. So, any way to also start thinking about sort performance?

simple query we run:
{
  "fields": ["docId", "title", "url", "imageUrl"],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "content",
            "query": "la lakers"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {}
}

Hi Scott,

On Wednesday, May 9, 2012 10:50:40 AM UTC-4, Scott Decker wrote:

Hey All,
We now have our indexes up and running and collecting our content, and now it is time to start tuning performance. We collect articles from around the web and store them in ES. We probably do around 200-300K documents a day right now for testing. We use a simple standard analyzer for title/content.
We shard per month, and we have 1 replica for that shard. For testing, we have 4 machines; each has 8 gigs of RAM, and ES has 4 gigs allotted to it.

Sounds reasonable.

When I do a simple content search, it takes around 3-4 seconds to come back. If you then do another search, and another, etc., it still takes around 2-3 seconds. Of course, we want this in the milliseconds.
We still have the default refresh rate, which is 1 second (is this what is affecting our speed?), because we want to see the content as fast as possible.

That sounds slow. Maybe your indices/shards are too big for your servers? Yes, the refresh rate has an effect on speed - we've seen this in performance tests over and over.

I have included a basic query at the bottom of this post. What I am looking for is:
1 - How do we start debugging the performance bottleneck? I have bigdesk, and the CPUs do not seem active at all really. Everything seems very minimal, like 2-3% CPU usage.

Have a look at SPM (Sematext Monitoring). You may see additional metrics there, not sure.

2 - Is the refresh rate the cause of this? What may be more optimal?

Could be. Try 30 seconds.
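
For reference, the refresh interval can be changed on a live index through the index settings API. A minimal sketch, assuming a node reachable on localhost:9200 and the content-may-2012 index name that shows up later in this thread:

# set a 30 second refresh interval on this index
curl -XPUT 'localhost:9200/content-may-2012/_settings' -d '{
  "index": { "refresh_interval": "30s" }
}'

Setting it back to "1s" restores the default, and "-1" disables refresh entirely (sometimes useful during bulk loads).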

3 - We are not even sorting by date yet, but I would assume that as soon as we do, performance will get even worse. So, any way to also start thinking about sort performance?

What will you sort by? Time? Make sure you use the right types and time
precision/granularity.
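
As a rough sketch of what that could look like - the type name "article" is a placeholder, and pubdate is the field mentioned later in this thread - the timestamp would be mapped as a date and then sorted on:

# map pubdate as a date field (placeholder type name "article")
curl -XPUT 'localhost:9200/content-may-2012/article/_mapping' -d '{
  "article": {
    "properties": {
      "pubdate": { "type": "date", "format": "dateOptionalTime" }
    }
  }
}'

# the query from the top of the thread, with a sort on that field
curl -XGET 'localhost:9200/content-may-2012/_search' -d '{
  "fields": ["docId", "title", "url", "imageUrl"],
  "query": { "query_string": { "default_field": "content", "query": "la lakers" } },
  "sort": [ { "pubdate": { "order": "desc" } } ],
  "size": 50
}'

The precision/granularity point presumably comes down to not indexing more time resolution than you actually need to sort or filter on.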

simple query we run:
{
  "fields": ["docId", "title", "url", "imageUrl"],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "default_field": "content",
            "query": "la lakers"
          }
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {}
}

Looks as simple as queries get. :)
Things to check/try:

  • Check GC counts/times (see the node stats call sketched after this list)
  • Check disk IO while searching
  • Check what happens if indexing is off - still slow?
  • Check CPU usage during query
  • ...
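
One way to get at the GC and JVM numbers from the first item is the nodes stats API; a minimal sketch against the 0.19-era endpoint, assuming a node on localhost:9200:

# per-node stats, including jvm heap and GC collection counts/times
curl -XGET 'localhost:9200/_cluster/nodes/stats?pretty=true'

Depending on the version, the os and fs sections of the response also give a rough view of CPU and disk activity per node.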

Otis

Hiring Elasticsearch Engineers World-Wide

This sounds very slow indeed. We have 30 million documents in a very simple 2-node cluster, 5 shards with 1 replica, with 2GB RAM allocated to each node, and queries with no sorting returning 25 hits come back in ~50ms.

Another thing to check is the _segments API, how many segments does your
index have? It's possible you have a fairly un-optimized index:

http://www.elasticsearch.org/guide/reference/api/admin-indices-segments.html

Can you print the segment info on a gist for review? A 'good' sign, I think, is a single large segment (GBs) with a few smaller ones (KBs to maybe small MBs). If you have several relatively meaty segments, this may explain something I've seen before.
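
A minimal sketch of that check (host assumed; the index name is the one that comes up later in this thread):

curl -XGET 'localhost:9200/content-may-2012/_segments?pretty=true'

Each shard's entry lists its segments with num_docs, deleted_docs and size, which makes a long tail of mid-sized segments easy to spot.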

When we first bulk indexed 30 million, the queries were much slower until
we optimized:

http://www.elasticsearch.org/guide/reference/api/admin-indices-optimize.html

We trimmed the # of segments by using _optimize?max_num_segments=X. With X=1, you have a completely optimized index. Since you're in test and not prod, you can probably go straight to max_num_segments=1 and just live with the IO hit. Since the segment optimizing is heavy on IO, shuffling the segments and then updating replicas, it may be worth removing replicas (set replicas to 0) while you optimize and then bringing them back afterwards, and also perhaps disabling snapshots just to reduce the IO storm until the optimize is completed.
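
A sketch of that sequence, with the index name and host assumed (the snapshot part is left out since the exact gateway settings vary by setup):

# drop replicas to cut replication IO during the merge
curl -XPUT 'localhost:9200/content-may-2012/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'

# merge down to a single segment; by default the call waits for the merge to finish
curl -XPOST 'localhost:9200/content-may-2012/_optimize?max_num_segments=1'

# bring the replica back once the optimize has completed
curl -XPUT 'localhost:9200/content-may-2012/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'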

cheers,

Paul Smith

Thanks all.

So, segments was interesting. For one index, we have 28 segments, 42 search segments.
There were about 3 or 4 segments that were about 5 gigs. The rest were around 100mb, 120mb, etc.
Then I tried a merge with max segments of 1. I just said return right away and don't wait; it took about 10 minutes or so in the background.
Then I had more segments! Yay!
Anyway, it still looks like a few larger ones at around 5 gigs or so:
https://raw.github.com/gist/2650158/d2bfc2ab47d992485c0697c5d45cbb7f6a5fdcdf/gistfile1.txt

I will keep digging into the segments and see.

Otis - yes, I like SPM, as more of that is needed for these types of systems. I will try to get your collectd setup installed and see if that helps.


On 10 May 2012 10:50, Scott Decker scott@publishthis.com wrote:

Thanks all.

So, segments was interesting. For one index, we have 28 segments, 42 search segments.
There were about 3 or 4 segments that were about 5 gigs. The rest were around 100mb, 120mb, etc.
Then I tried a merge with max segments of 1. I just said return right away and don't wait; it took about 10 minutes or so in the background.
Then I had more segments! Yay!
Anyway, it still looks like a few larger ones at around 5 gigs or so:

https://raw.github.com/gist/2650158/d2bfc2ab47d992485c0697c5d45cbb7f6a5fdcdf/gistfile1.txt

I will keep digging into the segments and see.

If you set max_num_segments to 1, and after that you STILL have shards with a number of segments of ~5GB in size, either the optimize hasn't actually finished yet, or it didn't actually do it. I use elasticsearch-head and let it wait so I know it's completed too.

The gist is after the _optimize?max_num_segments=1 POST call? (it's not a GET)

You shouldn't be expecting more segments; you should have fewer after an optimize. If you are still trickle indexing in the background, some newer, very very tiny segments will come in and merge along, but there should only be 1 single large segment per shard after a POST to _optimize?max_num_segments=1.
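
For what it's worth, the blocking form of the call looks like this (host assumed; wait_for_merge defaults to true, so leaving the parameter off also waits):

curl -XPOST 'localhost:9200/content-may-2012/_optimize?max_num_segments=1&wait_for_merge=true'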

On 10 May 2012 11:01, Paul Smith tallpsmith@gmail.com wrote:


If you set max_num_segments to 1, and after that you STILL have shards with a number of segments of ~5GB in size, either the optimize hasn't actually finished yet, or it didn't actually do it. I use elasticsearch-head and let it wait so I know it's completed too.

The gist is after the _optimize?max_num_segments=1 POST call? (it's not a GET)

This is the call I made:
curl -XPOST ':9200/content-may-2012/_optimize?max_num_segments=1&wait_for_merge=false'

It did complete. The gist is from after the optimize was run.

In head - not sure if this matters - I have something like 22.1GB, and then in parentheses it says (86.5GB).
I am guessing the parentheses are showing what the same index takes up across the 4 servers?
Number of documents is around 645K right now.
from head
docs: {

  • num_docs: 635308
  • max_doc: 642215
  • deleted_docs: 6907

}

index: {

  • primary_size: 22.1gb
  • primary_size_in_bytes: 23800200010
  • size: 86.5gb
  • size_in_bytes: 92984238959

}

Also, I played around with refresh times. I set it to 5 minutes and then started doing searches. I limit fields to just a few things like title, url, pubdate. If I do 10 results, usually the results return in 17 milliseconds, or sometimes 3-4 seconds.

If I bump to 35-40 results, it is almost always 800-900 milliseconds or more.

Currently, there are 2 indexes mapped to 1 alias. Each index is 1 shard + 3 replicas. This spreads evenly across 4 servers, so the servers should be OK. The loads are really minimal on these servers. So, it seems like it is a Lucene index issue I am running into, but I'm not totally sure.


On 10 May 2012 11:25, Scott Decker scott@publishthis.com wrote:

this is the call I made
curl -XPOST ':9200/content-may-2012/_optimize?max_num_segments=1&wait_for_merge=false'

It did complete. The gist is from after the optimize was run.

In head - not sure if this matters - I have something like 22.1GB, and then in parentheses it says (86.5GB).
I am guessing the parentheses are showing what the same index takes up across the 4 servers?

Yes, this means you have 1 primary and 3 replicas configured, I take it, because the bracketed figure is the sum total of space for that index across the cluster including the replicas; the 22GB is the total for the primary shards.

Number of documents is around 645K right now.
from head
docs: {

  • num_docs: 635308
  • max_doc: 642215
  • deleted_docs: 6907

}

22GB for 635k docs is roughly 37KB per doc, so that's a fairly heavy payload and may be a contributing factor. I take it you are storing _source, and that in your results you are asking for _source as well? Your example query above doesn't say it is loading the _source field - you have an explicit set of fields, so it shouldn't be retrieving that.

Currently, there are 2 indexes mapped to 1 alias. Each index is 1 shard + 3 replicas. This spreads evenly across 4 servers, so the servers should be OK. The loads are really minimal on these servers. So, it seems like it is a Lucene index issue I am running into, but I'm not totally sure.

Can you run your slower query against each of the indices behind the alias individually, to see if it's the merging of results across the 2 indices that is a primary cause of the slowdown? When hitting the alias, the search must go to each index and the results are then merged through the scoring, so I just want to try to partition where the issue is.
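
A sketch of that comparison - the alias name "content" and the April index name are guesses, content-may-2012 appears earlier in the thread, and query.json stands for the query at the top of this post:

# against the alias (fans out to both monthly indices and merges the results)
curl -XGET 'localhost:9200/content/_search' -d @query.json

# against each index behind the alias individually
curl -XGET 'localhost:9200/content-april-2012/_search' -d @query.json
curl -XGET 'localhost:9200/content-may-2012/_search' -d @query.json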

If it's not the alias, then it's possible that one of the nodes itself is performing worse than the others. You could further partition the problem by setting your queries to force them to go only to specific nodes with the search preference API (if your ES version supports that, I can't exactly remember which version that came out).
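
If the version in use does support the search preference parameter, pinning a query to one node looks something like this (the node id is a placeholder taken from the cluster's nodes info, and query.json is the query from the top of this post):

curl -XGET 'localhost:9200/content-may-2012/_search?preference=_only_node:xyz' -d @query.json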

I still do not understand why you have quite a few large segments after the optimize; there should only be one. You're not getting any errors in the ES node logs, are you? This is a loose thread I don't like. I've seen searches degrade when segment distribution isn't optimal, particularly if you have a short refresh interval and indexing in the background. There's no sorting, you're getting very few hits, no _source retrieval... it should just fly.

Given the size of your document payload, you may end up benefiting from compression (it's not on by default, and you'll have to reindex to take advantage of it; I don't think an optimize will compress things during the merge - can anyone confirm that?).
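
Assuming what's meant is the _source compression option in that era's mappings, it would be enabled on the index you reindex into - something like the following, where the index and type names are placeholders:

curl -XPUT 'localhost:9200/content-may-2012-v2' -d '{
  "mappings": {
    "article": {
      "_source": { "compress": true }
    }
  }
}'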

Getting a handle on what CPU/disk IO each node is doing around this would be interesting too, so if you have comprehensive server monitoring, that would be worth looking at (SPM et al). But the optimize not really looking like it has optimized seems wrong.

Paul

Just a quick one here. On our backdated index, I tried doing the optimize but just waited for it. It eventually came back, and it ended up with 2 segments.
So, maybe the background optimize isn't really working.
I am now going to set up time to try it on our most recent index and see if doing that actually puts the correct segments in there.


On 11 May 2012 02:24, Scott Decker scott@publishthis.com wrote:

Just a quick one here. On our backdated index, I tried doing the optimize but just waited for it. It eventually came back, and it ended up with 2 segments.
So, maybe the background optimize isn't really working.
I am now going to set up time to try it on our most recent index and see if doing that actually puts the correct segments in there.

How big are the 2 segments? I'm hoping it's one giant one plus a very tiny one? Did the query time improve on this index after the optimize?

Thanks for letting us know how you're going.

Paul

So, some fun reports.
After optimizing the first index, the speed... well, not much improvement. The index was now one big segment and one small one.
However, that index was past being updated; it was for April.
So, I tried optimizing the May one. It took a while, but it finished and is now trying to get the segments over to the replicas - about 5 hours in now. A short aside on that: it looks like when doing an optimize in the future, we should pull the replicas in so we have only the shard and 1 replica, do the optimize, and then move the replicas back out.
It's taking forever to get to the replicas.

However, the search speeds are much faster! We're down to 14-15 milliseconds returning 50 results, even sorting by pubdate. Every once in a while it will hit 200-300 milliseconds.

Now, the fun part. I do the segments call and we are at around 23-24 segments on the May index. One segment is around 13 gigs, then a 5 gig one, and then a bunch of smaller ones.
Seems like it is time to play with the merge policy (we seem to be moving out of the ES realm and into the Lucene realm!).

Maybe the max merged segment size would do well at something like 30-40 gigs, since that would cover a good portion of time.
Indexing is still fast, even at a 13 gig segment size.

Maybe the way we have broken up the shards is off as well. We want to have a year's worth of content to search against, and we were planning on doing an index a month.
How does that sound? With that strategy, we'd do the max merged segment of 30-40 gigs, and we should have large indexes to be able to search against.


On 11 May 2012 10:15, Scott Decker scott@publishthis.com wrote:

So, some fun reports.
After optimizing the first index, the speed... well, not much improvement. The index was now one big segment and one small one.
However, that index was past being updated; it was for April.

It is very odd that this index is not faster now.

So, I tried optimizing the May one. It took a while, but it finished and is now trying to get the segments over to the replicas - about 5 hours in now. A short aside on that: it looks like when doing an optimize in the future, we should pull the replicas in so we have only the shard and 1 replica, do the optimize, and then move the replicas back out.
It's taking forever to get to the replicas.

Yes, if you can get away with setting the replicas to 0, even better, but obviously that's not ideal... :) You may also want to suspend the gateway snapshots during this time, because there's a lot of IO going on, and the replication & snapshotting will be adding additional IO when you probably just want the final result snapshotted.

However, the search speeds are much faster! We're down to 14-15 milliseconds returning 50 results, even sorting by pubdate. Every once in a while it will hit 200-300 milliseconds.

Now, the fun part. I do the segments call and we are at around 23-24 segments on the May index. One segment is around 13 gigs, then a 5 gig one, and then a bunch of smaller ones.
Seems like it is time to play with the merge policy (we seem to be moving out of the ES realm and into the Lucene realm!).

The merge policy settings will be your friend here. Basically, the default tiered policy gives you good, even indexing/searching performance without the heavy, Full-GC-like hit of a periodic large merge that the log_doc/log_byte_size ones incur (those are nice and cheap for a long time, and then you get a very heavy hit). With the tiered policy, you can schedule when you want to trim things down.

It would be nice to have in ES a scheduled-merge plugin that could monitor each index's segments API info and automatically trim them down, only during non-peak hours.
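
For reference, the tiered policy's main knob being discussed here can be set per index; a sketch using the 30-40 gig idea floated above (the value is illustrative only, and depending on the ES version this may need to be set at index creation or in elasticsearch.yml rather than updated live):

curl -XPUT 'localhost:9200/content-may-2012/_settings' -d '{
  "index.merge.policy.max_merged_segment": "30gb"
}'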

Maybe the max merged segment size would do well at something like 30-40 gigs, since that would cover a good portion of time.
Indexing is still fast, even at a 13 gig segment size.

Maybe the way we have broken up the shards is off as well. We want to have a year's worth of content to search against, and we were planning on doing an index a month.
How does that sound? With that strategy, we'd do the max merged segment of 30-40 gigs, and we should have large indexes to be able to search against.

Can you repost a summary of your cluster and index setup? I've reviewed your earlier messages, and IIUC you have 4 nodes, with an index per month - each index has how many shards? (There's confusion for me in your first post because you say 'shard per month', which I interpret as 'index per month'.) Do you have 4 replicas configured (or do you mean 1 primary and 3 replicas to balance over the 4 nodes)?

I wonder if you're over-allocating on replicas here; maybe having just 2 or even only 1 replica would reduce the burden on the overall cluster. If you used the default 5-shards-per-index configuration, each of the nodes would be a primary shard holder, with one node taking 2 primaries. With so many replicas, the cluster may be overloading the distribution of updates and cancelling out any search/read load balancing completely.

I'm by no means an expert on this, but I suspect having more than 1 replica is designed for when you have more nodes than primary shards - say 10 ES nodes and a 5-shard index with 1 replica. So 5 nodes each hold a primary shard, and 5 other nodes hold a replica, allowing those replica-holding nodes to be used as the search load balancers.

Something else to consider anyway.

Paul

Actually, maybe a picture will tell a thousand words - could you screenshot the elasticsearch-head view of the nodes & replicas setup and index overview (the main elasticsearch-head view)?

https://lh4.googleusercontent.com/-lGBrDekvYCA/T60jNV0pTKI/AAAAAAAAAAM/ooF4a-vagRM/s1600/head-snapshot.jpg

Here is a pic for you.

What we are trying right now is 1 index per month. Index is 1 shard + 3
replicas.
Our write load is fairly constant. It is our read load that is going to
keep going up and up. That is why we were trying more replicas.

Our goal is to store around a year's worth of content.
Would it be better to have 1 index, but lots of shards and a few replicas? Or is creating an index per month a good way to go, but we should have something like 2-3 shards with 1 replica?

Also, I'm looking into the merge policy more this weekend to see what may be optimal in testing.


I think you have too few shards and too many replicas for this setup. I would invert what you have and go with 4 shards per index and 1 replica.

The main reason to shard is to distribute the writes and reads. You have a single shard taking a lot of work and then 4 replicas to distribute to, so the single primary node (because there's only ever one primary shard in your case) will be overworked.

You will notice that the primary shard of both indexes is on the same host at the moment, creating more work for it (the thick black line around a shard marks the primary in elasticsearch-head).

A rule of thumb is to have the # of shards the same as your number of nodes, or planned for the # of nodes you may need in the future.

You can't change the # of shards once an index is created, so you will have to reindex.

I think you will get better performance this way.
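
A sketch of creating the next monthly index along those lines (the index name is a placeholder following the monthly pattern used in this thread):

curl -XPUT 'localhost:9200/content-june-2012' -d '{
  "settings": {
    "index": { "number_of_shards": 4, "number_of_replicas": 1 }
  }
}'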

Paul

FYI to everyone: we will be updating our shards in the coming week or so, and after that is done, I will report back here with performance and other items we are doing. I figure it is useful for people searching the groups to see these sorts of conversations.


On 17 May 2012 04:59, Scott Decker scott@publishthis.com wrote:

FYI to everyone: we will be updating our shards in the coming week or so, and after that is done, I will report back here with performance and other items we are doing. I figure it is useful for people searching the groups to see these sorts of conversations.

Thanks Scott, I'm definitely interested in seeing what results you get.