ES Index performance

I'll be setting the machine up again in the near future, and bumping up
to the latest ES version (mine's 0.18.6).

I did some fiddling with the merge policy's merge factor setting (up to
30 from the default) and it has provided a significant improvement in
my bulk calls. How high can I go on this before there are performance
issues, and what are they? Specifically, what if I leave this setting at
30-40 permanently?

On Feb 12, 11:42 am, Shay Banon <kim...@gmail.com> wrote:

The bulk request returns a per-item response indicating whether each item succeeded (and, on failure, the failure itself), so you need to check the actual response body. Also, can you try a newer Java version? The one you use is pretty old.
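The per-item check Shay describes can be sketched as below. This assumes a response body shaped like the bulk API's "items" list; the exact fields vary across ES versions, and the sample payload, index name, and ids here are made up for illustration:

```python
# Sketch: scan an Elasticsearch bulk-response body for per-item failures.
# The sample payload below is fabricated; real responses come from the
# HTTP body of a _bulk request.
import json

def failed_items(bulk_response):
    """Return (action, result) pairs for items that did not succeed."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is keyed by its action type: index, create, delete...
        for action, result in item.items():
            if "error" in result or result.get("status", 200) >= 300:
                failures.append((action, result))
    return failures

sample = json.loads("""
{"took": 30, "items": [
  {"index": {"_index": "docs", "_id": "1", "status": 201}},
  {"index": {"_index": "docs", "_id": "2", "status": 503,
             "error": "UnavailableShardsException"}}
]}
""")

for action, result in failed_items(sample):
    print(action, result["_id"], result["error"])
```

The point being that a 200 on the bulk request itself says nothing; only the per-item entries reveal a partial failure.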

On Saturday, February 11, 2012 at 11:47 PM, Oren Mazor wrote:

Yup, the bulk operations are all okay, at least as far as the HTTP
response is concerned.

I'm almost certain that my problem is just that we're hitting some
resource limit for the size of our index (40 GB), but I can't figure out
where to find the blockage. I'm watching the stats on the cluster and
seeing nothing other than flat/healthy usage.

I am seeing higher than normal read/write activity over the past 24
hours (a huge number of documents added).

On Feb 9, 9:15 pm, <medcl2...@gmail.com> wrote:

Hi Oren Mazor,
did you check the response of the bulk operation? Were all the items
successfully indexed?
Also check your translog status,
and try manually refreshing the index to see if you can get the current version.

-----Original Message-----
From: Oren Mazor
Sent: Friday, February 10, 2012 2:25 AM
To: elasticsearch
Subject: Re: ES Index performance

Issuing a Get does not return the correct version.

we're using OpenJDK Java 1.6.0_18, not on EC2, on Debian. We
have 1 replica and 10 shards.

I'm suspecting an IO issue, to be honest, given that if there were
general indexing performance issues somebody would likely have seen them
already, what with "select isn't broken" :)

That said, we do have a pretty high rate of indexing, so maybe at scale
some issue pops up?

On Feb 7, 2:00 pm, Shay Banon <kim...@gmail.com> wrote:

Going back to your question, do you see that issuing a Get (which is
realtime) does not return the correct version of the data? It would be
helpful to understand where the stalling is coming from. If a "get" does
not return the expected version of the data, it means that it didn't get
indexed, so you will need to look at the indexer code and see if maybe
something is stalling on the bulk API execution.

The stalling can happen for many reasons, starting with slow IO, not enough
resources on the machine (CPU/memory), overloading the machines you have in
the cluster, or GC.

Which JVM version are you using? Are you running on EC2? If so, which
instances / OS version? How many shards do you have in the index?
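The realtime-get check described above can be sketched as follows. Here `get_doc` is a hypothetical stand-in for an actual GET on the document (stubbed with a plain dict so the sketch is self-contained); in practice it would be an HTTP GET against the index:

```python
# Sketch: after a write, compare the _version a realtime GET returns
# against the version the write reported. A missing document or a lower
# version means the write never made it into the index.
def write_was_applied(get_doc, doc_id, expected_version):
    doc = get_doc(doc_id)  # realtime GET: does not wait for a refresh
    if doc is None:
        return False
    return doc.get("_version", 0) >= expected_version

# Stubbed document store for illustration:
store = {"42": {"_id": "42", "_version": 3}}

assert write_was_applied(store.get, "42", 3)
assert not write_was_applied(store.get, "42", 4)  # newer write missing
assert not write_was_applied(store.get, "99", 1)  # doc missing entirely
```

If this check passes but a search still misses the document, the delay is on the refresh side rather than the indexing side.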

On Tuesday, February 7, 2012 at 5:04 PM, Oren Mazor wrote:

hi Shay,

we use Server Density to keep track of ES and I'm not seeing any
spikes in resource use at all. I'm suspecting we're just pushing it
harder than most people do?

I do use the bulk API to send perhaps 1000 insertions every 10 seconds
on average. In some cases these are new records, and sometimes they
are new versions of existing records. I also send some deletes every
minute, but those are not really using the bulk API.

could you recommend a more low-level primer on ES? I'd love to have a
deeper understanding of how/why things work; it'll make it easier for
me to tune my algorithms. I can go through the source, but if there are
some papers out there I could read, that'd be better :)

thanks!
Oren

PS. I noticed there is now a MongoDB river in development. I'm
wondering whether my efforts might be better spent helping it to
production status rather than trying to tune my own code.

On Feb 5, 4:31 pm, Shay Banon <kim...@gmail.com> wrote:

On Saturday, February 4, 2012 at 3:18 AM, Oren Mazor wrote:

so, I'm starting to see these again under really heavy load (let's
say around 10k insertions a minute)

What's the behavior of elasticsearch in this case? Is memory usage OK?
When you say 10k inserts per minute, is that using the bulk API? How
many clients are indexing the data?

I'm still having some difficulty wrapping my head around the algorithm
at the bottom end. The refresh total_time is 17h and merges is 14.9h;
this seems pretty ambiguous. I'm guessing it's the total time spent
executing these actions rather than the time since, right?

Yes, that's the total time that was spent doing it.

Are there some hardware settings I can change to make Lucene go faster?
Also, is there anything I can read to level up on understanding the
low-level side of things? I'm going through the ES code to start with
and learning more there.

On Jan 25, 9:37 am, Shay Banon <kim...@gmail.com> wrote:

Great, thanks for the update!

On Wednesday, January 25, 2012 at 8:54 AM, Oren Mazor wrote:

Hi Shay,

Just a follow-up (because I hate it when there is no closure).

I modified my import script to use bulk imports, so instead of 10
insertions a second, I now end up doing one bulk insertion every ten
seconds. I had it up to a minute, but I think inserting 600-800
records in one bulk request was causing some problems, so I shortened
the interval.

So far I'm not seeing any serious delays in testing this week, but
tomorrow I'll do some bigger load testing with our big index. It seems
promising at the moment!

On Jan 20, 2:26 pm, Shay Banon <kim...@gmail.com> wrote:

Hard to tell if it's GC; you can monitor it using bigdesk to see
changes and how memory is behaving. Though you say you have a 30-minute
"pause", which is strange. Did you check the refresh stats? Also, when
this happens, can you simply get the relevant new/modified document by id?

On Fri, Jan 20, 2012 at 5:58 PM, Oren Mazor <oren.ma...@gmail.com> wrote:

Yup. I've done direct queries for a document that should be there, and
even 30 minutes later, it is still not available.

Based on the semi-regular pattern of these delays, I'm wondering if
there's some kind of memory or GC issue playing up?

We have two nodes, with 16gb/32 on the first and 10/24 on the second.

On Jan 20, 10:06 am, Shay Banon <kim...@gmail.com> wrote:

It makes little sense to use query_string as a filter; I suggest you
don't do that. But even when using it as a filter, you should still see
changes. Can you verify it's not the query? I.e., just search for a
recently added document and see if you get it back?

On Fri, Jan 20, 2012 at 8:07 AM, Oren Mazor <oren.ma...@gmail.com> wrote:

also, it's probably worth sharing my frontend's query:

{
"filter" : {

...


On Feb 15, 2:44 pm, Shay Banon <kim...@gmail.com> wrote:

Which merge setting did you set? If you keep many segments, then searches will be a bit slower, and more resources will be used. Though, if you have a "quiet" time in your system, you can make up for it by issuing an explicit optimize.


On Feb 15, 10:33 pm, Oren Mazor wrote:

Actually, I left segments as is, but I updated merge_factor to 30 from
the default 10.


On Feb 16, 11:51 am, Shay Banon <kim...@gmail.com> wrote:

The tiered merge policy has no merge_factor setting (it's a relatively new merge policy, Lucene-wise, and has been the default for a few ES versions); see the index merge settings documentation for details. So it's strange that you see better behavior with a higher merge factor, since the setting does not affect anything...
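For orientation, a sketch of the two settings namespaces being confused here. The setting names are my reading of the ES/Lucene merge policy docs of that era, so verify them against your version; the point is that merge_factor belongs to the log_* policies, and under the default tiered policy it is simply ignored:

```python
# merge_factor is a knob of the log_doc / log_byte_size merge policies:
log_doc_settings = {
    "index.merge.policy.type": "log_doc",
    "index.merge.policy.merge_factor": 30,  # only meaningful here
}

# The tiered policy (the default) uses different knobs entirely;
# setting merge_factor on a tiered-policy index is a no-op:
tiered_settings = {
    "index.merge.policy.type": "tiered",
    "index.merge.policy.max_merge_at_once": 10,
    "index.merge.policy.segments_per_tier": 10.0,
}

assert "index.merge.policy.merge_factor" not in tiered_settings
```

This matches the "maybe it's a placebo" conclusion below: the observed speedup could not have come from a setting the active merge policy never reads.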


Oren Mazor wrote:

maybe it's a placebo :)

I set it in the last minute of a heavy wave of indexing, so it could
easily have been just a coincidence.

I'm guessing I have to rebuild my index if I change merge policies, eh?
Because it's sounding like the log_doc policy is better for me (all of
my docs are the same size, and I don't really care about deletes that
much… they don't happen often).

I definitely read that merge page with a bias, so I didn't realize the
setting didn't even fall under the tiered option.

On Feb 16, 11:51 am, Shay Banon kim...@gmail.com wrote:

The tiered merge policy has no setting for merge_factor (its a relative new (lucene wise) merge policy that has been the default for a few ES versions). See more here:Elasticsearch Platform — Find real-time answers at scale | Elastic. So, its strange that you see better behavior with higher merge factor, since it does not affect anything...

On Wednesday, February 15, 2012 at 10:33 PM, Oren Mazor wrote:

actually I left segments as is, but I updated merge_factor to 30 from
the default 10.

On Feb 15, 2:44 pm, Shay Banon <kim...@gmail.com (http://gmail.com)> wrote:

Which merge setting did you set? If you keep many segments, then searches will be a bit slower, and more resources will be used. Though, if you have a "quiet" time in your system, you can augment that by issuing explicit optimize.

On Wednesday, February 15, 2012 at 5:05 PM, Oren Mazor wrote:

I'll be re-setting up the machine in the nearby future, and bumping up
to the latest ES version (mine's 18.6).

I did some fiddling with the merge policy merge factor setting (up to
30 from the default) and it's provided some significant improvement in
my bulk calls. How high can I go on this before there are performance
issues, and what are they? specifically if I leave this setting at
30-40 permanently.

On Feb 12, 11:42 am, Shay Banon <kim...@gmail.com (http://gmail.com)> wrote:

The bulk request returns a response per item if it succeeded or not (and if failed, the failure itself), so you need to check the actual response body. Also, can you try and use a newer Java version, the one you use is pretty old.

On Saturday, February 11, 2012 at 11:47 PM, Oren Mazor wrote:

Yup. the bulk operations are all okay, at least as far as the http
response is concerned.

I'm almost certain that my problem is just that we're hitting some
resource limit for the size of our index (40gb), but I cant figure out
where to find the blockage. I'm watching the stats on the cluster and
seeing nothing other than flat/healthy usage.

I am seeing a higher than normal read/write activity over the past 24
hours (huge number of documents added)

On Feb 9, 9:15 pm, <medcl2...@gmail.com (http://gmail.com)> wrote:

hi,OrenMazor
did you checked the response of the bulk operation,are they all successful
indexed?
and also check your translog status.
also manually refresh the index and to see if can get the current version

-----Original Message-----
From:OrenMazor
Sent: Friday, February 10, 2012 2:25 AM
To: elasticsearch
Subject: Re: ES Index performance

Issuing a Get does not return the correct version.

we're using OpenJDK java version 1.6.0_18, not on EC2, with debian. we
have 1 replica and 10 shards.

I'm suspecting an IO issue, to be honest, given that if there were
indexing performance issues somebody may have seen them already, what
with "select isn't broken" :slight_smile:

that said, we do have a pretty high rate of indexes, so maybe at scale
some issue pops up ?

On Feb 7, 2:00 pm, Shay Banon <kim...@gmail.com (http://gmail.com)> wrote:

Going back to your question, do you see that issuing a Get (which is
realtime) does not return the correct version of the data? I would be
helpful to understand where the stalling is coming from. If a "get" does
not return your expect version of the data, it means that it didn't get
indexed, so you will need to look at the indexer code and see if maybe
something is stalling on the bulk API execution.

The stalling can be for many reasons, starting with slow IO, not enough
resources on the machine CPU/Mem, overloading the machines you have in the
cluster, GC .

Which JVM version are you using? Are you running on EC2? If so, which
instances / os version? How many shards do you have in the index?

On Tuesday, February 7, 2012 at 5:04 PM,OrenMazor wrote:

hi Shay,

we use Server Density to keep track of ES and I'm not seeing any
spikes in resource use at all. I suspect we're just pushing it
harder than most people do?

I do use the bulk API to send perhaps 1000 insertions every 10 seconds
on average. In some cases these are new records, and sometimes they
are new versions of existing records. I also send some deletes
every minute, but these don't really use the bulk API.
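For reference, a bulk request body like the one described above is newline-delimited JSON: an action line followed by a source line for each document. A small sketch of building such a body; the index name, type, and documents are placeholders:

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Serialize (id, source) pairs into a bulk request body."""
    lines = []
    for doc_id, source in docs:
        # Action line: which index/type/id this document goes to.
        lines.append(json.dumps({"index": {"_index": index,
                                           "_type": doc_type,
                                           "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

body = build_bulk_body("docs", "record", [("1", {"title": "a"}),
                                          ("2", {"title": "b"})])
print(body.count("\n"))  # -> 4 (two action lines + two source lines)
```

Re-indexing an existing id this way simply overwrites it with a new version, which matches the "new versions of existing records" case; deletes could also ride along in the same body as `{"delete": {...}}` action lines.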

could you recommend a more low-level primer on ES? I'd love to have a
more low-level understanding of how/why things work; it'll make it
easier for me to tune my algorithms. I can go through the source, but
if there are some papers out there I could read, that'd be better :)

thanks!
Oren

PS. I noticed there is now a MongoDB river in development. I'm
wondering whether my efforts might be better spent helping it to
production status rather than trying to tune my own code…

On Feb 5, 4:31 pm, Shay Banon <kim...@gmail.com> wrote:

On Saturday, February 4, 2012 at 3:18 AM, Oren Mazor wrote:

so, I'm starting to see these again under really heavy load (let's
say around 10k insertions a minute)

What's the behavior of elasticsearch in this case? Memory usage OK?
When you say 10k inserts per minute, is that using the bulk API? How
many clients are indexing the data?

I'm still having some difficulty wrapping my head around the algorithm
at the bottom end. The refresh total_time is 17h and merges is 14.9h.
This seems pretty ambiguous. I'm guessing it's the total time spent
executing these actions rather than the time since, right?

Yes, that's the total time that was spent doing it.

are there some hardware settings I can change to make Lucene go
faster? Also, is there anything I can read to level up on understanding
the low-level side of things? I'm going through the ES code to start
with and learning more there.

On Jan 25, 9:37 am, Shay Banon <kim...@gmail.com> wrote:

Great, thanks for the update!

On Wednesday, January 25, 2012 at 8:54 AM, Oren Mazor wrote:

Hi Shay,

just a follow up (because I hate it when there is no closure).

I modified my import script to use bulk imports, so instead of 10
insertions a second, I now end up doing one bulk insertion every ten
seconds. I had it up to a minute, but I think inserting 600-800
records in one bulk request was causing some problems, so I shortened
the interval.
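The batching strategy described above can be sketched as a small buffer that flushes either when enough documents have accumulated or when a time interval has passed. This is a generic sketch, not the poster's actual script; the thresholds and the `flush_fn` callback are illustrative:

```python
import time

class BulkBuffer:
    """Accumulate documents and flush them in batches."""

    def __init__(self, flush_fn, max_docs=1000, max_seconds=10.0):
        self.flush_fn = flush_fn      # called with each batch, e.g. a bulk API call
        self.max_docs = max_docs
        self.max_seconds = max_seconds
        self.docs = []
        self.last_flush = time.time()

    def add(self, doc):
        self.docs.append(doc)
        # Flush on either threshold: batch size or elapsed time.
        if (len(self.docs) >= self.max_docs or
                time.time() - self.last_flush >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.docs:
            self.flush_fn(self.docs)
            self.docs = []
        self.last_flush = time.time()

batches = []
buf = BulkBuffer(batches.append, max_docs=3, max_seconds=60.0)
for i in range(7):
    buf.add(i)
buf.flush()  # flush the remainder on shutdown
print([len(b) for b in batches])  # -> [3, 3, 1]
```

Capping the batch size, as the poster found with 600-800 records, keeps any single bulk request from getting too large while still amortizing the per-request overhead.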

...


The tiered option is a good one, even for indices without deletes. You can change the settings of a specific merge policy in real time using the update settings API. You can change the merge policy type on an existing index, but you need to close it, then do the update settings on it with the new merge policy type, and then open it again.
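The sequence described above can be sketched with the request bodies involved. The index name and host are placeholders, and the exact setting keys are assumptions based on the merge policy settings of that ES era:

```python
import json

# Hypothetical index name for illustration.
index = "myindex"

# 1. Tune a setting of the current merge policy at runtime:
#      PUT /myindex/_settings
runtime_update = {"index.merge.policy.merge_factor": 30}

# 2. Changing the merge policy *type* requires close -> update -> open:
#      POST /myindex/_close
#      PUT  /myindex/_settings   (body below)
#      POST /myindex/_open
type_update = {"index.merge.policy.type": "log_doc"}

print(json.dumps(runtime_update))
print(json.dumps(type_update))
```

Note that step 1 only applies to settings the active merge policy actually has; as Shay points out below, merge_factor belongs to the log-based policies, not to tiered.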

On Thursday, February 16, 2012 at 10:29 PM, Oren Mazor wrote:

maybe it's a placebo :)

I set it in the last minute of a heavy wave of indexing, so it
could easily have been just coincidence.

I'm guessing I have to rebuild my index if I change merge policies,
eh? Because it's sounding like the log_doc policy is better for me (all
of my docs are the same size, and I don't really care about deletes
that much… they don't happen often)

I definitely read that merge page with a bias, so I didn't realize the
setting didn't even fall under the tiered option.

On Feb 16, 11:51 am, Shay Banon <kim...@gmail.com> wrote:

The tiered merge policy has no merge_factor setting (it's a relatively new, Lucene-wise, merge policy that has been the default for a few ES versions); see the merge policy documentation for more. So it's strange that you see better behavior with a higher merge factor, since it does not affect anything...

On Wednesday, February 15, 2012 at 10:33 PM, Oren Mazor wrote:

Actually, I left segments as is, but I updated merge_factor to 30 from
the default 10.

On Feb 15, 2:44 pm, Shay Banon <kim...@gmail.com> wrote:

Which merge setting did you set? If you keep many segments, then searches will be a bit slower and more resources will be used. Though, if you have a "quiet" time in your system, you can take advantage of it by issuing an explicit optimize.
