ES indexing speed decreases substantially during refreshes

Hi,

While indices are being refreshed (checked via hot_threads),
indexing speed decreases substantially. Is there a way I can schedule
refreshes?
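
For reference, there's no built-in refresh scheduler, but the dynamic
refresh_interval setting controls how often each index refreshes, and
setting it to -1 disables automatic refreshes so you can trigger them
yourself (e.g. from cron). A minimal sketch, assuming a 0.90-era
settings API and the daily index named in the log message below:

  # refresh less often (the default is 1s), trading freshness for
  # indexing throughput
  curl -XPUT 'localhost:9200/logstash-2013.03.19/_settings' -d '{
    "index": { "refresh_interval": "30s" }
  }'

  # or disable automatic refreshes and fire them manually when convenient
  curl -XPUT 'localhost:9200/logstash-2013.03.19/_settings' -d '{
    "index": { "refresh_interval": "-1" }
  }'
  curl -XPOST 'localhost:9200/logstash-2013.03.19/_refresh'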

Also, my ES logs are occasionally filled with the following (it keeps
getting logged every few seconds), and indexing speed decreases
substantially:

[2013-03-19 09:03:28,186][INFO ][cluster.metadata ] [index11]
[logstash-2013.03.19] update_mapping [syslog] (dynamic)

What does the above message actually mean? Is this because of too many
fields coming into ES? I enabled a kv filter in logstash that created
too many fields; I think it's related.

--
Regards,
Abhijeet Rastogi (shadyabhi)

On Tue, 2013-03-19 at 14:38 +0530, Abhijeet Rastogi wrote:

[2013-03-19 09:03:28,186][INFO ][cluster.metadata ] [index11]
[logstash-2013.03.19] update_mapping [syslog] (dynamic)

This means that you have indexed a field that hasn't been seen
previously, which updates the cluster state and can cause a pause. If
you have a limited number of fields, then eventually you will have seen
them all and it won't happen any more. But if you keep adding new
fields, this will continue to happen.

I think this is more likely to be causing the pauses than refreshes are.
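
If your field set is bounded, one way to stop the dynamic updates is to
turn dynamic mapping off for the type, so unseen keys are ignored
instead of being added to the mapping. A minimal sketch, assuming the
0.90-era put-mapping API and the index/type names from the log line
above (for daily logstash indices you'd want this in an index template
so each new day's index inherits it):

  curl -XPUT 'localhost:9200/logstash-2013.03.19/syslog/_mapping' -d '{
    "syslog": { "dynamic": false }
  }'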

clint

What I'm essentially doing is splitting all text of the form
key=value and creating a field named key with value as its value.
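
To illustrate (key names invented for the example): a log line like
user=bob status=200 path=/x becomes a document with one field per key,
and every previously unseen key triggers one of those update_mapping
events:

  curl -XPOST 'localhost:9200/logstash-2013.03.19/syslog' -d '{
    "user": "bob",
    "status": "200",
    "path": "/x"
  }'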

Documents are coming in at 8000 per sec [a new index is created every
day and grows to around 500GB, with the standard analyzer on all
fields], so I might be getting many, many new fields. So I guess I
should not do this until I have enough resources (right now, only 2
nodes, each a 48GB RAM, 8-core 2.0GHz box).

Can you give me any idea of the number of instances and the
configuration I would need to index around 20k docs per sec with this
kind of setup? I'm essentially logging logs from various machines in
Elasticsearch.

--
Regards,
Abhijeet Rastogi (shadyabhi)

On Wed, 2013-03-20 at 17:19 +0530, Abhijeet Rastogi wrote:

The answer to these questions is always the same, I'm afraid: try it
and see :)

There are just too many moving parts to give a definitive answer.

A typical test is:

  1. create an index with a single shard
  2. pump data into it while querying it
  3. figure out where index/query performance drops off

That pretty much gives you the limit for a single shard, which you can
then extrapolate to match your future requirements.
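
For step 1, something like this would do; a minimal sketch, assuming a
local node and a throwaway index name:

  # one primary shard and no replicas, so you measure the raw limit of
  # a single shard
  curl -XPUT 'localhost:9200/benchmark' -d '{
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }'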

clint

Are you essentially saying that I find the limit of a single box by
creating one node with a single shard, get that magical value, divide
my required throughput by it, and that gives me the number of nodes?

--
Regards,
Abhijeet Rastogi (shadyabhi)

On Wed, 2013-03-20 at 21:42 +0530, Abhijeet Rastogi wrote:

Yes, more or less. Add a bit of overcapacity, just in case.
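
As a hypothetical worked example: if the single-shard test were to top
out at, say, 5,000 docs per sec (a made-up number), then sustaining
20,000 docs per sec needs at least four primary shards' worth of nodes,
plus headroom for replicas, merges, and query load.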

clint

Is it possible in ES to reduce the impact of a growing number of indices?

I noticed that as soon as the indices reach the order of TBs, the JVM
heap starts becoming an issue. Shouldn't there be a feature like
"don't keep certain indices in cache, no matter what"?

Or is running a cron job that clears the cache for certain indices the only way?

--
Regards,
Abhijeet Rastogi (shadyabhi)
http://blog.abhijeetr.com

On Thu, 2013-03-21 at 16:00 +0530, Abhijeet Rastogi wrote:

Close unused indices?
Add more RAM?
Add more nodes?
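
Closing an index frees its heap and file handles while leaving the data
on disk; it just has to be reopened before it can be searched again. A
minimal sketch, assuming a 0.90-era API and a made-up old daily index:

  curl -XPOST 'localhost:9200/logstash-2013.02.01/_close'

  # reopen later if someone needs to search it
  curl -XPOST 'localhost:9200/logstash-2013.02.01/_open'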

Closing unused indices is a good thing I learned, thanks. But how about
clearing caches for old indices via a cron job? I still want the data
to be searchable if someone searches it.
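
Such a cron job could call the clear-cache API per index; a minimal
sketch, again with a made-up old index name:

  curl -XPOST 'localhost:9200/logstash-2013.03.01/_cache/clear'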

--
Regards,
Abhijeet Rastogi (shadyabhi)
http://blog.abhijeetr.com

On Thu, 2013-03-21 at 18:33 +0530, Abhijeet Rastogi wrote:

You can clear caches if you want to, but are you sure you're not
optimizing prematurely? Elasticsearch should take care of this stuff
itself.

clint

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.