ES indexing speed decreases substantially during refreshes

Hi,

While indices are being refreshed (checked via hot_threads),
indexing speed decreases substantially. Is there a way I can schedule
refreshes?
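
For reference, there's no built-in refresh scheduler, but the dynamic
refresh_interval setting controls how often each index refreshes, and
setting it to -1 disables automatic refreshes so you can trigger them
yourself (e.g. from cron). A minimal sketch, assuming a 0.90-era
settings API and the daily index named in the log message below:

  # refresh less often (the default is 1s), trading freshness for
  # indexing throughput
  curl -XPUT 'localhost:9200/logstash-2013.03.19/_settings' -d '{
    "index": { "refresh_interval": "30s" }
  }'

  # or disable automatic refreshes and fire them manually when convenient
  curl -XPUT 'localhost:9200/logstash-2013.03.19/_settings' -d '{
    "index": { "refresh_interval": "-1" }
  }'
  curl -XPOST 'localhost:9200/logstash-2013.03.19/_refresh'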

Also, my ES logs are occasionally filled with the following (it keeps
getting logged every few seconds), and indexing speed decreases
substantially:

[2013-03-19 09:03:28,186][INFO ][cluster.metadata ] [index11]
[logstash-2013.03.19] update_mapping [syslog] (dynamic)

What does the above message actually mean? Is this because of too many
fields coming into ES? I enabled a kv filter in logstash that created
too many fields; I think it's related.

--
Regards,
Abhijeet Rastogi (shadyabhi)

On Tue, 2013-03-19 at 14:38 +0530, Abhijeet Rastogi wrote:

[2013-03-19 09:03:28,186][INFO ][cluster.metadata ] [index11]
[logstash-2013.03.19] update_mapping [syslog] (dynamic)

This means that you have indexed a field that hasn't been seen
previously, which updates the cluster state and can cause a pause. If
you have a limited number of fields, then eventually you will have seen
them all and it won't happen any more. But if you keep adding new
fields, this will continue to happen.

I think this is more likely to be causing the pauses than refreshes are.
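
If your field set is bounded, one way to stop the dynamic updates is to
turn dynamic mapping off for the type, so unseen keys are ignored
instead of being added to the mapping. A minimal sketch, assuming the
0.90-era put-mapping API and the index/type names from the log line
above (for daily logstash indices you'd want this in an index template
so each new day's index inherits it):

  curl -XPUT 'localhost:9200/logstash-2013.03.19/syslog/_mapping' -d '{
    "syslog": { "dynamic": false }
  }'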

clint

What I'm essentially doing is splitting all text of the form
key=value and creating a field named key with value as its value.
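
To illustrate (key names invented for the example): a log line like
user=bob status=200 path=/x becomes a document with one field per key,
and every previously unseen key triggers one of those update_mapping
events:

  curl -XPOST 'localhost:9200/logstash-2013.03.19/syslog' -d '{
    "user": "bob",
    "status": "200",
    "path": "/x"
  }'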

Documents are coming in at 8000 per sec [a new index is created every
day and grows to around 500GB, with the standard analyzer on all
fields], so I might be getting many, many new fields. So I guess I
should not do this until I have enough resources (right now, only 2
nodes, each a 48GB RAM, 8-core 2.0GHz box).

Can you give me any idea of the number of instances and the
configuration I would need to index around 20k docs per sec with this
kind of setup? I'm essentially logging logs from various machines in
Elasticsearch.

--
Regards,
Abhijeet Rastogi (shadyabhi)

On Wed, 2013-03-20 at 17:19 +0530, Abhijeet Rastogi wrote:

The answer to these questions is always the same, I'm afraid: try it
and see :)

There are just too many moving parts to give a definitive answer.

A typical test is:

  1. create an index with a single shard
  2. pump data into it while querying it
  3. figure out where index/query performance drops off

That pretty much gives you the limit for a single shard, which you can
then extrapolate to match your future requirements.
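
For step 1, something like this would do; a minimal sketch, assuming a
local node and a throwaway index name:

  # one primary shard and no replicas, so you measure the raw limit of
  # a single shard
  curl -XPUT 'localhost:9200/benchmark' -d '{
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }'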

clint

Are you essentially saying that I find the limit of a single box by
creating one node with a single shard, get that magical value, divide
my required throughput by it, and that gives me the number of nodes?

--
Regards,
Abhijeet Rastogi (shadyabhi)

On Wed, 2013-03-20 at 21:42 +0530, Abhijeet Rastogi wrote:

Yes, more or less. Add a bit of overcapacity, just in case.
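
As a hypothetical worked example: if the single-shard test were to top
out at, say, 5,000 docs per sec (a made-up number), then sustaining
20,000 docs per sec needs at least four primary shards' worth of nodes,
plus headroom for replicas, merges, and query load.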

clint

Is it possible in ES to reduce the impact of a growing number of indices?

I noticed that as soon as the indices reach the order of TBs, the JVM
heap starts becoming an issue. Shouldn't there be a feature like
"don't keep certain indices in cache, no matter what"?

Or is running a cron job that clears the cache for certain indices the only way?

--
Regards,
Abhijeet Rastogi (shadyabhi)
http://blog.abhijeetr.com

On Thu, 2013-03-21 at 16:00 +0530, Abhijeet Rastogi wrote:

Close unused indices?
Add more RAM?
Add more nodes?
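
Closing an index frees its heap and file handles while leaving the data
on disk; it just has to be reopened before it can be searched again. A
minimal sketch, assuming a 0.90-era API and a made-up old daily index:

  curl -XPOST 'localhost:9200/logstash-2013.02.01/_close'

  # reopen later if someone needs to search it
  curl -XPOST 'localhost:9200/logstash-2013.02.01/_open'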

Closing unused indices is a good thing I learned, thanks. But how about
clearing caches for old indices via a cron job? I still want the data
to be searchable if someone searches it.
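
Such a cron job could call the clear-cache API per index; a minimal
sketch, again with a made-up old index name:

  curl -XPOST 'localhost:9200/logstash-2013.03.01/_cache/clear'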

--
Regards,
Abhijeet Rastogi (shadyabhi)
http://blog.abhijeetr.com

On Thu, 2013-03-21 at 18:33 +0530, Abhijeet Rastogi wrote:

You can clear caches if you want to, but are you sure you're not
optimizing prematurely? Elasticsearch should take care of this stuff
itself.

clint

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.