Performance boost with multiple types

Hello everyone,

I'm using Elasticsearch to store server statistics, with millions of
documents.

The statistics of all servers are stored in a single index/type:

curl -XPUT 'http://localhost:9200/statistics/stats/'...

Currently 50 million documents (collected within 3 months) are stored, and I
want to keep the data for at least 10 years. In addition, the number of
servers is growing as well.
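
As a rough back-of-the-envelope estimate of the scale involved, the figures above can be extrapolated. This is only a sketch assuming a constant ingest rate, so it understates the real total if the number of servers keeps growing; the function name is purely illustrative.

```python
def projected_docs(docs_so_far: int, months_so_far: int, target_years: int) -> int:
    """Linearly extrapolate the document count to target_years of retention.

    Assumes the ingest rate observed so far stays constant.
    """
    # Integer arithmetic keeps the result exact for whole-number inputs.
    return docs_so_far * 12 * target_years // months_so_far


# 50 million documents in 3 months, kept for 10 years
print(projected_docs(50_000_000, 3, 10))  # 2000000000
```

So even without any growth, the retention target implies roughly 2 billion documents.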

Now my question:

Do you think splitting the type "stats" into multiple types would improve
Elasticsearch search times? For example, if a type were created for each
server:

curl -XPUT 'http://localhost:9200/statistics/server1/'...
curl -XPUT 'http://localhost:9200/statistics/server2/'...
curl -XPUT 'http://localhost:9200/statistics/server3/'...
curl -XPUT 'http://localhost:9200/statistics/server10001/'...
curl -XPUT 'http://localhost:9200/statistics/server10002/'...
curl -XPUT 'http://localhost:9200/statistics/server10003/'...
...

Or does it make no difference whether the data is stored in a single type or
in multiple types?

I would be glad to hear from you :)

Cheers,
Jonny

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

J. Schulz wrote:

Do you think splitting the type "stats" into multiple types would improve
Elasticsearch search times? For example, if a type were created for each
server:

curl -XPUT 'http://localhost:9200/statistics/server1/'...
curl -XPUT 'http://localhost:9200/statistics/server2/'...
curl -XPUT 'http://localhost:9200/statistics/server3/'...
curl -XPUT 'http://localhost:9200/statistics/server10001/'...
curl -XPUT 'http://localhost:9200/statistics/server10002/'...
curl -XPUT 'http://localhost:9200/statistics/server10003/'...
...

Or does it make no difference whether the data is stored in a single type or
in multiple types?

Types are really just for mapping separation. You won't really
see much difference in raw search perf, but if you create a lot
of types it can make the cluster state large, which amplifies the
communication in the cluster among the nodes.

In this case, if the docs will have the same general structure,
use a consistent type and include a field in each doc to store the
server name. It's likely a better design anyway.
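
A minimal sketch of that layout, building only the JSON bodies (no client library or running cluster is assumed). The field names `server`, `timestamp`, and `cpu_load` are hypothetical, and the `term` query assumes the `server` field is indexed as an exact, non-analyzed value.

```python
import json


def make_stat_doc(server: str, timestamp: str, cpu_load: float) -> dict:
    """Build one statistics document; the server name is a regular field."""
    return {"server": server, "timestamp": timestamp, "cpu_load": cpu_load}


def server_query(server: str) -> dict:
    """Build a search body that restricts results to a single server."""
    return {"query": {"term": {"server": server}}}


# Every server shares the same type; only the "server" field differs.
doc = make_stat_doc("server1", "2013-08-01T12:00:00", 0.42)
print(json.dumps(doc))
print(json.dumps(server_query("server1")))
```

The document body would be indexed into the single "stats" type, and the query body sent to that index's `_search` endpoint, so per-server lookups become a filter on one field rather than a choice of type.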

Drew


Hi Drew,

Thank you for your response.

Another idea is to split the data into different indices by year and month:

curl -XPUT 'http://localhost:9200/statistics2013.05/stats/'...
curl -XPUT 'http://localhost:9200/statistics2013.06/stats/'...
curl -XPUT 'http://localhost:9200/statistics2013.07/stats/'...
curl -XPUT 'http://localhost:9200/statistics2013.08/stats/'...
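
The naming scheme above can be sketched as a small helper that derives the index name from a date. The helper names and the `statistics` prefix default are illustrative assumptions; the comma-separated list relies on Elasticsearch accepting several index names in one search URL.

```python
from datetime import date


def index_for(day: date, prefix: str = "statistics") -> str:
    """Return the monthly index name for a given day, e.g. statistics2013.08."""
    return f"{prefix}{day.year:04d}.{day.month:02d}"


def indices_between(start: date, end: date, prefix: str = "statistics") -> list:
    """List the monthly index names covering start..end (inclusive)."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}.{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return names


# Index that new documents for this day would be written to
print(index_for(date(2013, 8, 15)))
# Indices to search for a May-to-August query, joined for use in a search URL
print(",".join(indices_between(date(2013, 5, 1), date(2013, 8, 31))))
```

Writes would always go to the index for the current month, while a search only needs to name the months it actually covers.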

The only disadvantage is that the index buffer size is divided between the
shards. Older indices and their shards are nearly static, because server
statistics are only written to the latest index.

How could I handle this? Is there a way to set the maximum memory size for a
specific index? For example, statistics2013.08 is the latest index, while the
indices statistics2013.05 to statistics2013.07 are only used for searching.

Cheers
Jonny
