Sharding a big index by name

David_Stendardi · October 26, 2011, 12:53pm

Hello!

I'm using Elasticsearch intensively to index statistics. Since most of
the requests will concern the last 2 months of records, and the volume will
be quite large (400,000 records/day), I'm considering adding a year
suffix to the index name and using the template API.

Example: curl -XPOST 'localhost:9200/statistic-2011/foo/' -d ...

Does that make sense? I'm wondering whether this will optimize anything, or
whether Elasticsearch already does this kind of optimization internally.

Cheers,

David Stendardi

David_Stendardi · October 26, 2011, 12:55pm

Addendum:
It will probably be a year-month suffix rather than just the year (e.g.
statistic-2011-10).
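A minimal sketch of the month-suffixed naming, assuming dates arrive as YYYY-MM-DD strings. monthly_index is a hypothetical helper, not part of Elasticsearch:

```shell
#!/bin/sh
# Hypothetical helper: build a month-suffixed index name from a prefix
# and a day in YYYY-MM-DD form (the date format is an assumption).
monthly_index() {
  prefix="$1"   # e.g. statistic
  day="$2"      # e.g. 2011-10-26
  # The first 7 characters of YYYY-MM-DD are the YYYY-MM part.
  month=$(printf '%s' "$day" | cut -c1-7)
  printf '%s-%s\n' "$prefix" "$month"
}
```

Indexing into that month's index could then look like curl -XPOST "localhost:9200/$(monthly_index statistic 2011-10-26)/foo/" -d ..., which targets statistic-2011-10. On recent Elasticsearch versions custom types such as foo are gone and documents go to _doc instead.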

phobos182 · October 26, 2011, 1:10pm

That's what we do at my company. We chose an index creation strategy that partitions the data by time series. Then, when we query it, the search runs over a much smaller set of data than it would if everything lived in one large index.

We chose a weekly index strategy (2011-42, 2011-43, 2011-44, ...) where the index name is the year plus the week number.
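The year-plus-week name can be derived from a date. A minimal sketch, assuming ISO 8601 week numbering and GNU date (its -d option and the %G/%V formats); the post doesn't say which week numbering is used, and weekly_index is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print the weekly index name (ISO year-week) for a
# day given as YYYY-MM-DD. Assumes GNU date; computed in UTC so the local
# time zone cannot shift the result.
weekly_index() {
  # %G is the ISO week-numbering year and %V the zero-padded ISO week,
  # so early-January days can belong to the previous year's last week.
  TZ=UTC date -d "$1" +%G-%V
}
```

With this, weekly_index 2011-10-26 prints 2011-43 and weekly_index 2011-01-01 prints 2010-52. Pairing %V with %Y instead of %G would label that January day 2011-52, mixing two different years' weeks.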

Each index has 8 shards, a number chosen based on the size of the cluster. So a query over two weeks of data hits 16 shards, a query over three weeks hits 24 shards, and so on. Since Elasticsearch handles the parallel dispatch of requests, a high shard count isn't really an issue as long as you have the machines to handle it.
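The fan-out arithmetic is simply the number of indices times the shards per index. A search reaches one copy (primary or replica) of each shard, so replicas add capacity without raising this per-query count. A small sketch with a hypothetical shards_hit helper:

```shell
#!/bin/sh
# Hypothetical helper: how many shards a search over a comma-separated
# list of indices touches, given a fixed number of shards per index.
shards_hit() {
  indices="$1"           # e.g. 2011-42,2011-43
  shards_per_index="$2"  # 8 in the setup described above
  # One line per index name, then count the non-empty lines.
  count=$(printf '%s\n' "$indices" | tr ',' '\n' | grep -c .)
  echo $((count * shards_per_index))
}
```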

When querying the data in Elasticsearch, you can choose which indices to execute the query on by passing them as a comma-separated list.

Ex:

curl -XGET 'http://localhost:9200/2011-42,2011-43/_search?q=user:kimchy'
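Rather than typing the list by hand, it can be generated from a date. A minimal sketch, assuming GNU date, ISO week numbering and indices named YYYY-WW as above; recent_weeks is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated list of the weekly indices
# for the $2 weeks ending with the week that contains day $1 (YYYY-MM-DD),
# oldest first. Assumes GNU date and ISO year-week index names.
recent_weeks() {
  end_day="$1"
  weeks="$2"
  list=""
  i=$((weeks - 1))
  while [ "$i" -ge 0 ]; do
    # Step back whole weeks from the end day and take its ISO year-week.
    name=$(TZ=UTC date -d "$end_day $((i * 7)) days ago" +%G-%V)
    # Append, adding a comma only once the list is non-empty.
    list="${list:+$list,}$name"
    i=$((i - 1))
  done
  printf '%s\n' "$list"
}
```

For example, curl -XGET "http://localhost:9200/$(recent_weeks 2011-10-26 2)/_search?q=user:kimchy" expands to the same two-week request shown above.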
