Optimal number of shards in Elasticsearch


(krishnan) #1

Hi All,

I am new to Elasticsearch and am trying to build a fast search infrastructure
for our portal with it.

We store users' financial activities in a SQL table with around 50 fields.
Once a record is created, only one of its fields is updated, about 10 times
within 30 days of creation.

We expect around 3k inserts, 1k updates and 1k search requests per second
against the new search functionality.

Each record is about 1 KB, and at any time we plan to keep the last 1.5
years of records on our search servers, i.e. ~30 billion records. Our search
queries will always include a customerID, a date range, and some filters on
other fields. We expect a net increase of 15 million records per day.

We plan to use Elasticsearch for our fast search infrastructure and came
up with the following mapping / index design: one global index, with an
alias for each customer that carries a routing key and a filter on the
customer's ID, as follows:

curl -XPUT localhost:9200/customers -d '{
  "settings": {
    "index": {
      "number_of_shards": <NUM_OF_SHARDS>,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "Transactions": {...}
  }
}'

curl -XPOST localhost:9200/_aliases -d '{
  "actions": [{
    "add": {
      "index": "customers",
      "alias": "customer",
      "filter": {"term": {"customerID": 1}},
      "routing": "1"
    }
  }]
}'
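
Since there would be one alias per customer, the request body could be generated rather than written by hand. The following is only a rough sketch of that idea: the helper names (`alias_action`, `aliases_body`) and the `customer_<id>` alias naming scheme are assumptions made for illustration, and the script only builds and prints the JSON, it does not send anything to a cluster.

```python
import json


def alias_action(index, customer_id):
    """Build one "add" action for the _aliases endpoint (hypothetical helper)."""
    return {
        "add": {
            "index": index,
            # Assumed naming scheme: one alias per customer, e.g. "customer_1".
            "alias": f"customer_{customer_id}",
            # Restrict every search through this alias to the customer's documents.
            "filter": {"term": {"customerID": customer_id}},
            # Route reads and writes for this customer to a single shard.
            "routing": str(customer_id),
        }
    }


def aliases_body(index, customer_ids):
    """Combine the per-customer actions into one request body."""
    return {"actions": [alias_action(index, cid) for cid in customer_ids]}


body = aliases_body("customers", [1, 2, 3])
print(json.dumps(body, indent=2))
```

The printed JSON could then be used as the `-d` payload of the `_aliases` request above.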

I would appreciate it if someone could review the above mapping and suggest
the optimal number of shards for our global index.

Thank You,
Regards,
Krishnan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/751edaab-ab1d-46b0-8ee7-7be5304032eb%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

There is no optimal number of shards. The number of shards you select for
an index is orthogonal to the mapping, doc size, number of records, and
number of users; there is no relationship.

You have several options:

  • if you know the fixed maximum number of nodes you will ever run in your ES
    cluster, consider using that number as the number of shards of the
    index, so you never have to worry about scaling the index
    beyond it

  • if you want fast recovery and fast shard movement between nodes,
    consider never letting a shard grow larger than ~1-10 GB. Dividing your
    total index size by that limit gives you an estimate of the total
    number of shards
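
To make the second option concrete, here is a back-of-the-envelope sketch using the figures from the question (~30 billion records of about 1 KB each). It assumes the on-disk index size equals the raw record size, which in practice depends on the mapping and on compression, and it counts primary shards only, not replicas. The function name `estimate_primary_shards` is made up for this example.

```python
def estimate_primary_shards(num_records, record_bytes, max_shard_bytes):
    """Number of primary shards needed so that no shard exceeds max_shard_bytes.

    Assumes index size on disk == num_records * record_bytes (a simplification).
    """
    total_bytes = num_records * record_bytes
    # Integer ceiling division, avoids floating-point rounding on large numbers.
    return -(-total_bytes // max_shard_bytes)


GB = 10**9
records = 30 * 10**9   # ~30 billion records, from the question
record_size = 1000     # ~1 KB per record, from the question

for limit_gb in (1, 10):
    shards = estimate_primary_shards(records, record_size, limit_gb * GB)
    print(f"max shard size {limit_gb} GB -> about {shards} primary shards")
# prints:
# max shard size 1 GB -> about 30000 primary shards
# max shard size 10 GB -> about 3000 primary shards
```

Under these assumptions the ~30 TB of data lands somewhere between roughly 3,000 and 30,000 primary shards, depending on where in the 1-10 GB range the limit is set.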

Jörg


