Records per shard


What's the main thing to consider (or the best practice) when deciding on the number of shards in your index relative to the number of documents you will have?

For example in our dev system we've had a problem highlighted here:

Is there a formula or "best practice guide" to consider when deciding this?

One shard per node is nice, as the data is then spread evenly across the nodes. However, once you take things like relevance scoring into account, this obviously changes.

How big is your dataset?

At the moment we're developing with only 6-8 records, which I accept is hardly anything, but we needed to build the index up incrementally and understand the scoring system as we add/remove/edit records.

Going forward we expect it to be approx 147,000 documents.

And how big are they?

Sorry for the slow reply and excuse my ignorance but how could I find that?

EDIT: In Sense I ran:

GET /IndexName/_stats

And got:

"docs": {
  "count": 145261,
  "deleted": 0
},
"store": {
  "size_in_bytes": 23120808,
  "throttle_time_in_millis": 95
}
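
For reference, those numbers work out to a very small index. A quick back-of-the-envelope calculation using the counts from the _stats output above:

```python
# Rough sizing from the _stats output above.
doc_count = 145261        # "docs.count"
size_in_bytes = 23120808  # "store.size_in_bytes"

size_in_mb = size_in_bytes / (1024 * 1024)
avg_doc_bytes = size_in_bytes / doc_count

print(f"index size: {size_in_mb:.1f} MB")        # ~22 MB on disk
print(f"avg doc:    {avg_doc_bytes:.0f} bytes")  # ~159 bytes per document
```

So even at the expected ~147,000 documents, the whole index is on the order of tens of megabytes, which is far below the point where splitting into multiple shards buys you anything.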

You can also use the _cat API.
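
For example, something along these lines in Sense (substitute your own index name):

```
GET /_cat/indices/IndexName?v
GET /_cat/shards/IndexName?v
```

The `?v` parameter adds column headers, which makes the output easier to read.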

Given that size you should aim for a single shard. Otherwise check out

Thanks for your reply Mark, much appreciated. Would I be on the right track in thinking there might be a performance impact with dfs_query_then_fetch? I guess it's a trade-off we'd have to consider, but for now I think it's easier for us to stick with a single shard.
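
For anyone comparing the two, the search type is set per request via a query-string parameter, e.g. (a sketch; the index name and the `title` field are placeholders):

```
GET /IndexName/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": { "title": "example" }
  }
}
```

With a single shard the default query_then_fetch gives the same scores anyway, since all term statistics live on one shard.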