Records per shard


#1

Hi

What is the main thing to consider when deciding on the number of shards for an index, relative to the number of documents it will hold?

For example, in our dev system we've hit the problem described here:

https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html

Is there a formula or "best practice guide" to consider when deciding this?


(Mark Walkom) #2

One shard per node is nice, as the data is then spread evenly across the nodes. However, this changes once you take things like relevance into account.

How big is your dataset?
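
To illustrate why shard count matters for relevance on a tiny dataset, here is a minimal sketch. It assumes Lucene's classic TF/IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)); the similarity actually in use depends on the Elasticsearch version. The shard layout and counts are made up for illustration. By default each shard scores with its own local document statistics, so the same term can get a different weight depending on which shard a document landed on.

```python
import math


def classic_idf(doc_count, doc_freq):
    """Inverse document frequency as in Lucene's classic TF/IDF similarity."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical layout: 8 documents spread over 3 shards.
# Each value is (documents on the shard, documents containing the term).
shards = {"shard_0": (4, 3), "shard_1": (3, 1), "shard_2": (1, 0)}

# IDF as each shard would compute it from its local statistics only.
per_shard = {name: classic_idf(n, df) for name, (n, df) in shards.items()}

# IDF computed from statistics summed over all shards.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = classic_idf(total_docs, total_df)

for name, idf in per_shard.items():
    print(f"{name}: idf={idf:.3f}")
print(f"global: idf={global_idf:.3f}")
```

With only a handful of documents the per-shard values diverge noticeably; with a large, evenly distributed index the local statistics tend to converge and the effect fades.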


#3

At the moment we are developing against only 6-8 records, which I accept is hardly anything, but we need to build up incrementally and understand how the scoring changes as we add, remove, and edit records.

Going forward we expect approximately 147,000 documents.


(Mark Walkom) #4

And how big are they?


#5

Sorry for the slow reply, and excuse my ignorance, but how could I find that out?

EDIT: In Sense I ran:

GET /IndexName/_stats

And got:

"docs": {
    "count": 145261,
    "deleted": 0
},
"store": {
    "size_in_bytes": 23120808,
    "throttle_time_in_millis": 95
}
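
As a rough sketch of how to read those numbers, the snippet below turns the fragment into a total size and an average size per document. The `stats` dict simply copies the values shown above, and `summarize` is a hypothetical helper name. Note that `size_in_bytes` is the on-disk index size, which includes index structures, so it is only an approximation of the raw document size.

```python
# Values copied from the _stats fragment in this post.
stats = {
    "docs": {"count": 145261, "deleted": 0},
    "store": {"size_in_bytes": 23120808, "throttle_time_in_millis": 95},
}


def summarize(stats):
    """Return (index size in MB, average on-disk bytes per document)."""
    size = stats["store"]["size_in_bytes"]
    count = stats["docs"]["count"]
    return size / 1_000_000, size / count


size_mb, avg_bytes = summarize(stats)
print(f"index size: {size_mb:.1f} MB, average per document: {avg_bytes:.0f} bytes")
```

That works out to roughly 23 MB in total and about 159 bytes per document on disk.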


(Mark Walkom) #6

You can also use the _cat API.

Given that size (about 23 MB) you should aim for a single shard. If you do end up needing more than one shard, check out https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch
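
A small sketch of what those two requests could look like as URLs. The base URL is an assumption (a local node on the default port), `IndexName` is the placeholder used earlier in this thread, and the helper names are made up for illustration. The snippet only builds the URLs and does not contact a cluster.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: local node, default port
INDEX = "IndexName"                 # placeholder from the thread


def cat_indices_url(base, index):
    """URL for _cat/indices, which lists doc count and store size per index."""
    # v=true asks the _cat API to include column headers.
    return f"{base}/_cat/indices/{index}?{urlencode({'v': 'true'})}"


def dfs_search_url(base, index):
    """Search URL that asks for global term statistics before scoring."""
    params = {"search_type": "dfs_query_then_fetch"}
    return f"{base}/{index}/_search?{urlencode(params)}"


print(cat_indices_url(BASE_URL, INDEX))
print(dfs_search_url(BASE_URL, INDEX))
```

The first URL is for a GET request; the second takes the usual query body. With dfs_query_then_fetch the term statistics are gathered from all shards before scoring, which costs an extra round trip per search.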


#7

Thanks for your reply Mark, much appreciated. Am I right in thinking there might be a performance impact with dfs-query-then-fetch? I guess it's a trade-off we would have to consider, but for now I think it's easier for us to have a single shard.

