Elasticsearch max number of documents for one index

zabykrinich · July 30, 2015, 3:09pm

Hello,
I have 2 ES nodes in a cluster: one master and one slave, both data nodes. There are 3 indexes with 5 shards and 0 replicas, with 300 million documents each. The nodes have 2-core CPUs and 32 GB RAM, with 20 GB configured for Elasticsearch.
Indexing goes through the bulk API, 3000 documents every 2 minutes, with a forced refresh.
The question is: have I hit the limit of documents for one index in my case?
An aggregation took 20 seconds when the number of documents was 150 million; now it takes 600 seconds.
Search takes a long time as well.

The documents have nested fields, and some of the fields use keyword_analyzer. Could that be the bottleneck?
Thanks a lot for your opinion and ideas!

warkolm · July 31, 2015, 1:42am

There is a limit of about 2 billion docs per shard, which is a hard Lucene limit.

However, when doing aggs over an increasing amount of data things will slow down; how slow will depend on what your queries are doing.
Also, the more you load into ES the more resources it'll use, which is when scaling horizontally is good, as you have more resources to use.

Why are you force refreshing?

zabykrinich · July 31, 2015, 7:48am

Hello Mark, thank you a lot for the information. At least I know the limit now :-).
I have quite a heavy aggregation; there is an example below.

I have 5 shards in the index: the master holds 0, 1, 4 and the slave holds 2, 3. If I add 3 nodes, so that there are 5 nodes with 1 shard each, will search get faster? Queries? Aggregations? If yes, why? I have read that Lucene searches shards one by one, linearly, so how would that increase search speed?

Is there any correlation between the number of documents in an index and the number of shards? Approximately how many documents should be in one shard?

As for the refresh operation: I have two types in the index. One is the main type, with 300 million documents, and the other is a UI type. I have a big inflow of UI documents, but almost all of them are the same, so I need to check that the same document is not already in the index. The index operation itself is quite rare, and I need to refresh to be sure that the last added UI document is searchable. Does a refresh apply to the whole index, not only to the type? Maybe it is better to move UI to another index?

Sorry for the long text.
Aggregation example:

 {
    "from": 0,
    "size": 0,
    "aggs": {
       "filtered_aggs": {
          "filter": {
             "and": {
                "filters": [
                   {
                      "or": {
                         "filters": [
                            {
                               "regexp": {
                                  "ImageName": {
                                     "value": ".*MAP_01.*"
                                  }
                               }
                            }
                         ]
                      }
                   },
                   {
                      "or": {
                         "filters": [
                            {
                               "term": {
                                  "EventCategoryName": "Movement"
                               }
                            }
                         ]
                      }
                   },
                   {
                      "or": {
                         "filters": [
                            {
                               "term": {
                                  "EventName": "Movement"
                               }
                            }
                         ]
                      }
                   },
                   {
                      "range": {
                         "EventTime": {
                            "lte": "2015-07-24"
                         }
                      }
                   },
                   {
                      "range": {
                         "EventTime": {
                            "gte": "2015-06-24"
                         }
                      }
                   }
                ]
             }
          },
          "aggs": {
             "x_agg": {
                "terms": {
                   "field": "XPoint",
                   "size": 0
                },
                "aggs": {
                   "z_agg2": {
                      "terms": {
                         "field": "ZPoint",
                         "size": 0
                      },
                      "aggs": {
                         "f_agg3": {
                            "stats": {
                               "field": "Id"
                            }
                         }
                      }
                   }
                }
             }
          }
       }
    }
 }

Christian_Dahlqvist · July 31, 2015, 7:59am

Query and aggregation performance will vary with the size of the shards. The relationship between shard size and query performance will depend on the type of data you are indexing as well as the type and complexity of queries you are running. In order to determine the optimal shard size, it is recommended that you create an index with a single shard and no replicas and then index batches of documents into it. After each batch, run your queries and aggregations and keep track of the response times. These should increase with the size of the shard and will give you a good idea of how large shards you should have in order to meet your latency requirements. Based on this and the expected data volumes, you can determine how many shards an index should have.
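
The procedure described above can be sketched as a small timing loop. This is only a rough outline under stated assumptions: `index_batch` and `run_query` are hypothetical callables that you would write yourself against a test index with one shard and no replicas, and the helper names are made up for illustration.

```python
import time


def benchmark_shard_growth(index_batch, run_query, batches, batch_size):
    """Index batches into a single-shard test index and time the query after each.

    index_batch(n) and run_query() are hypothetical callables supplied by the
    caller; this function only does the bookkeeping and timing.
    Returns a list of (total_docs_in_shard, query_seconds) tuples.
    """
    results = []
    total_docs = 0
    for _ in range(batches):
        index_batch(batch_size)
        total_docs += batch_size
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
        results.append((total_docs, elapsed))
    return results


def largest_shard_within_budget(results, max_seconds):
    """Return the largest doc count seen before latency first exceeds the budget.

    Returns None if even the first measurement is over budget.
    """
    best = None
    for docs, seconds in results:
        if seconds > max_seconds:
            break
        best = docs
    return best
```

Dividing the expected total document count by the value returned from `largest_shard_within_budget` then gives a first estimate of the number of shards. A single timing per batch is noisy, so in practice it is probably worth running each query several times and looking at a median.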

zabykrinich · July 31, 2015, 8:21am

Thank you a lot Christian! I will try to calculate it the way you described.

There are 5 shards for 300 million documents, approximately 60 million per shard.
If I create an index with 10 shards, each shard will handle 30 million documents.

Now the 2 nodes hold 3 shards on the master and 2 shards on the slave, and search is slow.
If I have 2 nodes with 5 shards on the master and 5 on the slave, will search get faster since the shards are smaller? Or do I need to add nodes?
I am trying to understand the way ES performs search. Sorry if my questions seem stupid.
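
The layouts being compared here work out as follows. This is just the arithmetic from the post, assuming documents spread evenly across shards and shards spread evenly across nodes; the function names are made up for illustration.

```python
import math


def docs_per_shard(total_docs, primary_shards):
    """Documents per shard, assuming an even spread across shards."""
    return math.ceil(total_docs / primary_shards)


def max_shards_per_node(primary_shards, nodes):
    """Most shards any single node holds, assuming an even allocation."""
    return math.ceil(primary_shards / nodes)


TOTAL_DOCS = 300_000_000  # documents in one index, from the post above

# (shards, nodes): current layout, 10 shards on 2 nodes, 5 shards on 5 nodes
for shards, nodes in [(5, 2), (10, 2), (5, 5)]:
    print(shards, nodes,
          docs_per_shard(TOTAL_DOCS, shards),
          max_shards_per_node(shards, nodes))
# 5 2 60000000 3
# 10 2 30000000 5
# 5 5 60000000 1
```

With 10 shards on 2 nodes, each shard is half the size but each 2-core node has to work through 5 of them per query, so whether that is actually faster is something to measure rather than assume.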

Jason_Zheng · November 25, 2015, 6:38am

Hi Mark,

Similar question: what is the max number of 'type's for one index in ES?

Jason

warkolm · November 25, 2015, 6:50am

Please start your own thread

Jason_Zheng · November 25, 2015, 6:57am

Hi Mark,

Sorry, I don't really understand. Can you explain more? Thanks

Jason

warkolm · November 25, 2015, 6:58am

This thread is 3 months old and your question doesn't relate to the original question.
Please start your own thread instead.

FetteRobbe · March 6, 2017, 11:12am

2 billion (US) or 2 billion (EU)?

How many zeros?

warkolm · March 8, 2017, 7:40pm

What's the difference?

ESCoder · March 8, 2017, 8:32pm

@warkolm I think @FetteRobbe is referring to the long and short scales:
1 billion (EU) = 1 million x 1 million (12 zeros)
1 billion (US) = 1000 x 1 million (9 zeros)

warkolm · March 8, 2017, 8:57pm

Oh, there you go!

It's a 2^31 issue (doc IDs are signed 32-bit ints), so the US one with 9 zeros fits.
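
To put an exact number on it: Lucene document IDs are Java ints, and to the best of my knowledge Lucene caps a single index (one ES shard) at `Integer.MAX_VALUE - 128` documents. A quick arithmetic sketch, where the `- 128` constant is that assumption and the helper name is made up:

```python
# Java's Integer.MAX_VALUE, the largest signed 32-bit integer.
JAVA_INT_MAX = 2**31 - 1

# Assumption: Lucene's per-index cap is Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS = JAVA_INT_MAX - 128


def max_docs_for_index(primary_shards):
    """Upper bound for an ES index, since the cap applies per shard."""
    return primary_shards * LUCENE_MAX_DOCS


print(LUCENE_MAX_DOCS)        # 2147483519
print(max_docs_for_index(5))  # 10737417595
```

So the per-shard limit is roughly 2.1 x 10^9, and an index with 5 primary shards tops out at about 10.7 billion documents. As far as I know, nested documents are stored as separate Lucene documents, so they count toward the limit too.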
