Elasticsearch max number of documents for one index


(Mikhail) #1

Hello
I have 2 ES nodes in a cluster: one master and one slave, both of type data. There are 3 indexes, each with 5 shards, 0 replicas, and 300 million documents. The nodes have 2-core CPUs and 32 GB of RAM, with 20 GB configured for Elasticsearch.
Indexing runs via the bulk API: 3000 documents every 2 minutes, with a forced refresh.
The question is: have I hit the document limit for one index in my case?
An aggregation took 20 seconds when the index held 150 million documents; now it takes 600 seconds.
Search takes a long time as well.

The documents have nested fields, and some of the fields use keyword_analyzer. Could that be the bottleneck?
Thanks a lot for your opinion and ideas!


(Mark Walkom) #2

There is a limit of about 2 billion docs per shard, which is a hard Lucene limit.

However, aggs over a growing amount of data will slow down; how much depends on what your queries are doing.
Also, the more you load into ES, the more resources it'll use, which is where scaling horizontally helps, as you have more resources to use :smile:

Why are you force refreshing?


(Mikhail) #3

Hello Mark, thanks a lot for the information. At least I know the limit now :-).
I have a quite heavy aggregation; there is an example below.

My index has 5 shards: 0, 1, and 4 on the master, 2 and 3 on the slave. If I add 3 nodes, so that there are 5 nodes with 1 shard each, will search get faster? Queries? Aggregations? If yes, why? I have read that Lucene searches shards one by one, linearly, so how would this increase search speed?

Is there any correlation between the number of documents in an index and the number of shards? Approximately how many documents should be in one shard?

As for the refresh operation: I have two types in the index. One is the main type, with 300 million documents, and the other is the UI type. I have a big inflow of UI-type documents, but almost all of them are the same, so I need to check that the same document is not already in the index. Actual index operations are quite rare, and I need to refresh to be sure that the last added UI-type document is searchable. Does the refresh apply to the whole index, not only the type? Maybe it is better to move UI to another index?

Sorry for the big text.
Aggregation example:

 {
    "from": 0,
    "size": 0,
    "aggs": {
       "filtered_aggs": {
          "filter": {
             "and": {
                "filters": [
                   {
                      "or": {
                         "filters": [
                            {
                               "regexp": {
                                  "ImageName": {
                                     "value": ".*MAP_01.*"
                                  }
                               }
                            }
                         ]
                      }
                   },
                   {
                      "or": {
                         "filters": [
                            {
                               "term": {
                                  "EventCategoryName": "Movement"
                               }
                            }
                         ]
                      }
                   },
                   {
                      "or": {
                         "filters": [
                            {
                               "term": {
                                  "EventName": "Movement"
                               }
                            }
                         ]
                      }
                   },
                   {
                      "range": {
                         "EventTime": {
                            "lte": "2015-07-24"
                         }
                      }
                   },
                   {
                      "range": {
                         "EventTime": {
                            "gte": "2015-06-24"
                         }
                      }
                   }
                ]
             }
          },
          "aggs": {
             "x_agg": {
                "terms": {
                   "field": "XPoint",
                   "size": 0
                },
                "aggs": {
                   "z_agg2": {
                      "terms": {
                         "field": "ZPoint",
                         "size": 0
                      },
                      "aggs": {
                         "f_agg3": {
                            "stats": {
                               "field": "Id"
                            }
                         }
                      }
                   }
                }
             }
          }
       }
    }
 }

(Christian Dahlqvist) #4

Query and aggregation performance will vary with the size of the shards. The relationship between shard size and query performance will depend on the type of data you are indexing as well as the type and complexity of the queries you are running. In order to determine the optimal shard size, it is recommended that you create an index with a single shard and no replicas and then index batches of documents into it. After each batch, run your queries and aggregations and keep track of the response times. These should increase with the size of the shard and will give you a good idea of how large your shards should be in order to meet your latency requirements. Based on this and the expected data volumes, you can determine how many shards an index should have.
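
The last step of that procedure can be sketched in a few lines of Python. This is only an illustration: the function names, the 5-second latency budget, and the measurements are all made up, not real benchmark results. It assumes you recorded a (shard size, response time) pair after each batch on the single-shard test index.

```python
import math

def max_docs_within_budget(measurements, latency_budget_s):
    """Largest measured shard size whose latency still meets the budget.

    measurements: (docs_in_shard, latency_seconds) pairs, one per batch,
    taken on the single-shard, zero-replica test index.
    """
    ok = [docs for docs, latency in measurements if latency <= latency_budget_s]
    if not ok:
        raise ValueError("no measured shard size meets the latency budget")
    return max(ok)

def shards_needed(expected_docs, docs_per_shard):
    """Primary shard count so that no shard exceeds docs_per_shard."""
    return math.ceil(expected_docs / docs_per_shard)

# Hypothetical measurements, purely for illustration.
measurements = [
    (10_000_000, 1.5),
    (20_000_000, 3.0),
    (40_000_000, 7.0),
    (60_000_000, 12.0),
]

per_shard = max_docs_within_budget(measurements, latency_budget_s=5.0)
print(per_shard)                              # 20000000
print(shards_needed(300_000_000, per_shard))  # 15
```

With real measurements in place of the invented ones, the same two steps give a shard count for the expected data volume.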


(Mikhail) #5

Thanks a lot, Christian! I will try to calculate it the way you described.

There are 5 shards for 300 million documents, approximately 60 million per shard.
If I create an index with 10 shards, each shard will hold 30 million documents.

Right now the 2 nodes hold 3 shards on the master and 2 on the slave, and search is slow.
If the 2 nodes hold 5 shards on the master and 5 on the slave, will search get faster since the shards are smaller? Or do I need to add nodes?
I am trying to understand how ES performs search. Sorry if my questions seem stupid.


(Jason Zheng) #6

Hi Mark,

A similar question: what is the max number of 'type' for one index in ES?

Jason


(Mark Walkom) #7

Please start your own thread :slight_smile:


(Jason Zheng) #8

Hi Mark,

Sorry, I don't really understand. Can you explain more? Thanks

Jason


(Mark Walkom) #9

This thread is 3 months old and your question doesn't relate to the original question.
Please start your own thread instead.


(Thomas Sch) #10

2 billion (US) or 2 billion (EU)?

How many zeros?


(Mark Walkom) #11

What's the difference?


#12

@warkolm I think @FetteRobbe is referring to:
1 billion (EU) = 1 million x 1 million (12 zeros)
1 billion (US) = 1000 x 1 million (9 zeros)


(Mark Walkom) #13

Oh, there you go!

It's a 2^31 issue (Lucene doc IDs are signed 32-bit integers), so whatever one fits that :wink:
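
To put a number on it: Lucene document IDs are Java ints, so the per-shard ceiling sits just under 2^31. A quick Python sketch of which "billion" that matches (the constant and function names here are just for illustration, and Lucene's actual ceiling is slightly below Integer.MAX_VALUE):

```python
# Java's Integer.MAX_VALUE; Lucene's real per-shard ceiling is slightly below it.
INT_MAX = 2**31 - 1

SHORT_SCALE_BILLION = 10**9    # "US" billion, 9 zeros
LONG_SCALE_BILLION = 10**12    # "EU" billion, 12 zeros

def in_billions(n, billion):
    """Express n in units of the given 'billion'."""
    return n / billion

print(INT_MAX)                                             # 2147483647
print(round(in_billions(INT_MAX, SHORT_SCALE_BILLION), 2)) # 2.15
print(round(in_billions(INT_MAX, LONG_SCALE_BILLION), 4))  # 0.0021
```

So it is the 9-zeros billion: roughly 2.1 x 10^9 docs per shard.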

