Elasticsearch - Single Index vs Multiple Indexes

(Pranav) #13

Thank you @Mark_Harwood

The hypothesis I can build from your response is:
When 1000 fields are indexed with the help of copy_to, the values of these fields are indexed into one single field in Lucene.
Lucene would now contain only two fields: _source, which holds my JSON in its original format, and all_my_string_fields_indexed_as_one, which is searchable.
On a free text search, only all_my_string_fields_indexed_as_one would be consulted. Hence, my free text search would effectively run against the single field all_my_string_fields_indexed_as_one.

If copy_to is not used, I would have 1001 fields in Lucene: _source, which holds my original JSON, and the 1000 indexed fields, which are searchable. So on a free text search, all 1000 of these fields are searched.

This is what I understand so far. Am I missing something?

(Mark Harwood) #14

Your understanding is correct. The important thing to add is that copy_to will copy a JSON field's value to an indexed field of your choice, but by default you'll still index the original JSON field under its JSON name. You need to set the index property to false to prevent this default behaviour. So each JSON field needs:

  1. copy_to set to target something like the "all_my_string_fields_indexed_as_one" field, and
  2. index property set to false, to prevent the JSON field from also being indexed as a field of the same name
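
A minimal sketch of the mapping these two settings would produce, written as Python that only builds the request bodies. The field names `title` and `body`, the `text` type, and the typeless `mappings.properties` layout (Elasticsearch 7+) are assumptions for illustration; only `copy_to`, `index`, and the target field name come from the discussion above.

```python
import json

CATCH_ALL = "all_my_string_fields_indexed_as_one"


def build_mapping(string_fields):
    """Build an index-creation body where every source field is copied
    into one catch-all field and is not indexed under its own name."""
    properties = {
        name: {
            "type": "text",
            "copy_to": CATCH_ALL,  # 1. copy the value into the catch-all field
            "index": False,        # 2. do not index the field under its own name
        }
        for name in string_fields
    }
    # The catch-all itself is the only indexed (searchable) text field.
    properties[CATCH_ALL] = {"type": "text"}
    return {"mappings": {"properties": properties}}


def free_text_query(text):
    """Free-text search body that targets only the catch-all field."""
    return {"query": {"match": {CATCH_ALL: text}}}


# Hypothetical field names, for illustration only.
mapping = build_mapping(["title", "body"])
print(json.dumps(mapping, indent=2))
print(json.dumps(free_text_query("quick brown fox")))
```

With a mapping like this, the original values remain available in _source, but only the catch-all field can be searched.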

(Pranav) #15

Thank you @Mark_Harwood

When using copy_to, can I still search and aggregate on individual fields? If not, will I have to set the index property to true to search on them individually? That would imply that the number of fields stored in Lucene is again 1000 if I need to provide search and aggregation on individual fields.

Please correct me if I am wrong.

(Mark Harwood) #16

If you want to search on a field individually, yes, you need an individually indexed field.
If you want to aggregate on a field individually you need doc_values enabled for that individual field.
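
As a rough illustration, the per-field choices could be expressed like this. The `keyword` type and the field names are assumptions; `index` and `doc_values` are the mapping parameters referred to above.

```python
def field_mapping(searchable, aggregatable):
    """Mapping for one keyword field with each data structure toggled."""
    return {
        "type": "keyword",
        "index": searchable,         # inverted index: needed to search the field
        "doc_values": aggregatable,  # column store: needed to aggregate on it
    }


def enabled_structures(properties):
    """Count how many per-field data structures a mapping switches on."""
    return sum(
        int(spec.get("index", True)) + int(spec.get("doc_values", True))
        for spec in properties.values()
    )


# Hypothetical fields: one search-only, one aggregate-only, one with both.
properties = {
    "status": field_mapping(searchable=True, aggregatable=False),
    "country": field_mapping(searchable=False, aggregatable=True),
    "user_id": field_mapping(searchable=True, aggregatable=True),
}
print(enabled_structures(properties))  # 4
```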

These data structures do not come for free, so it shouldn't come as a surprise that the costs involved can be a multiple of the number of fields for which you choose to enable them.

(Pranav) #17

Thank you @Mark_Harwood

I have one more question. Suppose I have multiple indexes whose distinct fields, taken across all indexes, total 3000-4000.
If one single query searches all of these indexes at once, is there a chance of a mapping explosion?

(Mark Harwood) #18

As I explained to your colleague Nikesh here, I consider a "mapping explosion" to be not a specific event but a general condition of having a lot of fields.

By that definition, yes, you will have a lot of fields.

If you want to consider the impact of field numbers on something more specific (query response times, disk space, RAM utilisation, indexing speeds...) I suggest you perform some benchmarking.

(Pranav) #19

Thanks @Mark_Harwood for your response. Yes, even Nikesh is confused about the situation.

I would like to rephrase the question. When I divide a single index (containing about 3000-4000 fields) into multiple indexes (containing around 100 fields each) and search over all of these indexes at once, I assume the search would still cover about 3000-4000 fields. Would there be any difference in search performance between a single large index and multiple smaller indexes?

(Mark Harwood) #20

Adding multiple indices is not a trick to improving search performance unless you add machines to host these on.

Benchmarking is your way forward here as there are so many variables to consider.

(Pranav) #21

Thanks @Mark_Harwood

We will certainly work on benchmarking and post the results in this same thread.

Thank you again for your constant guidance.

(Pranav) #22


Considering the same quoted scenarios, I am not worried much about performance.
I understand a mapping explosion is not a specific event but a general condition of having a lot of fields.

My main concern is this: if I have 30 indexes, each with 100 fields, and I search all 30 indexes with a single query, the search will cover all 3000 fields (30 indexes * 100 fields per index). Can there be a mapping explosion in this case?

The reason I ask is that each index by itself contains a small number of fields, but the search query is spread across multiple indexes, which adds up to a large number of fields being searched.

(Mark Harwood) #23

So you're asking "is 30 x 100 a lot"?

(Pranav) #24

Maybe I did not phrase my question correctly the first time.
My only confusion is whether a mapping explosion is restricted to a search on a single index, or whether it also applies to a search across the cumulative fields of multiple indexes.

(Mark Harwood) #25

They all add up
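
Taking "they all add up" literally, the arithmetic from the question can be sketched as follows. The index and field names are hypothetical; only the 30 x 100 figures come from the thread.

```python
def total_fields(indexes):
    """Sum the per-index field counts for every index a query targets."""
    return sum(len(fields) for fields in indexes.values())


# 30 hypothetical indexes, each with 100 hypothetical fields.
indexes = {
    f"index_{i}": [f"field_{i}_{j}" for j in range(100)]
    for i in range(30)
}
print(total_fields(indexes))  # 3000
```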

(Pranav) #26

Thanks @Mark_Harwood for your continued help.
I have read through this link https://issues.apache.org/jira/browse/LUCENE-6842 which discusses a similar situation in Lucene.

(Pranav) #27

Thanks @Mark_Harwood for your continued help. We are working on benchmarking for our use cases.

I still have one question:
If I search only on a single specified field in an index that has 1000 fields, can a mapping explosion occur?

This situation differs from my previous use cases, where I was searching on all fields. Now the search will be on one specified field only, in an index containing 1000 fields.

(Christian Dahlqvist) #28

Having a very large number of fields can, as Mark points out, lead to a lot of performance problems. If the number of fields is static, however, it is in my opinion wrong to use the term mapping explosion. As outlined in this rather old but still useful blog post, a mapping explosion is when the number of mapped fields continuously increases because of how the data model is structured. Each change requires the cluster state to be updated and propagated, and this typically gets slower the larger the cluster state gets, at some point causing severe performance and stability problems.
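
To make the distinction concrete, here is a small sketch that counts how many distinct field paths a stream of documents would add to a mapping. The two document shapes are invented for illustration and are not taken from the blog post; the point is only that one data model keeps adding fields while the other stays static.

```python
def field_paths(doc, prefix=""):
    """Return the set of dotted field paths a document would add to a mapping."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= field_paths(value, prefix=f"{path}.")
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            # Objects inside an array share the same field paths.
            for item in value:
                paths |= field_paths(item, prefix=f"{path}.")
        else:
            paths.add(path)
    return paths


def mapped_fields_after(docs):
    """Number of distinct mapped fields after each document is indexed."""
    seen, counts = set(), []
    for doc in docs:
        seen |= field_paths(doc)
        counts.append(len(seen))
    return counts


# Data model A: values used as field names, so every new user adds a field.
dynamic_keys = [{"clicks": {f"user_{i}": i}} for i in range(5)]
# Data model B: same information as key/value pairs, so the fields stay fixed.
static_keys = [{"clicks": [{"user": f"user_{i}", "count": i}]} for i in range(5)]

print(mapped_fields_after(dynamic_keys))  # [1, 2, 3, 4, 5]
print(mapped_fields_after(static_keys))   # [2, 2, 2, 2, 2]
```

In the first model every new key is a mapping change, which is the continuously growing case described above; in the second the field count is large or small but static.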

(Pranav) #29

Thanks for your response

(Pranav) #30

@Mark_Harwood @Christian_Dahlqvist
As per your suggestion, we have proceeded with benchmarking.
Although we haven't faced any mapping explosion, our free text search has become drastically slower. Here is a summary.

Total fields : Time      : Calls
  2000       : 2-5s      : Single call
  2000       : 2-15s     : Multiple calls
  5000       : 15-40s    : Single call
  5000       : 70-90s    : Multiple calls
  10000      : 25-35s    : Single call
  10000      : 2-4 mins  : Multiple calls

By "Multiple calls", I mean 5 users searching simultaneously.
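
For anyone wanting to repeat a comparison like this, a single-call versus 5-simultaneous-calls measurement could be sketched as below. This is not the harness used for the numbers above; `fake_search` is a placeholder that sleeps instead of querying a cluster and would need to be replaced by a real search call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(search, query):
    """Run one search and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    search(query)
    return time.perf_counter() - start


def benchmark(search, query, concurrent_users=1):
    """Latency of each call when `concurrent_users` searches run at once."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [
            pool.submit(timed, search, query) for _ in range(concurrent_users)
        ]
        return [f.result() for f in futures]


def fake_search(query):
    """Placeholder for a real search call; sleeps instead of querying."""
    time.sleep(0.01)


single = benchmark(fake_search, "free text", concurrent_users=1)
multiple = benchmark(fake_search, "free text", concurrent_users=5)
print(f"single call: {max(single):.3f}s")
print(f"5 simultaneous calls: {min(multiple):.3f}s to {max(multiple):.3f}s")
```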

(Mark Harwood) #31

More indexed fields = more data structures.
More data structures = more random disk seeks.
More random disk seeks = more time.

(Pranav) #32

From our benchmarking results, we can conclude that a large number of fields slows down free text search.

This leads to a few questions:

  1. Suppose we have two different indexes that contain the same set of fields (say 20, with identical field names and data types). When two users search these two indexes simultaneously, will 20 fields be loaded into the cluster state because the fields are common, or will 40 fields be loaded? If 40 fields are loaded, is there a way to share these fields between indexes, since they have the same properties (name and data type)?

  2. When 100,000 users simultaneously search a single index that contains 20 fields, is it safe to assume that 100,000 * 20 = 2,000,000 fields will be loaded into the cluster state?