Elasticsearch - Single Index vs Multiple Indexes

Try dropping clauses to find the costs.

Can you really consider something a phrase if the words are ten thousand words apart?

Thanks Mark for your response.
The slop of 10000 was there for a different purpose. However, I tested without it and found that it is not the issue.

As per your suggestion, I dug deeper and found that searching on a large number of fields was causing too many disk seeks, which is the real issue.

I am planning to go ahead with a single copy_to field to reduce the number of fields being searched. However, I have two further questions:

  1. With copy_to, how can we highlight the matched text in the search results?

  2. With copy_to, how can we identify the original field in which the search text matched?

For example: I have two fields, "First_name" and "Last_name", both of which are copied to "Full_name". I search "Full_name" for a string that is present in "First_name". How can I find out that the match came from the "First_name" field?
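To make the question concrete, here is a minimal sketch of one approach that may be worth testing: query the combined field, but ask the highlighter for the original fields with `require_field_match` set to `false`. The mapping uses Elasticsearch 7+ typeless syntax, which is an assumption, as is the helper name `build_search`. The snippet only builds the request bodies as Python dicts and does not talk to a cluster.

```python
import json

# Assumed mapping: both name fields are copied into Full_name.
mapping = {
    "mappings": {
        "properties": {
            "First_name": {"type": "text", "copy_to": "Full_name"},
            "Last_name": {"type": "text", "copy_to": "Full_name"},
            "Full_name": {"type": "text"},
        }
    }
}


def build_search(text):
    """Hypothetical helper: search Full_name, highlight the source fields."""
    return {
        "query": {"match": {"Full_name": text}},
        "highlight": {
            # Allow highlighting of fields the query did not target.
            "require_field_match": False,
            "fields": {"First_name": {}, "Last_name": {}},
        },
    }


print(json.dumps(build_search("John"), indent=2))
```

If this works as expected, the keys of each hit's `highlight` object should name the original fields that contain the search term. It assumes the source fields and "Full_name" use the same analyzer; with different analyzers the highlighted terms may not line up.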

I'm not sure how relevant this suggestion is, but I'm going to put it here :slight_smile:

About the mapping explosion: I had a similar issue when I tried to store the names of different cookies as separate index fields, with the cookie values as the field values. I had no idea how many 'random' cookie names would end up in my index, which would definitely lead to a mapping explosion.

I ended up with the following solution:

My original (wrong) index mapping looked like this:
index: {
  "$cookie_name": "$cookie_value",
  "$cookie_name2": "$cookie_value2"
}

The right solution was to use the nested data type with the following structure:
index: {
  Cookie: [
    {
      cookie_name: $cookie_name1,
      cookie_value: $cookie_value1
    },
    {
      cookie_name: $cookie_name2,
      cookie_value: $cookie_value2
    }
  ]
}

With this structure, I can now have an unbounded number of different cookie names without risking a mapping explosion, because the mapping only ever contains the two fields cookie_name and cookie_value.
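As a rough sketch of what this could look like in practice, the snippet below builds a nested mapping and converts a flat name/value dict into the nested structure. The field types (`keyword` for the name, `text` with a `keyword` sub-field for the value), the Elasticsearch 7+ mapping syntax, and the helper name `to_nested` are assumptions for illustration, not my exact setup.

```python
import json

# Assumed mapping: only two sub-fields, however many cookie names arrive.
nested_mapping = {
    "mappings": {
        "properties": {
            "Cookie": {
                "type": "nested",
                "properties": {
                    "cookie_name": {"type": "keyword"},
                    "cookie_value": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword"}},
                    },
                },
            }
        }
    }
}


def to_nested(cookies):
    """Hypothetical helper: {name: value} -> list of nested cookie objects."""
    return [
        {"cookie_name": name, "cookie_value": value}
        for name, value in cookies.items()
    ]


doc = {"Cookie": to_nested({"session_id": "abc123", "lang": "en"})}
print(json.dumps(doc, indent=2))
```

Every new cookie name becomes data in `cookie_name` rather than a new field, so the mapping stays the same size.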

Here is some info on the nested datatype:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

Thank you @bugggbear for the response.

Although this seems to reduce the mapping explosion issue, I have the following questions:

  1. This approach seems to lose the field information at highlighting time, i.e. when I search on cookie_value1, how do I determine that it belongs to cookie_name1? I need this information because cookie_value1 is the value of the key cookie_name1.

  2. How do I update a particular element of this Cookie array? Let's say I have to change a cookie_value from cookie_value2 to cookie_value3.

  3. In my use case cookie_value may be of different types, and sometimes it contains an array of strings, for example ["cookie_value4", "cookie_value5"]. How will Elasticsearch handle this scenario internally?

  4. How can we achieve features like aggregation, sorting and filtering?

I'm not quite sure about 1).

For the rest of your questions, I think the following links cover most of it:
https://qbox.io/blog/sorting-nested-fields-in-elasticsearch
https://qbox.io/blog/elasticsearch-aggregations-nested-documents-tutorial
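For a rough idea of what those articles describe, here is a sketch of a request body that combines a nested terms aggregation with a nested sort. It assumes that `Cookie.cookie_name` is a `keyword` field and that `Cookie.cookie_value` has a `keyword` sub-field; the helper name `build_agg_and_sort` is made up, and the `nested` sort syntax assumes a reasonably recent Elasticsearch version (6.1 or later, as far as I know).

```python
import json


def build_agg_and_sort(cookie_name):
    """Hypothetical helper: sort by one cookie's value, count cookie names."""
    return {
        "sort": [
            {
                "Cookie.cookie_value.keyword": {
                    "order": "asc",
                    # Only the nested objects for this cookie name are
                    # considered when computing the sort value.
                    "nested": {
                        "path": "Cookie",
                        "filter": {
                            "term": {"Cookie.cookie_name": cookie_name}
                        },
                    },
                }
            }
        ],
        "aggs": {
            "cookies": {
                # Step into the nested objects before aggregating on them.
                "nested": {"path": "Cookie"},
                "aggs": {
                    "names": {"terms": {"field": "Cookie.cookie_name"}}
                },
            }
        },
    }


print(json.dumps(build_agg_and_sort("session_id"), indent=2))
```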

I have only used basic filtering by cookie name so far. I suppose someone more experienced with nested documents could give you more in-depth answers to your questions.
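For reference, the kind of filtering I mean can be sketched as a `nested` query. Adding `inner_hits` might also help with question 1, since it should return the matching nested objects (name and value together) for each hit, but I haven't verified that for your case. The helper name `build_cookie_query` and the field types (`keyword` name, `text` value) are assumptions.

```python
import json


def build_cookie_query(value_text, cookie_name=None):
    """Hypothetical helper: match a cookie value, optionally for one name."""
    bool_query = {"must": [{"match": {"Cookie.cookie_value": value_text}}]}
    if cookie_name is not None:
        # Both conditions must hold on the same nested object.
        bool_query["filter"] = [{"term": {"Cookie.cookie_name": cookie_name}}]
    return {
        "query": {
            "nested": {
                "path": "Cookie",
                "query": {"bool": bool_query},
                # Ask for the matching nested objects, with highlights.
                "inner_hits": {
                    "highlight": {"fields": {"Cookie.cookie_value": {}}}
                },
            }
        }
    }


print(json.dumps(build_cookie_query("abc123", "session_id"), indent=2))
```

Each inner hit carries the nested object's own `_source`, so the `cookie_name` next to the matched `cookie_value` should be readable from there.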

Cheers.
