Aliases with filters and fielddata

(Vad) #1

I would like someone to confirm an issue about how our indexes are structured.

We have an index holding all the documents of one type, 20M docs in total. Of these, only 6M have an isActive=true field, and those are the ones used in aggregations and sorting. The other 14M are mainly accessed by ID, sometimes through term/match searches.

We have an alias with a filter to work only on these 6M companies.
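For reference, a filtered alias like this is created through the `_aliases` endpoint. The sketch below only builds the request body as a Python dict. The index name `companies` and alias name `active_companies` are hypothetical, and it assumes `isActive` is mapped as a boolean so a `term` filter on `true` matches.

```python
import json

# Hypothetical names: the thread does not say what the index and alias are called.
INDEX = "companies"
ALIAS = "active_companies"


def filtered_alias_actions(index, alias):
    """Build a body for POST /_aliases that adds an alias limited to active docs."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Searches through the alias only match docs passing this filter.
                    "filter": {"term": {"isActive": True}},
                }
            }
        ]
    }


body = filtered_alias_actions(INDEX, ALIAS)
print(json.dumps(body, indent=2))
```

The filter only restricts which documents a search through the alias can match; the alias still points at the whole underlying index.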

The question is: is Elasticsearch loading fielddata for all 20M docs? Would it be better to have two separate indexes (active/inactive) and an alias "full" covering both?
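A minimal sketch of the proposed split layout, assuming hypothetical index names `companies_active` and `companies_inactive`: one `_aliases` body points the alias `full` at both indexes.

```python
import json

# Hypothetical index names for the split layout.
ACTIVE_INDEX = "companies_active"
INACTIVE_INDEX = "companies_inactive"


def full_alias_actions(active_index, inactive_index, alias="full"):
    """Build a POST /_aliases body pointing one alias at both indexes."""
    return {
        "actions": [
            {"add": {"index": active_index, "alias": alias}},
            {"add": {"index": inactive_index, "alias": alias}},
        ]
    }


body = full_alias_actions(ACTIVE_INDEX, INACTIVE_INDEX)
print(json.dumps(body, indent=2))
```

In this layout, aggregations and sorting would target `companies_active` directly, while term/match searches over everything go through `full`. Two caveats: a single-document GET through an alias that points at more than one index is rejected, so ID lookups would need to hit the concrete index or use a search (e.g. an `ids` query); and when a company's isActive flag changes, the document has to be deleted from one index and indexed into the other.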


(Colin Goodheart-Smithe) #2

Yes, fielddata needs to be loaded for all 20M docs in the index.

It may well be a good idea to split the index into two separate indexes, as you suggest, so that fielddata for the aggregations is only loaded for the documents you are interested in (since you always filter your aggregations to the active documents). This may affect the relevancy of your queries (when you use match), though: the term statistics used for scoring, such as document frequencies, come from the index being searched, so they will change once the inactive documents are no longer part of it. It would be worth testing this for your use case before you commit to splitting the indices.
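To illustrate how term statistics shift, here is a rough calculation using the inverse document frequency from Lucene's classic TF/IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). Only the 20M/6M totals come from the question; the per-term counts are made up, and in practice statistics are gathered per shard by default rather than per index.

```python
import math


def classic_idf(num_docs, doc_freq):
    """Inverse document frequency as in Lucene's classic TF/IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Totals from the question; the per-term counts below are hypothetical.
TOTAL_DOCS, ACTIVE_DOCS = 20_000_000, 6_000_000
term_in_active, term_in_inactive = 1_000, 49_000

# Single index: the filtered alias does not change the statistics.
idf_combined = classic_idf(TOTAL_DOCS, term_in_active + term_in_inactive)
# Split layout: only the active index's documents are counted.
idf_active_only = classic_idf(ACTIVE_DOCS, term_in_active)

print(f"one index:         idf = {idf_combined:.2f}")
print(f"active-only index: idf = {idf_active_only:.2f}")
```

With these made-up counts, a term that is common among inactive documents but rare among active ones gets noticeably more weight after the split, which is why rankings can move.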

You might also want to look at doc values, an on-disk alternative to fielddata that performs very well. With doc values you may be able to keep your documents in the same index (although doc values are computed for all documents at index time, so you may still want to split the index into two). You will need to reindex your data to enable doc values, and they are not available for analyzed string fields.
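As a sketch of what enabling doc values looked like in the 1.x mapping syntax of this era, the body below could be sent when creating the new index before reindexing. The type name `company` and the fields `city` and `employees` are hypothetical stand-ins for whatever you aggregate and sort on. Later Elasticsearch versions changed this syntax and enable doc values by default, so check the mapping docs for your version.

```python
import json

DOC_TYPE = "company"  # hypothetical type name


def doc_values_mapping(doc_type):
    """Build a create-index body (1.x mapping syntax) enabling doc values."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "isActive": {"type": "boolean"},
                    # Hypothetical string field: must be not_analyzed to use doc values.
                    "city": {
                        "type": "string",
                        "index": "not_analyzed",
                        "doc_values": True,
                    },
                    # Hypothetical numeric field used for sorting/aggregations.
                    "employees": {"type": "integer", "doc_values": True},
                }
            }
        }
    }


body = doc_values_mapping(DOC_TYPE)
print(json.dumps(body, indent=2))
```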

Hope that helps.
