Conflicting field index_analyzers

In newer versions of ES (2.x and later), support for conflicting field types and analysers across types in the same index that share a field name was removed. I understand the reasons, but why does this have to be so strict, especially in not allowing different analysers on string fields?

We have an index with hundreds of types that share the same field names. Some of the fields are analysed differently at index time but searched with the standard analyser. I'm trying to upgrade this index from 1.7 to 5.x using whichever option minimises the work needed on queries.

I have come up with the following options, each with some unknowns, and need your advice:
Option#1: Create one index per type so that fields with the same name can use different analysers, then create an alias over all the indexes. All old queries will remain the same. What is the impact of having hundreds of separate indexes on search query performance?

Option#2: Create a unique field per type with its own analyser. This avoids the mess of many indexes, but all queries need to be changed to multi_match with regex on field names. Is it possible to simplify this?

Option#3: Nested fields or multi-fields seem to be out of the question, as they are meant for analysing the same field in several ways within every type, and my understanding is that this will eat up a lot of space. Queries across types become even more complicated.
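
For Option#2, a minimal sketch of what the rewritten query might look like. It assumes a hypothetical naming convention in which each type gets its own field named `<base>_<type>` (for example `title_article`, `title_comment`); those names are illustrative and not from the original index. `multi_match` accepts wildcard patterns (not full regular expressions) in its `fields` list, and the `analyzer` parameter sets the analyser applied to the query text.

```python
import json


def build_cross_type_query(text, base_field="title"):
    """Build a search body that matches `text` against every per-type
    variant of `base_field`, e.g. title_article, title_comment.

    The `<base>_<type>` naming scheme is an assumption for illustration.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                # Wildcard pattern expands to all fields starting with "<base>_".
                "fields": [f"{base_field}_*"],
                # Analyse the query text with the standard analyser,
                # regardless of how each field was analysed at index time.
                "analyzer": "standard",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_cross_type_query("quick brown fox"), indent=2))
```

Wrapping the query construction in one helper like this would keep the field-name pattern in a single place, so existing callers only need to swap their `match` query for a call to the helper.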

Note that we keep adding new documents and queries in real time. Please help me with your valuable suggestions.

Thanks in advance,
Gopala

Option#1: In my experience, the default Logstash-style "one-index-per-day" setup works well even over periods of several years, i.e. hundreds of indices. So even if it has some impact on search performance, 100 indices should still be OK.
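
As a rough sketch of the alias part of Option#1, the helper below builds a request body for the `_aliases` endpoint that attaches one index per type to a single shared alias. The index prefix `docs_`, the alias name `docs`, and the type names are hypothetical placeholders; sending the body to the cluster (for example with `POST /_aliases`) is left out.

```python
import json


def build_alias_actions(type_names, alias, index_prefix="docs_"):
    """Build a body for POST /_aliases that adds one index per type
    to a shared alias. Index names are `<index_prefix><type_name>`,
    which is an assumed convention for this sketch.
    """
    return {
        "actions": [
            {"add": {"index": f"{index_prefix}{name}", "alias": alias}}
            for name in type_names
        ]
    }


if __name__ == "__main__":
    body = build_alias_actions(["article", "comment"], alias="docs")
    print(json.dumps(body, indent=2))
```

Queries that previously targeted the single index could then target the alias name instead, which is why the existing queries would not need to change.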

But doesn't ES require field type consistency for multi-index queries? Would queries over such indices work at all? I know it is a problem at least in Kibana. But it is easy to check. And probably I'm wrong :-).

Thanks for the response. I will give it a try to check the performance.

The conflict is not from the field type (all strings) but from analysers. As I mentioned above, the data only needs to be analysed differently at index time and queried using a standard analyser.
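
To make the index-time versus search-time split concrete, here is a sketch of a per-index creation body in the 5.x mapping format, where the field is indexed with one analyser and searched with `standard` via `search_analyzer`. The type name, field name, and the choice of the built-in `whitespace` and `english` analysers are assumptions for illustration; a custom analyser would additionally need to be defined under the index `settings`.

```python
import json


def build_index_body(type_name, field, index_analyzer):
    """Build an index-creation body (5.x style, with a mapping type)
    where `field` uses `index_analyzer` at index time and the
    standard analyser at search time.
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    field: {
                        "type": "text",
                        "analyzer": index_analyzer,
                        "search_analyzer": "standard",
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Two indices, same field name, different index-time analysers.
    for type_name, analyzer in [("article", "english"), ("comment", "whitespace")]:
        print(json.dumps(build_index_body(type_name, "title", analyzer), indent=2))
```

With one index per type, each index carries its own mapping, so the same field name can use a different index-time analyser in each without conflict.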

The change log says they stopped allowing conflicting analysers because it leads to bad relevance scoring. But we don't use relevance scoring. I wonder why it has to be such a strict condition.

Is it possible to create a custom analyzer that changes its properties/filters (char_filter, filter) dynamically depending on the type information, with the help of an embedded script?
