In my project we are using ES 1.4, and one of our indices has around 200 document types. We are now upgrading to ES 7.4, and one of the major problems is the removal of document types. We are on a single node, so creating 200 indices would be too much to handle with a 14GB heap. (We have other indices too, apart from this one.)
I am considering the approach below and would appreciate suggestions:
Create one index and store each document type as a nested field. Say my type names are type1, type2, ... type200. The mapping of the new index would then have one field per type, alongside a docType field to filter on.
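A rough sketch of what such a mapping could look like, built in Python for brevity. This is an illustration under assumptions, not the poster's actual mapping: `build_mapping` is a hypothetical helper, the sub-fields `title` and `createdAt` are made up, and each type is modelled as a plain `object` field rather than the `nested` datatype.

```python
import json

def build_mapping(type_names, per_type_properties=None):
    """Build an ES 7.x style index body (no mapping type name):
    a docType keyword plus one object field per former document type."""
    per_type_properties = per_type_properties or {}
    properties = {"docType": {"type": "keyword"}}
    for name in type_names:
        properties[name] = {
            "type": "object",
            # Sub-fields of the former type; empty if not supplied.
            "properties": per_type_properties.get(name, {}),
        }
    return {"mappings": {"properties": properties}}

# type1 .. type200, as in the post.
type_names = [f"type{i}" for i in range(1, 201)]

# Hypothetical sub-fields for type1, for illustration only.
body = build_mapping(
    type_names,
    {"type1": {"title": {"type": "text"}, "createdAt": {"type": "date"}}},
)

print(json.dumps(body["mappings"]["properties"]["type1"], indent=2))
```

One thing to watch with this layout: 200 types with about 10 fields each would go past the default `index.mapping.total_fields.limit` of 1000, so that setting would probably need raising. If the `nested` datatype is used instead of `object`, `index.mapping.nested_fields.limit` (default 50) applies as well.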
When fetching, I will filter on docType and use source filtering to return only the field for that specific type.
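The fetch described here could be expressed as a search body along these lines. It is only a sketch: `build_search` is a hypothetical helper, `createdAt` is an invented sub-field, and the field names follow the docType / type1 ... type200 naming from the post.

```python
def build_search(doc_type, sort_field=None, size=10):
    """Build a search body that filters on docType and returns
    only the sub-fields stored under that type's own field."""
    body = {
        "size": size,
        # A term filter in filter context: no scoring, cacheable.
        "query": {"bool": {"filter": [{"term": {"docType": doc_type}}]}},
        # Source filtering: keep docType and everything under e.g. "type1".
        "_source": ["docType", f"{doc_type}.*"],
    }
    if sort_field:
        # Sort on a sub-field of the type, e.g. "type1.createdAt".
        body["sort"] = [{f"{doc_type}.{sort_field}": {"order": "asc"}}]
    return body

search_body = build_search("type1", sort_field="createdAt")
```

The resulting dictionary would be sent as the body of a `_search` request against the single index.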
I would like to know whether this approach will work if I store around 100K documents. Each document will be small, with roughly 10 fields.
Every type will have a different structure, so one field cannot be used for all types. As we will need filtering and sorting on the type fields, we need to store each type as a separate field.
My question is: is this approach fine, or does it have serious drawbacks?
Currently we are using ES 1.4, and one of our indices has 200 document types. We don't want to create 200 indices when we upgrade to ES 7.4, so the option we have is to store the documents of different types in a single index using the approach I explained in the question. The idea is to have one field for each document type. The document types don't share a common structure.
You have the docType field, which you can filter on, so there is no need for the nesting. Just put those fields at the root level and you should be fine as long as you do not have fields with conflicting mappings. If you do, you will need to split into a number of different indices anyway, as nesting will not help with this.
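To judge whether flattening is feasible, one could first check the existing types for fields that share a name but differ in datatype. The following is a simplified sketch: `find_mapping_conflicts` and the sample data are hypothetical, and it compares only the datatype string, whereas real mappings can also conflict on analyzers and other parameters.

```python
from collections import defaultdict

def find_mapping_conflicts(type_mappings):
    """Return fields that appear under more than one datatype.

    type_mappings: {doc_type: {field_name: es_datatype}}
    Result:        {field_name: {es_datatype: [doc_types using it]}}
    """
    seen = defaultdict(dict)
    for doc_type, fields in type_mappings.items():
        for field, es_type in fields.items():
            seen[field].setdefault(es_type, []).append(doc_type)
    # A field conflicts when it was seen with two or more datatypes.
    return {f: by_type for f, by_type in seen.items() if len(by_type) > 1}

# Made-up sample: "created" is a date in type1 but a long in type2.
type_mappings = {
    "type1": {"name": "keyword", "created": "date"},
    "type2": {"name": "keyword", "created": "long"},
    "type3": {"price": "double"},
}

conflicts = find_mapping_conflicts(type_mappings)
print(conflicts)
```

An empty result would suggest the fields can sit side by side at the root level. Any entry that is reported would need renaming or a separate index.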
The reason we don't want to flatten the structure is that we already have a huge Java code base whose object structure matches the document types. So, to save the effort of changing the code base, we want to store each document object as a nested field.
I appreciate your time and effort in answering my question. Do you see any bottleneck with this approach?