Upgrading 200+ document types in a single index from ES 1.4 to ES 7.4

In my project we are using ES 1.4, and one of our indices has around 200 document types. We are now upgrading to ES 7.4, and one of the major problems is the removal of document types. We are on a single node, so creating 200 indices would be too much to handle with a 14GB heap (we have other indices apart from this one too).

I am considering the approach below and would appreciate your suggestions:

Create one index and store each document type as a nested field. Say my type names are type1, type2, ..., type200. I create the index with:

curl -XPUT "http://localhost:9200/new_index"

and its mapping (applied with PUT new_index/_mapping) will look like this:

{
  "properties": {
    "docType": {
      "type": "keyword"
    },
    "type1": {
      "type": "nested",
      "properties": {
        "field1": {
          "type": "text"
        }
      }
    },
    "type2": {
      "type": "nested",
      "properties": {
        "field2": {
          "type": "text"
        }
      }
    },
    ...
    "type200": {
      "type": "nested",
      "properties": {
        "field2": {
          "type": "text"
        }
      }
    }
  }
}

So, I would index a type1 document as:

curl -XPOST "http://localhost:9200/new_index/_doc" -H 'Content-Type: application/json' -d '{"docType":"type1","type1":{"field1":"xxxxx"}}'

When fetching, I will filter on docType and use _source filtering to return only the field for that specific type.
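The fetch step described above (filter on docType, return only that type's sub-object) might look like the sketch below. It assumes docType is mapped as keyword, as in the mapping above; the helper name search_body is hypothetical. Because _source filtering works on the original JSON, the same request applies whether the type fields end up mapped as nested or as plain objects.

```bash
#!/usr/bin/env bash
# Build a search body that keeps only documents of one type and returns
# only that type's sub-object (plus docType) from _source.
search_body() {
  # $1 is the type name, e.g. type1; assumed to need no JSON escaping.
  printf '{"query":{"bool":{"filter":{"term":{"docType":"%s"}}}},"_source":["docType","%s"]}' \
    "$1" "$1"
}

search_body type1
echo

# Not executed here; needs a running node:
# curl -s "http://localhost:9200/new_index/_search" \
#   -H 'Content-Type: application/json' -d "$(search_body type1)"
```

A term query inside a bool filter is used because docType is an exact keyword value and no scoring is needed.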

Will this approach work if I store around 100K documents? Each document will be small, with about 10 fields.
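At 100K small documents, volume is unlikely to be the main issue; the size of the mapping probably is. The rough count below uses only the figures from the question (200 types, about 10 fields each) and suggests roughly 2201 mapped fields, since object and nested fields themselves count towards the limit. In Elasticsearch 7.x the default index.mapping.total_fields.limit is 1000 and the default index.mapping.nested_fields.limit is 50, so a mapping with 200 nested types would be rejected unless those settings are raised. The variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Rough count of mapped fields for the one-index-many-types design,
# using the question's figures: 200 types, ~10 leaf fields each.
num_types=200
fields_per_type=10

# Each type adds one parent (object or nested) field plus its leaf fields;
# docType adds one more. Multi-fields such as a .keyword subfield would
# add to this total.
total_fields=$(( num_types * (fields_per_type + 1) + 1 ))
echo "estimated mapped fields: ${total_fields}"   # prints 2201

# Elasticsearch 7.x defaults for these index settings.
default_total_fields_limit=1000
default_nested_fields_limit=50

if [ "$total_fields" -gt "$default_total_fields_limit" ]; then
  echo "over index.mapping.total_fields.limit (${default_total_fields_limit})"
fi
if [ "$num_types" -gt "$default_nested_fields_limit" ]; then
  echo "nested design over index.mapping.nested_fields.limit (${default_nested_fields_limit})"
fi

# The field limit can be raised explicitly, e.g.
# (not executed here; needs a running node):
# curl -s -XPUT "http://localhost:9200/new_index/_settings" \
#   -H 'Content-Type: application/json' \
#   -d '{"index.mapping.total_fields.limit": 3000}'
```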

Do any of the types have conflicting mappings? If not, just store the type as a single top-level field.

Every type has a different structure, so a single set of fields cannot be used for all types. As we need filtering and sorting on the type-specific fields, we need to store each type as a separate field.

My question is: is this approach fine, or does it have serious drawbacks?

Can you show a few sample events? I do not understand your approach or why you would do it that way.

Currently we are using ES 1.4, and one of our indices has 200 document types. As we upgrade to ES 7.4, we don't want to create 200 indices, so the option we have is to store documents of different types in a single index using the approach I explained in the question. The idea is to have one field per document type. The document types don't share a common structure.

Sample JSON documents:

{"docType":"certs","certs":{"name":"test", "certificatePath" : "/tmp/abc.pem"}}

{"docType":"mgmt_details","mgmt_details":{"ipAddress":"10.16.1.11", "timestamp":3535351777} }

{"docType":"target_info","target_info":{"host":"a.com", "target":"10.1.1.1"} }
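To load sample documents like these in one request, the Bulk API takes newline-delimited JSON: an action line followed by the source line for each document. A minimal sketch, assuming the index name new_index from the question; bulk_body is a hypothetical helper. The body is piped to curl with --data-binary because plain -d strips newlines from data it reads from a file or stdin, and the bulk body must keep them, including the final one.

```bash
#!/usr/bin/env bash
# Build a Bulk API (NDJSON) body: one action line plus one source line per
# document. Each argument must be a complete JSON document on one line.
bulk_body() {
  local doc
  for doc in "$@"; do
    # The index name comes from the URL, so the action metadata can be empty.
    printf '{"index":{}}\n%s\n' "$doc"
  done
}

bulk_body \
  '{"docType":"certs","certs":{"name":"test","certificatePath":"/tmp/abc.pem"}}' \
  '{"docType":"mgmt_details","mgmt_details":{"ipAddress":"10.16.1.11","timestamp":3535351777}}' \
  '{"docType":"target_info","target_info":{"host":"a.com","target":"10.1.1.1"}}'

# To send it (not executed here; needs a running node), pipe instead of -d:
# bulk_body '...' '...' | curl -s -XPOST "http://localhost:9200/new_index/_bulk" \
#   -H 'Content-Type: application/x-ndjson' --data-binary @-
```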

You have the docType field, which you can filter on, so there is no need for the nesting. Just put those fields at the root level and you should be fine, as long as you do not have fields with conflicting mappings. If you do, you will need to split them into a number of different indices anyway, as nesting will not help with this.

The reason we don't want to flatten the structure is that we already have a huge Java code base whose object structure mirrors the document types. So, to save the effort of changing the code base, we want to store each document object as a nested field.

I appreciate your time and effort in answering my question. Do you see any bottlenecks with this approach?

Then use the structure you have, but do not map the types as nested; map them as regular object fields instead.
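A sketch of what that could look like for the three sample types: the same JSON structure the Java objects produce, mapped as plain object fields. Leaving out "type": "nested" makes Elasticsearch treat a field with "properties" as an object, so filters and sorts can use dotted paths such as certs.name directly, without nested queries or a nested sort clause. The field types (keyword, ip, long) are assumptions guessed from the sample values; keyword is used so the fields can be sorted on, since sorting on a text field is disabled by default. If timestamp is really a date, a date type with a matching format may fit better.

```bash
#!/usr/bin/env bash
# ES 7.x create-index body for the sample types, mapped as plain objects.
# A field with "properties" and no "type": "nested" defaults to an object.
# Field types are assumptions inferred from the sample values.
MAPPING=$(cat <<'EOF'
{
  "mappings": {
    "properties": {
      "docType": { "type": "keyword" },
      "certs": {
        "properties": {
          "name": { "type": "keyword" },
          "certificatePath": { "type": "keyword" }
        }
      },
      "mgmt_details": {
        "properties": {
          "ipAddress": { "type": "ip" },
          "timestamp": { "type": "long" }
        }
      },
      "target_info": {
        "properties": {
          "host": { "type": "keyword" },
          "target": { "type": "keyword" }
        }
      }
    }
  }
}
EOF
)

# With object fields, filter and sort on dotted paths directly.
SEARCH=$(cat <<'EOF'
{
  "query": { "bool": { "filter": { "term": { "docType": "certs" } } } },
  "_source": ["docType", "certs"],
  "sort": [ { "certs.name": "asc" } ]
}
EOF
)

printf '%s\n' "$MAPPING"

# Not executed here; needs a running node:
# curl -s -XPUT "http://localhost:9200/new_index" \
#   -H 'Content-Type: application/json' -d "$MAPPING"
# curl -s "http://localhost:9200/new_index/_search" \
#   -H 'Content-Type: application/json' -d "$SEARCH"
```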
