Dealing with large documents (architecture question)


(Lm) #1

Hi,
Elasticsearch's documentation makes it very clear that it's meant to index large documents. I have a document that, in the moderate case, is about 15 MB and contains millions of objects. The mapping looks like this:
{
  "commit": {
    "properties": {
      "added": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "elasticId": {
            "type": "keyword"
          }
        }
      },
      "moved": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "previousOwner": {
            "properties": {
              "id": {
                "type": "keyword"
              },
              "elasticId": {
                "type": "keyword"
              }
            }
          },
          "owner": {
            "properties": {
              "id": {
                "type": "keyword"
              },
              "elasticId": {
                "type": "keyword"
              }
            }
          }
        }
      },
      "updated": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "elasticId": {
            "type": "keyword"
          },
          "previousElasticId": {
            "type": "keyword"
          }
        }
      },
      "deleted": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "previousElasticId": {
            "type": "keyword"
          }
        }
      },
      "_creator": {
        "type": "keyword"
      },
      "_created": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
      }
    }
  }
}
The nested objects are always one level deep.
Now, my question is: given this architecture, what is the best way to deal with these large documents (we have to store them in ES), e.g. breaking them up? What pros and cons would there be if I were able to get rid of the nested objects?
There's no clear way to do this in ES's documentation, and I think that's because each case is different. What do you guys think is a practical architecture for this problem?


(Mark Walkom) #2

Do you mean it's not meant to? Because it's not really designed for that.

Breaking it up might be the best option, perhaps by action, i.e. moved, updated, etc.
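
As a rough illustration of splitting by action, here is a minimal Python sketch. It assumes the commit arrives as a dict shaped like the mapping above and turns every nested entry into its own small, flat document that carries the commit's metadata. The `commitId` and `action` fields, the helper names, and the `commit-changes` index name are hypothetical and not part of the original mapping; only the bulk request body layout (one action line followed by one source line) is the standard Elasticsearch bulk format.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# The four nested arrays from the original mapping.
ACTIONS = ("added", "moved", "updated", "deleted")


def split_commit(commit_id: str, commit: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat document per nested entry of a large commit.

    `commitId` and `action` are hypothetical fields added so the
    pieces can still be grouped and filtered after the split.
    """
    shared = {
        "commitId": commit_id,
        "_creator": commit.get("_creator"),
        "_created": commit.get("_created"),
    }
    for action in ACTIONS:
        for entry in commit.get(action, []):
            # Entry fields (id, elasticId, owner, ...) are copied as-is.
            yield {**shared, "action": action, **entry}


def to_bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Render documents as a newline-delimited bulk request body."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    sample = {
        "_creator": "lm",
        "_created": "2018-01-01T00:00:00.000+0000",
        "added": [{"id": "a1", "elasticId": "e1"}],
        "deleted": [{"id": "d1", "previousElasticId": "e0"}],
    }
    print(to_bulk_body("commit-changes", split_commit("c1", sample)), end="")
```

With this layout the nested type is no longer needed, since each entry is a top-level document, and a whole commit can be retrieved by filtering on the (hypothetical) `commitId` field. The trade-off is that the commit is no longer written as a single atomic document, so a very large commit would be sent as several bulk requests.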


(Lm) #3

Yes, I meant "not", what a typo! So breaking it up is the best approach for large documents?


(Mark Walkom) #4

Yep.

