Dealing with large documents (architecture question)

lauraerinmann · August 25, 2017, 6:05pm

Hi,
Elasticsearch's documentation makes it very clear that it's meant to index large documents. I have a document that, in a moderate case, is about 15MB and contains millions of objects. The mapping looks like this:
{
  "commit": {
    "properties": {
      "added": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "elasticId": {
            "type": "keyword"
          }
        }
      },
      "moved": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "previousOwner": {
            "properties": {
              "id": {
                "type": "keyword"
              },
              "elasticId": {
                "type": "keyword"
              }
            }
          },
          "owner": {
            "properties": {
              "id": {
                "type": "keyword"
              },
              "elasticId": {
                "type": "keyword"
              }
            }
          }
        }
      },
      "updated": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "elasticId": {
            "type": "keyword"
          },
          "previousElasticId": {
            "type": "keyword"
          }
        }
      },
      "deleted": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "previousElasticId": {
            "type": "keyword"
          }
        }
      },
      "_creator": {
        "type": "keyword"
      },
      "_created": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
      }
    }
  }
}
The nested objects are always one level deep.
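Each object in a field mapped as `nested` is indexed as a separate hidden Lucene document alongside the root document, so a commit with millions of entries turns into millions of Lucene documents. A minimal Python sketch of that arithmetic, assuming a commit is a plain dict shaped like the mapping above (the sample values are made up). `previousOwner` and `owner` are ordinary objects inside a `moved` entry, so they add no documents of their own:

```python
from typing import Any, Dict

# Action fields mapped as "nested" in the commit mapping above.
NESTED_FIELDS = ("added", "moved", "updated", "deleted")


def lucene_doc_count(commit: Dict[str, Any]) -> int:
    """Roughly how many Lucene documents one commit is indexed as.

    One root document for the commit, plus one hidden document per object
    in each nested field. Nesting is only one level deep, so no recursion.
    """
    return 1 + sum(len(commit.get(field, [])) for field in NESTED_FIELDS)


if __name__ == "__main__":
    # Hypothetical sample commit, shaped like the mapping above.
    sample = {
        "_creator": "alice",
        "_created": "2017-08-25T18:05:00.000+0000",
        "added": [{"id": f"a{i}", "elasticId": f"e{i}"} for i in range(3)],
        "moved": [],
        "updated": [{"id": "u1", "elasticId": "e9", "previousElasticId": "e1"}],
        "deleted": [{"id": "d1", "previousElasticId": "e2"}],
    }
    print(lucene_doc_count(sample))  # → 6
```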
Now, my question: given this architecture, what is the best way to deal with these large documents (we have to store them in ES)? Breaking them up, or something else? What would the pros and cons be if I could get rid of the nested objects?
ES's documentation doesn't describe a clear approach for this, and I think that's because each case is different. What do you guys think is a practical architecture for this problem?
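On the pros and cons of dropping `nested`: without it, Elasticsearch flattens an array of objects into multi-valued fields, which avoids the hidden documents but loses the link between values that came from the same entry. The sketch below only simulates that behaviour in plain Python; `flatten_object_array`, `flat_match`, and `nested_match` are hypothetical helpers, not an Elasticsearch API, and the sample `moved` entries are made up:

```python
from collections import defaultdict
from typing import Any, Dict, List


def flatten_object_array(field: str, objects: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Simulate how a plain (non-nested) array of objects is indexed.

    Every leaf value is gathered under its dotted path, so values that came
    from the same array element are no longer tied together.
    """
    flat: Dict[str, List[Any]] = defaultdict(list)

    def walk(path: str, value: Any) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{path}.{key}", child)
        else:
            flat[path].append(value)

    for obj in objects:
        walk(field, obj)
    return dict(flat)


def flat_match(flat: Dict[str, List[Any]], conditions: Dict[str, Any]) -> bool:
    # Flattened: each condition may be satisfied by a different entry.
    return all(value in flat.get(path, []) for path, value in conditions.items())


def nested_match(field: str, objects: List[Dict[str, Any]], conditions: Dict[str, Any]) -> bool:
    # Nested: all conditions must hold within one single entry.
    return any(
        flat_match(flatten_object_array(field, [obj]), conditions) for obj in objects
    )


if __name__ == "__main__":
    # Hypothetical "moved" entries shaped like the mapping above.
    moved = [
        {"id": "x", "previousOwner": {"id": "p1"}, "owner": {"id": "o1"}},
        {"id": "y", "previousOwner": {"id": "p2"}, "owner": {"id": "o2"}},
    ]
    query = {"moved.id": "x", "moved.owner.id": "o2"}
    print(flat_match(flatten_object_array("moved", moved), query))  # → True
    print(nested_match("moved", moved, query))  # → False
```

The flattened form matches a move of `x` to `o2` that never happened, while the per-entry check does not. If queries only ever filter on one field of an entry at a time (for example `moved.owner.id` alone), plain objects may be enough; if they combine fields of the same entry, either keep `nested` or split entries into separate documents.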

warkolm · August 26, 2017, 9:58am

Do you mean it's not meant to? Because it's not really designed for that.

Breaking it up might be the best option, perhaps by action (moved, updated, etc.).
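One way to follow this suggestion, sketched under assumptions: each entry of each action array becomes its own small, flat document tagged with its action, which is a finer-grained variant of splitting by action. The `commit_id` and `action` fields and the index name `commit-entries` are hypothetical. The output is a request body in the `_bulk` NDJSON format (an action line followed by a source line, newline-terminated); older Elasticsearch versions, such as the 5.x releases current when this thread was written, also need a `_type`, supplied in the URL or the action line.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Action fields from the commit mapping; each entry becomes its own document.
ACTIONS = ("added", "moved", "updated", "deleted")
COMMIT_FIELDS = ("_creator", "_created")


def split_commit(commit_id: str, commit: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one small, flat document per entry of every action array.

    Commit-level fields are copied onto each piece so it can be filtered on
    its own; the hypothetical ``commit_id`` field ties the pieces together.
    """
    for action in ACTIONS:
        for entry in commit.get(action, []):
            doc: Dict[str, Any] = {"commit_id": commit_id, "action": action}
            for key in COMMIT_FIELDS:
                if key in commit:
                    doc[key] = commit[key]
            doc.update(entry)  # id, elasticId, owner, ... as in the mapping
            yield doc


def to_bulk_ndjson(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Render documents as a _bulk body: an action line, then the source line."""
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    commit = {
        "_creator": "alice",
        "_created": "2017-08-25T18:05:00.000+0000",
        "added": [{"id": "a1", "elasticId": "e1"}],
        "deleted": [{"id": "d1", "previousElasticId": "e2"}],
    }
    docs = list(split_commit("c42", commit))
    print(len(docs))  # → 2
    print(to_bulk_ndjson("commit-entries", docs), end="")
```

Because every piece is small and flat, no `nested` mapping is needed, and changing one entry reindexes only that document instead of the whole 15MB commit. The trade-off is that reassembling a full commit now takes a query on `commit_id` rather than a single get.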

lauraerinmann · August 28, 2017, 5:19pm

Yes, I meant "not". What a typo! So breaking it up is the best approach for large documents?

warkolm · August 28, 2017, 9:39pm

Yep.

system · September 25, 2017, 9:40pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
