Recommendation on storing very large XML files as searchable attachments


(Paco B) #1

Hello,

I would like to know your recommendation on storing very large (more than 16 MB) XML files as attachments in Elasticsearch using the mapper-attachments plugin.

Is it a good idea to store this kind of file in Elasticsearch if we want to search by one of its internal tags?

Is there any performance documentation on how the mapper-attachments plugin and/or Elasticsearch behave with this kind of field when indexing and/or searching?

Many thanks!

Regards,
Paco.


(Mark Walkom) #2

What do you want to search on exactly?


(Paco B) #3

Thanks for your reply Mark,

This is the idea:

I have a MongoDB database with several collections, and one of them has a field holding an XML value. This XML can be larger than 16 MB, so I opened a Jira ticket with MongoDB support to ask how to handle it. Their recommendation was to use GridFS (MongoDB's mechanism for storing files in binary format) and ES as the search engine over this field, to find specific tags inside the value.

I used the "mongo-connector" plugin to get the MongoDB data into ES. I downloaded the elasticsearch-mapper-attachments plugin to support the GridFS implementation, and my understanding is that when ES receives this kind of data, it can index it and search on any string content.
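
As a rough illustration of what the attachment field receives: the mapper-attachments plugin expects the file content as a base64-encoded string in a field mapped with type `attachment`. The sketch below only builds such a request body with the Python standard library. The field name `file`, the helper `build_attachment_doc`, and the sample XML are assumptions made up for this example, and no request is actually sent to ES.

```python
import base64
import json

# Hypothetical mapping fragment: a field named "file" handled by the
# mapper-attachments plugin. The field name is an assumption for this sketch.
ATTACHMENT_MAPPING = {"properties": {"file": {"type": "attachment"}}}


def build_attachment_doc(xml_text, field="file"):
    """Return a document whose `field` holds the XML as a base64 string."""
    encoded = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    return {field: encoded}


# Tiny stand-in for the real (much larger) XML value.
sample_xml = "<order><id>42</id><customer>ACME</customer></order>"

doc = build_attachment_doc(sample_xml)
body = json.dumps(doc)  # JSON body that would be sent in the index request
```

Note that base64 encoding grows the payload by roughly one third, so a 16 MB XML value becomes a request body of a bit over 21 MB.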

FYI - I opened another ticket on your website, this one about a problem running queries with the query engine over MongoDB GridFS data in ES.

So I would like to know whether, when the data is loaded from MongoDB (using GridFS), it is a good idea to search over large XML.

Many thanks.

Regards,
Paco.


(Mark Walkom) #4

Given it's the same as "Search - Attachment - Content", let's keep the conversation there :)


(Mark Walkom) #5