I'm wondering what the best practice and recommendation would be for handling huge fields at scale. My situation is that I need to include an XML document (which is almost always massive) in the Kibana Discover app when users submit searches. This XML document does not need to be searchable or indexed, it just needs to be viewable and ideally included in reports.
I’ve tried setting the mapping to xml_field: {enabled: false, type: object}, which stops the field from being analysed and indexed, but query performance is still awful and will not scale. I am assuming that this is because the field is still included in _source?
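For anyone following along, here is a minimal sketch of that mapping written as an index-creation request body. It only builds and prints the JSON; the index name `events` in the comment is an assumption for illustration, while `xml_field` is the field name from the post.

```python
import json


def build_index_body(field_name: str = "xml_field") -> dict:
    """Build a create-index body that maps one field as a disabled object.

    With "enabled": False, Elasticsearch skips parsing the field's contents,
    so it is neither analysed nor indexed, but it still lives in _source.
    """
    return {
        "mappings": {
            "properties": {
                field_name: {"type": "object", "enabled": False}
            }
        }
    }


if __name__ == "__main__":
    # Body for something like: PUT /events  (index name is hypothetical)
    print(json.dumps(build_index_body(), indent=2))
```

Note that this only affects indexing; the full XML is still part of every _source returned by a search unless it is filtered out.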
If I add a source filter in Index Management then query performance is great again, but I can no longer view the field.
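As a rough illustration of what a source filter amounts to at the search level, the sketch below builds a search body that excludes the large field from the returned _source. This is an assumption about the shape of the request, not a capture of what Kibana actually sends; the query string and page size are made up for the example.

```python
import json


def build_search_body(query: str, excluded_field: str = "xml_field", size: int = 50) -> dict:
    """Build a search body whose hits omit one large field from _source.

    The document on disk is unchanged; the field is simply left out of
    each hit, which keeps the response small.
    """
    return {
        "size": size,
        "query": {"query_string": {"query": query}},
        "_source": {"excludes": [excluded_field]},
    }


if __name__ == "__main__":
    # Hypothetical search for failed requests, without the XML payload.
    print(json.dumps(build_search_body("status:500"), indent=2))
```

The trade-off is exactly the one described above: responses are fast because the XML never leaves the cluster, which also means the results list cannot show it.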
It feels like I'm stuck between a rock and a hard place here. Having the XML response viewable as part of the search results is vital for our debugging needs, and it's not possible to parse the XML contents before indexing (using Logstash etc.) because the format is pretty unstructured.
I would love to get some advice and discussion around what steps I might be able to take to resolve this. I've asked this in the Slack group but thought this might be a better forum.
Wouldn't a wildcard type make the problem worse? The field is currently set to enabled: false.
The enabled setting, which can be applied only to the top-level mapping definition and to object fields, causes Elasticsearch to skip parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable or stored in any other way.
Hey @warkolm, they are typically around 50-100 KB, however they can be around 1-2 MB in size.
I know it's a non-typical use case for Elastic and not really what it's for; however, having the XML file as part of the event is vital for debugging. It's difficult to fully parse the XML on ingest because the documents can be unstructured, so XPath or Grok parsing in Logstash is prone to error.
I do think I have a workable solution, though, which seems to be reasonably performant: using the source exclusions together with mapping the field as a non-enabled object type.
We can still retrieve the field from Kibana Discover using the "Show single document" button. It's not ideal, as I can only look at a single XML doc at a time and we can't use reporting to export the field, but it's better than nothing.
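Outside of Kibana, the same "one document at a time" lookup could be done against the get-document API. The sketch below only builds the request path; the index name and document ID are hypothetical, and it assumes the `_source_includes` query parameter is used to return just the XML field.

```python
from urllib.parse import quote, urlencode


def single_doc_path(index: str, doc_id: str, field: str = "xml_field") -> str:
    """Build the path for fetching one document with only one field of _source.

    Because the field was only excluded from search responses, not from the
    stored _source, a direct fetch by ID can still return it.
    """
    query = urlencode({"_source_includes": field})
    return f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}?{query}"


if __name__ == "__main__":
    # Hypothetical index and document ID taken from a search hit.
    print(single_doc_path("events", "abc123"))
```

This keeps the bulk searches cheap and pays the cost of the large payload only for the document actually being debugged.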
If you have any other thoughts or ideas, I'm all ears.