Huge fields - Mapping best practice

Hey all :wave:

I'm wondering what the best practice and recommendation would be for handling huge fields at scale. The situation I have is that I need to include an XML document (which is almost always massive) in the Kibana Discover app when users submit searches. This XML document does not need to be searchable or indexed at all; it just needs to be viewable, and ideally included in reports.

I've tried setting the mapping to xml_field: {enabled: false, type: object}, which stops the field from being analysed and indexed, but query performance is awful and will not scale. I am assuming this is because the field is still included in _source?
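For reference, here's roughly what that mapping looks like (the index name is just a placeholder):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "xml_field": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
```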

If I add a source filter in Index Management then query performance is great again, but I cannot view the field :upside_down_face:
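In case it's useful for anyone comparing, the equivalent exclusion at search time looks something like this (again, the index name is a placeholder):

```
GET my-index/_search
{
  "_source": {
    "excludes": ["xml_field"]
  },
  "query": {
    "match_all": {}
  }
}
```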

Feels like I'm a bit stuck between a rock and a hard place here. Having the XML response viewable as part of the search results is vital for our debugging needs, and it's not possible to parse the XML contents before indexing using Logstash etc., as the format is pretty unstructured.

Would love to get some advice and discussion around what steps I might be able to take to resolve this. I've asked this in the Slack group but thought this might be a better forum.

For our massive fields we use the wildcard type.

You might find this useful to read.

Find strings within strings faster with the Elasticsearch wildcard field | Elastic Blog
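Something like this, using your field name (an untested sketch; the index name is a placeholder):

```
PUT my-index
{
  "mappings": {
    "properties": {
      "xml_field": {
        "type": "wildcard"
      }
    }
  }
}
```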

Thanks for the reply @intrepid1

Wouldn't a wildcard type make the problem worse? The field is currently set to enabled: false, and the docs say:

The enabled setting, which can be applied only to the top-level mapping definition and to object fields, causes Elasticsearch to skip parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable or stored in any other way.

Hi @stevesimpson

Maybe the store mapping parameter makes sense for you.
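A rough sketch of what I mean (untested; field and index names borrowed from your posts): exclude the field from _source, store it separately, and fetch it only when you actually need it via stored_fields.

```
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": ["xml_field"]
    },
    "properties": {
      "xml_field": {
        "type": "text",
        "index": false,
        "store": true
      }
    }
  }
}

GET my-index/_search
{
  "stored_fields": ["xml_field"],
  "query": {
    "match_all": {}
  }
}
```

That way ordinary searches never have to haul the XML around, but it's still retrievable on demand.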

How huge is huge?

Hey @warkolm, typically around 50-100 KB, however they can be up to 1-2 MB in size.

It's a non-typical use case for Elastic, I know, and not really what it's for; however, having the XML file as part of the event is vital for debugging. It's difficult to fully parse the XML on ingest because the documents can be unstructured, so the XPath and Grok Logstash plugins are prone to error.

I do think I have a workable solution though, which seems to be reasonably performant: using the source exclusions together with mapping the field as a non-enabled object type.

We can still retrieve the field from Kibana Discover using the "Show single document" button. It's not ideal, as I can only look at a single XML doc at a time and we can't use reporting to export the field, but it's better than nothing.
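For what it's worth, fetching the document directly from Elasticsearch also returns the full _source, presumably because the source filter is only applied on the Kibana side (index name and document ID are placeholders):

```
GET my-index/_doc/some-document-id
```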

If you have any other thoughts or ideas I'm all ears :slight_smile:

query performance is awful

How awful is it? What is the query, and how many hits does it return? Could it simply be the time taken to transfer messages that are several MB in size?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.