Huge fields - Mapping best practice

stevesimpson · June 15, 2022, 11:45am

Hey all

I'm wondering what the best practice and recommendation would be for handling huge fields at scale? The situation I have is that I need to include an XML document (which is almost always massive) in the Kibana discover app when users submit searches. This XML document does not need to be searchable or indexed or anything, just needs to be viewable and ideally included in reports.

I’ve tried setting the mapping to xml_field: {enabled: false, type: object} which does not analyse and index the field but query performance is awful and will not scale. I am assuming that this is because the field is still included in _source?
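For reference, a minimal sketch of the mapping body described above, built as a plain Python dict so it can be inspected without a cluster. The helper name `disabled_object_mapping` is hypothetical; only the field name `xml_field` and the `enabled: false` / `type: object` settings come from the post.

```python
import json


def disabled_object_mapping(field_name: str) -> dict:
    """Build an index-creation body in which `field_name` is an object
    field with parsing disabled (hypothetical helper, for illustration)."""
    return {
        "mappings": {
            "properties": {
                # enabled: false means the contents are not parsed or indexed,
                # but the raw value still travels with _source.
                field_name: {"type": "object", "enabled": False}
            }
        }
    }


body = disabled_object_mapping("xml_field")
print(json.dumps(body, indent=2))
```

Because the value is kept in `_source`, every hit returned by a search still carries the full XML, which is consistent with the slow queries described here.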

If I add a source filter in Index Management then query performance is great again, but I cannot view the field.
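A rough sketch of what such a filter amounts to at the search level, assuming it behaves like a search-time `_source` exclusion. The helper name `search_body_without` and the `match_all` default query are illustrative assumptions, not taken from the post.

```python
import json
from typing import Optional


def search_body_without(field_name: str, query: Optional[dict] = None) -> dict:
    """Build a search request body that leaves `field_name` out of the
    returned _source (hypothetical helper, for illustration)."""
    return {
        "query": query if query is not None else {"match_all": {}},
        # The field stays in the index; it is only omitted from each hit.
        "_source": {"excludes": [field_name]},
    }


request = search_body_without("xml_field")
print(json.dumps(request, indent=2))
```

The field is still stored with the document, so it remains retrievable by a request that does not apply the exclusion.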

Feels like I'm a bit stuck between a rock and a hard place here, having the XML response viewable as part of the search results is vital for our debugging needs and it's not possible to parse the XML contents before indexing using Logstash etc as the format is pretty unstructured.

Would love to get some advice and discussion around what steps I might be able to take to resolve this. I've asked this in the Slack group but thought this might be a better forum.

intrepid1 · June 15, 2022, 11:50am

For our massive fields we use the wildcard type.

You might find this useful to read.

Find strings within strings faster with the Elasticsearch wildcard field | Elastic Blog

stevesimpson · June 15, 2022, 12:19pm

Thanks for the reply @intrepid1

Wouldn't a wildcard type make the problem worse? The field is currently set to enabled: false

The enabled setting, which can be applied only to the top-level mapping definition and to object fields, causes Elasticsearch to skip parsing of the contents of the field entirely. The JSON can still be retrieved from the _source field, but it is not searchable or stored in any other way.

RabBit_BR · June 15, 2022, 2:12pm

Hi @stevesimpson

Maybe the store mapping parameter makes sense for you.
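One way this suggestion could look, as a sketch only: keep the XML out of `_source` at the mapping level, keep it as a stored field, and ask for it explicitly with `stored_fields`. The choice of a `text` field with `index: false`, and both helper names, are assumptions made for illustration. The thread does not confirm that Kibana Discover would display a field retrieved this way.

```python
import json


def stored_only_mapping(field_name: str) -> dict:
    """Mapping body that drops `field_name` from _source but keeps it as a
    stored field (hypothetical helper; field type is an assumption)."""
    return {
        "mappings": {
            "_source": {"excludes": [field_name]},
            "properties": {
                # Not indexed, so not searchable; stored, so still retrievable.
                field_name: {"type": "text", "index": False, "store": True}
            },
        }
    }


def stored_field_search(field_name: str) -> dict:
    """Search body that returns _source plus the stored copy of the field."""
    return {
        "query": {"match_all": {}},
        "_source": True,  # requested explicitly alongside stored_fields
        "stored_fields": [field_name],
    }


mapping = stored_only_mapping("xml_field")
search = stored_field_search("xml_field")
print(json.dumps(mapping, indent=2))
print(json.dumps(search, indent=2))
```

Ordinary searches would then return small hits, and only requests that name the field in `stored_fields` would pay for the XML. Excluding a field from `_source` in the mapping also means it is not available to operations that rely on `_source`, such as reindexing.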

warkolm · June 21, 2022, 1:57am

How huge is huge?

stevesimpson · June 21, 2022, 10:06am

Hey @warkolm, typically around 50-100 KB, however they can be around 1-2 MB in size.

It's a non-typical use case for Elastic, I know, and not really what it's for. However, having the XML file as part of the event is vital for debugging. It's difficult to fully parse the XML on ingest because the documents can be unstructured, so the XPath and Grok Logstash plugins can be prone to error.

I do think I have a workable solution though which seems to be reasonably performant, and this is actually using the source exclusions and setting the field to be mapped as a non-enabled object type.

We can still retrieve the field from Kibana discover using the "Show single document" button. It's not ideal as I can only look at a single XML doc at a time and we can't use reporting to export the field but it's better than nothing.

If you have any other thoughts or ideas, I'm all ears.

Tomo_M · June 21, 2022, 1:08pm

query performance is awful

How awful is it? What is the query, and how many hits does it return? Isn't it just that it takes time to receive several MB of messages?
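A back-of-the-envelope estimate of that transfer cost, as a sketch. The field sizes are the ones reported earlier in the thread; the figure of 500 hits per search is an assumption chosen for illustration, and the function name is hypothetical.

```python
def estimated_response_mb(hits: int, field_kb: float) -> float:
    """Approximate size in MB of the XML carried by one search response,
    ignoring every other field and any compression."""
    return hits * field_kb / 1024


ASSUMED_HITS = 500  # assumption: number of documents returned per search

# 50 KB and 100 KB are the typical sizes reported; 2048 KB is the 2 MB worst case.
for field_kb in (50, 100, 2048):
    size_mb = estimated_response_mb(ASSUMED_HITS, field_kb)
    print(f"{field_kb} KB per document -> about {size_mb:.0f} MB per search")
```

Under these assumptions even the typical case moves tens of megabytes per search, so the slowness may come from shipping and rendering the payload rather than from the query itself.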

