Best practice to search fields not available to most documents

andrew.laraia · September 17, 2020, 9:26pm

I'm new to the ELK stack. So far, so good; I am making progress. We're looking to use it on an existing system to get real-time (or at least recent-history) visibility into client/user activity, to help us offer the most relevant services.

Each document/record has a well-defined structure, and there are common fields shared by all of them. However, each document also has a generic "data" field that holds information specific to the event type. I'm looking to use a few of these for my searches.

I'm concerned that I will lose efficiency if I try to use wildcards to scan this generic data field.
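As a rough illustration of why wildcard searches can be costly, here is a toy model of a field's term dictionary as a sorted Python list. This is a simplification (Lucene's real terms dictionary is a compressed sorted structure, not a list), and the terms are hypothetical. A prefix search can seek straight to the first candidate term, while a leading wildcard has no prefix to seek to and must examine every term.

```python
import bisect

# Toy term dictionary: the distinct terms of one field, kept sorted.
# (Hypothetical terms; Lucene uses a compressed sorted structure, not a list.)
TERMS = sorted([
    "login", "logout", "page-view", "service-call-failed",
    "service-call-ok", "session-end", "session-start",
])


def prefix_matches(terms, prefix):
    """Trailing wildcard (prefix*): seek to the first candidate, stop at the first miss."""
    start = bisect.bisect_left(terms, prefix)
    matches, examined = [], 0
    for term in terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, examined


def suffix_matches(terms, suffix):
    """Leading wildcard (*suffix): nothing to seek to, so every term is checked."""
    matches = [term for term in terms if term.endswith(suffix)]
    return matches, len(terms)


if __name__ == "__main__":
    print(prefix_matches(TERMS, "session"))  # (['session-end', 'session-start'], 2)
    print(suffix_matches(TERMS, "start"))    # (['session-start'], 7)
```

With seven terms the difference is trivial, but the leading-wildcard cost grows with the number of distinct terms, which is what hurts on a high-cardinality, free-form field like "data".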

I was planning to regex/parse this into a specific new field at index time, but most records don't contain this exact data, so that also seems inefficient. There can be thousands of documents per client session, and thousands of sessions each day.
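A minimal sketch of conditional extraction, assuming the "data" field holds key=value pairs (the real format isn't given in the thread). The event type "Service_Result" and the field names client_id, plan, service_name and status_code are hypothetical. What it shows: for the majority of events, which have no extractor, the extra work is a single dictionary lookup, so parsing only the relevant event types adds little overhead.

```python
import re

# Hypothetical extractors: one regex per event type that carries extra data.
# The "key=value;key=value" format of "data" is an assumption.
EXTRACTORS = {
    "Session_Start": re.compile(r"client=(?P<client_id>\w+);plan=(?P<plan>\w+)"),
    "Service_Result": re.compile(r"service=(?P<service_name>[\w.-]+);status=(?P<status_code>\d+)"),
}


def enrich(event):
    """Return a copy of `event` with fields parsed out of "data" when its type has an extractor."""
    enriched = dict(event)
    pattern = EXTRACTORS.get(event.get("eventType"))
    if pattern is None:
        return enriched  # most events: one dict lookup, nothing else
    match = pattern.search(event.get("data", ""))
    if match:
        enriched.update(match.groupdict())
    return enriched


if __name__ == "__main__":
    print(enrich({"eventType": "Session_Start", "data": "client=acme;plan=gold"}))
    print(enrich({"eventType": "Page_View", "data": "url=/home"}))
```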

Now I'm thinking the best way may be to create a second index for the same data source that ingests only the event types that always carry extra data, and parses that data into dedicated fields (for example, only ingest events whose eventType is "Session_Start", or events that hold the results of a particular back-end service call). This second index would still have the common fields, so I think I could link results across indexes, and repeat this with as many indexes as I need, one per special event type, alongside the full index.

Is this the right approach? Or should I instead plan to add fields to the full index, even though most of the new fields would be empty, since only a handful of event types will populate them?
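If the extracted fields go into the full index, an explicit mapping can declare them up front. The sketch below assumes Elasticsearch 7.x-style typeless mappings; @timestamp, sessionId and the extracted field names are hypothetical. A document that lacks a field simply has no values indexed for it, and by default Elasticsearch treats an explicit null the same way, so the helper that drops None values mainly keeps _source compact. An exists query then selects only the documents that carry a given field.

```python
import json

# Hypothetical mapping for the single "full" index: common fields plus the
# extracted, event-specific fields that only a few event types populate.
MAPPING = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "eventType": {"type": "keyword"},
            "sessionId": {"type": "keyword"},
            "data": {"type": "text"},
            # Extracted fields (absent from most documents):
            "client_id": {"type": "keyword"},
            "plan": {"type": "keyword"},
            "service_name": {"type": "keyword"},
            "status_code": {"type": "integer"},
        }
    }
}


def to_document(event):
    """Drop keys whose value is None so unpopulated fields are simply absent."""
    return {key: value for key, value in event.items() if value is not None}


# Query DSL body: match only documents that actually have client_id.
HAS_CLIENT_ID = {"query": {"exists": {"field": "client_id"}}}

if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=2))
    print(to_document({"eventType": "Page_View", "client_id": None, "plan": None}))
```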
I'm concerned that using multiple indexes may introduce other problems I haven't considered.

Thanks

warkolm · September 17, 2020, 10:11pm

Welcome to our community!

Wildcards in Elasticsearch can be inefficient, especially leading ones. Extracting the data is a good idea, and having some documents with no data in them isn't a concern, so I would start with that approach.
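One way to do the extraction inside Elasticsearch itself is an ingest pipeline with a conditional grok processor. This is a sketch: the pipeline name, the grok pattern and the key=value format of "data" are assumptions, and the processor-level "if" condition needs Elasticsearch 6.5 or later. The two queries contrast a leading-wildcard scan over the raw value (assuming the default dynamic mapping, which adds a data.keyword subfield) with an exact term lookup on the extracted field.

```python
import json

# Hypothetical ingest pipeline; it would be registered with
# PUT _ingest/pipeline/extract-event-fields (the name is an assumption).
PIPELINE = {
    "description": "Extract event-specific fields from the generic data field",
    "processors": [
        {
            "grok": {
                # Only Session_Start events run this processor; others skip it.
                "if": "ctx.eventType == 'Session_Start'",
                "field": "data",
                # Assumed format: "client=<id>;plan=<name>"
                "patterns": ["client=%{WORD:client_id};plan=%{WORD:plan}"],
                # Keep the document even if the pattern doesn't match.
                "ignore_failure": True,
            }
        }
    ],
}

# Slow: leading wildcard over the raw value. With the default dynamic mapping,
# data.keyword also skips values longer than 256 characters (ignore_above).
WILDCARD_QUERY = {"query": {"wildcard": {"data.keyword": {"value": "*client=acme*"}}}}

# Fast: exact lookup on the field extracted at index time.
TERM_QUERY = {"query": {"term": {"client_id": "acme"}}}

if __name__ == "__main__":
    for name, body in [("pipeline", PIPELINE), ("wildcard", WILDCARD_QUERY), ("term", TERM_QUERY)]:
        print(f"--- {name} ---")
        print(json.dumps(body, indent=2))
```

Index requests can then name the pipeline with the pipeline query parameter, or the index can apply it automatically through the index.default_pipeline setting. The same pattern could equally live in a Logstash grok filter if events already pass through Logstash.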

andrew.laraia · September 18, 2020, 3:12pm

Thank you

system · October 16, 2020, 3:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics
Topic	Category	Replies	Views	Last activity
Should I index log data with multifield mappings on string types?	Elasticsearch	1	299	December 30, 2016
Has anyone improvised a solution for post-indexing searching? (Splunk-alike field extraction)	Elasticsearch	1	509	July 5, 2017
Extracting fields in bulk - using ES as a data store	Elasticsearch	4	550	July 6, 2017
Project advice (mapping, analysis, basic architecture)	Elasticsearch	1	409	April 23, 2017
Performance of doc_values field vs analysed field	Elasticsearch	4	1651	October 18, 2017
