Best practice to search fields not available to most documents

I'm new to the ELK stack. So far, so good, and I am making progress. We're looking to use it on an existing system to get a real-time (or at least recent-history) view of client/user activity, to aid in offering the most relevant services.

Each document/record has a well-defined structure, and there are common fields for all of them. However, each document also has a generic "data" field that holds information specific to the event type. I'm looking to use a few of these pieces of information in my searches.

I'm concerned that I will lose efficiency if I try to use wildcards to scan this generic data field.

I was looking to regex/parse this to a specific new field as we index, but most records don't have this exact data, so that also seems inefficient. There can be thousands of documents for each client session, and there are thousands of sessions each day.

Now I'm thinking that the best way may be to simply have another index for the same data source, but only ingest the exact event types that will always have extra data, and parse that data to another field (for example, one index that only ingests documents where eventType is "Session_Start", and another for the event type that holds the results of some back-end service call). These other indexes will still have the other common fields, so I'm thinking I can link results together, and do this with as many indexes as I need, one for each special event, alongside the full index.

Is this the right approach? Or should I instead plan to expand the fields in the full index, even though most of the additional fields will be null, as only a handful of event types will have these new fields populated?
I'm concerned that using multiple indexes may introduce other problems I haven't considered.

Thanks

Welcome to our community! :smiley:

Wildcards in Elasticsearch can be inefficient, especially ones with a leading wildcard. Extracting the data into its own fields is a good idea, and having some documents where those fields are empty isn't a concern, so I would start with that approach in the full index.
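
As a rough illustration of the extraction idea, here is a minimal Python sketch that adds dedicated fields to a document before it is indexed, and only for the event type that carries them. Everything specific in it is an assumption made for the example: the `plan=... region=...` layout of the "data" field, the regular expression, and the field names `session_plan` and `session_region` are all hypothetical, and `session_plan` is assumed to be mapped as a `keyword` field. The same logic could equally live in Logstash or an ingest pipeline.

```python
import re

# Hypothetical format: assumes the "data" field of a Session_Start event
# looks like "plan=premium region=eu-west". Adjust to the real payload.
SESSION_START_PATTERN = re.compile(
    r"plan=(?P<plan>\S+)\s+region=(?P<region>\S+)"
)


def enrich(doc: dict) -> dict:
    """Return a copy of doc, adding extracted fields only when they apply.

    Documents of other event types are returned unchanged, so they simply
    lack the extra fields.
    """
    enriched = dict(doc)
    if doc.get("eventType") == "Session_Start":
        match = SESSION_START_PATTERN.search(doc.get("data", ""))
        if match:
            # Hypothetical field names, chosen for this example only.
            enriched["session_plan"] = match.group("plan")
            enriched["session_region"] = match.group("region")
    return enriched


def wildcard_query(fragment: str) -> dict:
    """Request body that scans the generic data field (the slow option)."""
    return {"query": {"wildcard": {"data": {"value": f"*{fragment}*"}}}}


def term_query(plan: str) -> dict:
    """Request body that matches the extracted field exactly.

    Assumes session_plan is mapped as a keyword field.
    """
    return {"query": {"term": {"session_plan": plan}}}
```

With something like this in place, a search for one plan can use `term_query("premium")` against the extracted field rather than `wildcard_query("plan=premium")` against the generic one.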

Thank you
