Dealing with many unique fields

itamarkr · January 26, 2019, 9:42am

Hi,

I have a system that has been writing its logs to Elasticsearch for a while now.
We have a feature in the system that allows the user to enter tags.
The tags can have any name and any value (both are strings).
We want to start logging these tags so they can be visualized and searched.

But we ran into a problem while trying to do so.
Our first, naive approach was to index each new tag in its own field:
{
  "tag_a": "val_a",
  "tag_b": "val_b"
  .....
}
This approach has its issues, of course: we exceeded the maximum number of fields allowed.
We raised the limit, but soon passed the new one as well. We currently have more than 4000 unique tags.

So, our instinct was to redesign our logs and write them as objects like this:

{
  "tags": [
    {
      "key": "tag_a",
      "val": "val_a"
    },
    {
      "key": "tag_b",
      "val": "val_b"
    }
  ]
}
This solves the issue of having too many fields, but it causes problems with search: a query for a document with "key" equal to tag_a and "val" equal to val_b still returns this document, even though no single tag has that combination.
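
To illustrate why the cross-match happens: Elasticsearch indexes a plain (non-nested) array of objects as one multi-valued field per sub-key, so the pairing between each "key" and its "val" is lost. The following is only a rough Python simulation of that behaviour, not Elasticsearch code; the helper names `flatten_object_array` and `matches` are made up for this sketch.

```python
from collections import defaultdict


def flatten_object_array(doc, field):
    """Roughly mimic how a non-nested array of objects is indexed:
    each sub-key becomes its own multi-valued field."""
    flat = defaultdict(list)
    for obj in doc[field]:
        for sub_key, value in obj.items():
            flat[f"{field}.{sub_key}"].append(value)
    return dict(flat)


def matches(flat_doc, key, val):
    """Simulate a query requiring tags.key == key AND tags.val == val.
    Each condition is checked against its own flattened field."""
    return key in flat_doc.get("tags.key", []) and val in flat_doc.get("tags.val", [])


doc = {
    "tags": [
        {"key": "tag_a", "val": "val_a"},
        {"key": "tag_b", "val": "val_b"},
    ]
}

flat = flatten_object_array(doc, "tags")
print(flat)
# The unwanted hit: tag_a was never paired with val_b in the source document.
print(matches(flat, "tag_a", "val_b"))
```

Both conditions are satisfied independently, which is why the document is returned for a key/value combination that no single tag actually has.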

This is a problem for us, so we naturally looked at nested documents, which would solve it.

Unfortunately, Kibana does not support nested documents in Discover or Visualizations.
Kibana is our main UI and we don't wish to develop something else to enable the search of our nested data.

We'd like to hear of any other ways to index the data that we might have overlooked to enable our use case.

Thank you very much

abdon · January 27, 2019, 2:04pm

I'm not sure if this will support what you're trying to do in Kibana, but have you considered simply concatenating the key-value pairs as a single value? So your sample document becomes:

{
  "tags": [
    "tag_a-val_a",
    "tag_b-val_b"
  ]
}

This will allow you to simply search for documents that have a specific key-value combination (assuming tags has been mapped as a keyword field):

GET _search
{
  "query": {
    "match": {
      "tags": "tag_b-val_b"
    }
  }
}
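
As a rough sketch of how the concatenation could be done on the writing side, the Python below builds the "tags" array and the matching query body from key-value pairs. The helper names `to_tag_strings` and `tag_query` are hypothetical, and using "-" as the separator assumes that tag keys never contain that character; if they can, a different separator would be needed.

```python
import json

# Same separator as in the example above.
# Assumption: tag keys never contain this character.
SEPARATOR = "-"


def to_tag_strings(pairs):
    """Turn {"tag_a": "val_a", ...} into ["tag_a-val_a", ...]."""
    return [f"{key}{SEPARATOR}{val}" for key, val in pairs.items()]


def tag_query(key, val):
    """Build a search body for one exact key-value combination
    (assumes "tags" is mapped as a keyword field)."""
    return {"query": {"match": {"tags": f"{key}{SEPARATOR}{val}"}}}


doc = {"tags": to_tag_strings({"tag_a": "val_a", "tag_b": "val_b"})}
query = tag_query("tag_b", "val_b")

print(json.dumps(doc, indent=2))
print(json.dumps(query, indent=2))
```

Because each pair is stored as a single value, a search for tag_a combined with val_b looks for "tag_a-val_b", which this document does not contain.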

itamarm10 · February 4, 2019, 7:46am

Thank you so much for your response! Seems like this could do the trick for the issues related to our previous implementation.
We have two additional issues that we would like to resolve, both of which involve processing numbers:

1. Every document contains a tag that represents an integer timespan, and we would like to be able to compare timespans in Kibana, for example by using a "range" bucket.
2. We would like to be able to compare, across documents, how many values were logged with a specific tag (a count). In the key-value nested object approach, we thought of explicitly writing the number of values into a numeric "count" field in the object, but we didn't pursue this because of Kibana's nested limitation.

Perhaps you or someone else might have ideas for implementing this. Thank you so much for your help!

system · March 4, 2019, 7:46am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
