I have a system that has been writing its logs to Elasticsearch for a while now.
We have a feature in the system that allows the user to enter tags.
The tags can have any name and any value (both are strings).
We want to start logging these tags so they can be visualized and searched, but we ran into a problem while trying to do so.
Our first naive approach was to index each new tag as its own field:
{
  "tag_a": "val_a",
  "tag_b": "val_b",
  ...
}
This approach of course has its issues: we quickly exceeded the maximum allowed number of fields per index.
We raised the limit, but soon passed the new one as well; we currently have more than 4,000 unique tags.
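For reference, raising that limit looked roughly like this for us (the index name logs is just a placeholder):

PUT logs/_settings
{
  "index.mapping.total_fields.limit": 5000
}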
So our instinct was to redesign our logs and write the tags as an array of key-value objects like this:
{
  "tags": [
    {
      "key": "tag_a",
      "val": "val_a"
    },
    {
      "key": "tag_b",
      "val": "val_b"
    }
  ]
}
This solves the issue of having too many fields, but it creates problems with search: I can get this document back when searching for one whose "key" is tag_a and whose "val" is val_b, even though no single tag has that combination.
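To illustrate, with a plain object mapping (and assuming key and val are mapped as keyword, index name again a placeholder), a query like this matches the sample document above, because Elasticsearch flattens the array into tags.key: [tag_a, tag_b] and tags.val: [val_a, val_b]:

GET logs/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "tags.key": "tag_a" } },
        { "term": { "tags.val": "val_b" } }
      ]
    }
  }
}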
This is a problem for us, so we naturally decided to look at nested documents, which would solve the cross-matching issue.
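A nested mapping along these lines is what we had in mind (just a sketch, index name is a placeholder), together with a nested query that only matches key and val inside the same object:

PUT logs
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "nested",
        "properties": {
          "key": { "type": "keyword" },
          "val": { "type": "keyword" }
        }
      }
    }
  }
}

GET logs/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "bool": {
          "must": [
            { "term": { "tags.key": "tag_a" } },
            { "term": { "tags.val": "val_a" } }
          ]
        }
      }
    }
  }
}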
Unfortunately, Kibana does not support nested documents in Discover or Visualizations.
Kibana is our main UI, and we don't want to develop something else just to make our nested data searchable.
We'd like to hear about any other ways of indexing the data that we might have overlooked that would enable our use case.
I'm not sure if this will support what you're trying to do in Kibana, but have you considered simply concatenating each key-value pair into a single value? Your sample document then becomes:
{
  "tags": [
    "tag_a-val_a",
    "tag_b-val_b"
  ]
}
This allows you to search for documents that have a specific key-value combination, assuming tags has been mapped as a keyword field.
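For example, a mapping and term query roughly like this should match documents tagged with tag_a = val_a (the index name logs is just a placeholder):

PUT logs
{
  "mappings": {
    "properties": {
      "tags": { "type": "keyword" }
    }
  }
}

GET logs/_search
{
  "query": {
    "term": { "tags": "tag_a-val_a" }
  }
}

In the Kibana search bar the equivalent would be something like tags:"tag_a-val_a".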
Thank you so much for your response! It seems like this could solve the issues with our previous implementation.
We have two additional issues we would like to resolve; both involve processing numbers.
Every document contains a tag that represents an integer timespan, and we would like to be able to compare timespans in Kibana, for example by using a "range" bucket.
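What we are after is something like a range aggregation over that timespan, which would require indexing it as its own numeric field; timespan_ms below is purely hypothetical:

GET logs/_search
{
  "size": 0,
  "aggs": {
    "timespan_ranges": {
      "range": {
        "field": "timespan_ms",
        "ranges": [
          { "to": 1000 },
          { "from": 1000, "to": 5000 },
          { "from": 5000 }
        ]
      }
    }
  }
}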
We would also like to be able to compare, across documents, how many values were logged for a specific tag (i.e. a count). In the key-value object approach we thought of explicitly writing the number of values into a numeric "count" field on each object (see the sketch below), but we didn't pursue this because of Kibana's nested limitation.
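Roughly, the document shape we had in mind was this (field names are just examples):

{
  "tags": [
    {
      "key": "tag_a",
      "val": "val_a",
      "count": 3
    },
    {
      "key": "tag_b",
      "val": "val_b",
      "count": 1
    }
  ]
}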
Perhaps you or someone else might have ideas for implementing this. Thank you so much for your help!