I have been asked to determine how we can leverage a set of tags that will be added to all documents being indexed in future and I am looking for guidance on best practices.
Our clusters are primarily used as analytics stores, and we must be able to filter dashboards by the tag values in a user-friendly way (ideally with a control filter where the tag key acts as the parent of the tag value).
The tags will look like the below, each a key and value pair. I have looked at indexing these as nested, flattened and object fields, but nested and
so far the best compromise I can get to is to concatenate the key-value pair, e.g. "location-Ireland","location-England","lifecycle-to_be_sunset".
My next question is then whether it's better to index as "tags": ["location-Ireland","location-England","lifecycle-to_be_sunset"],
or "tags": [{"value":"location-Ireland"},{"value":"location-England"},{"value":"lifecycle-to_be_sunset"}],
I'm open to all feedback and suggestions but key is to have it usable via Kibana for filtering.
tags
List of keywords used to tag each event.
type: keyword
Note: this field should contain an array of values.
example: ["production", "env2"]
By using tags as a key-value store, you'll be in conflict with the schema and will have a hard time using elastic-native data sources.
Instead of using tags, I would recommend creating a normal object and using your tag keys, such as location and lifecycle, as real field names. You would then add this object to each document under something like a MyOrg key. So to each document you would add:
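As a rough illustration of that shape (not the original poster's exact example), here is a minimal Python sketch that turns key-value pairs into real fields under a `MyOrg` object. The helper name `add_org_fields`, the sample `message` field, and the choice to store repeated keys as a list are assumptions made for the example.

```python
import json

def add_org_fields(doc, tag_pairs, root="MyOrg"):
    """Return a copy of doc with each tag key stored as its own field under root."""
    collected = {}
    for key, value in tag_pairs:
        # Gather every value seen for a key, in order of appearance.
        collected.setdefault(key, []).append(value)
    # Assumption: a key with one value is stored as a scalar, otherwise as a list.
    org = {k: (v[0] if len(v) == 1 else v) for k, v in collected.items()}
    return {**doc, root: org}

doc = add_org_fields(
    {"message": "example event"},  # hypothetical source document
    [("location", "Ireland"), ("lifecycle", "to_be_sunset")],
)
print(json.dumps(doc, indent=2))
```

Each key then becomes a field such as `MyOrg.location` or `MyOrg.lifecycle`, so the number of mapped fields grows with the number of distinct keys.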
This will make each one of these a full field in Kibana that enables filtering, sorting, dashboard controls, and more. You will also avoid some of the headaches that come with dealing with multi-value types when you start using these in KQL and ES|QL!
Kibana has traditionally not had great support for nested documents (not sure if this has changed recently), so an array of key-value objects is likely not going to work well. If there are a limited number of possible keys, I would probably nest these:
"tags": {
"location": ["England","Ireland"]
}
If there are a large number of possibilities, your example where you concatenated key and value is IMHO probably the best.
If you are using tags for filtering and not aggregations (?), it may be worthwhile looking at using the nested example I showed together with the flattened field type. I am not sure how well this does or does not work with Kibana though.
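To make the flattened suggestion concrete, here is a minimal sketch written as plain Python dictionaries. It assumes the field is called `tags`; the helper names `group_tags` and `tag_filter` are hypothetical. The `flattened` type maps the whole object as a single field, so new keys do not add new mapped fields, and a `term` query can still target one key with dot notation.

```python
from collections import defaultdict

# Index body that maps the whole "tags" object as one flattened field.
TAGS_MAPPING = {"mappings": {"properties": {"tags": {"type": "flattened"}}}}

def group_tags(pairs):
    """Group (key, value) pairs into {"tags": {key: [values...]}}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        if value not in grouped[key]:  # skip duplicate values for a key
            grouped[key].append(value)
    return {"tags": dict(grouped)}

def tag_filter(key, value):
    """Build a term query body that filters on a single key inside "tags"."""
    return {"query": {"term": {f"tags.{key}": value}}}

doc = group_tags([
    ("location", "England"),
    ("location", "Ireland"),
    ("lifecycle", "to_be_sunset"),
])
query = tag_filter("location", "Ireland")
```

This only shows the document and query shapes. How well Kibana controls handle a flattened field would still need to be tested.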
The problem is that the business teams want to maintain the key-value pair relationship, because the same value can be interpreted differently depending on its key (the key gives it context).
Having tested all the options above and weighed them against the potential volume of keys, the risk of mapping explosion, etc., the only option that seems to work for me is to concatenate the key and value pair:
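A minimal sketch of that concatenation, using the `-` separator from the examples earlier in the thread. The function names are hypothetical, and the sketch assumes keys never contain the separator, so a tag can always be split back into its key and value.

```python
SEP = "-"

def encode_tag(key, value, sep=SEP):
    """Join a key and a value into one keyword-friendly string."""
    if sep in key:
        # Assumption: keys never contain the separator, so decoding is unambiguous.
        raise ValueError(f"tag key {key!r} must not contain {sep!r}")
    return f"{key}{sep}{value}"

def decode_tag(tag, sep=SEP):
    """Split a concatenated tag back into (key, value) at the first separator."""
    key, _, value = tag.partition(sep)
    return key, value

def tags_for_key(tags, key, sep=SEP):
    """Return only the tags that belong to the given key."""
    prefix = key + sep
    return [t for t in tags if t.startswith(prefix)]

pairs = [("location", "Ireland"), ("location", "England"), ("lifecycle", "to_be_sunset")]
doc = {"tags": [encode_tag(k, v) for k, v in pairs]}
```

Indexing `doc["tags"]` as a plain array of strings keeps the field a simple multi-valued keyword, which is the shape the `tags` field definition quoted earlier expects.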