Hi, I am using ES to archive data originally stored in a relational DB, and then provide a Kibana Dashboard. I store "denormalized" data (that is, performing joins at upload time), which of course comes with some caveats.
One of them: for visualization and filtering purposes, data should be aggregated by id but displayed by label. This mapping from ids to labels may change over time, and I always want to use the current map.
In order to avoid having to update previously stored denormalized data (which in my opinion is an awful scenario), I am thinking of controlling the mapping with a scripted field like the one below:
// Map the raw id stored in the document to its current label.
Map map = [
  'id1': 'label1',
  'id2': 'label2'
];
// Guard against documents that have no value for the field.
if (doc['myfield.keyword'].size() == 0) {
  return 'no_label';
}
def k = doc['myfield.keyword'].value;
if (!map.containsKey(k)) {
  return 'no_label';
} else {
  return map[k];
}
I can produce and update the scripted field dynamically with the Kibana saved_objects API.
However, this solution looks like a weak or problematic workaround to me. I suppose that, if at all, I could use an approach like this with some safety only because the map is small and very stable: on the order of hundred(s) of entries, changed maybe once or twice a month.
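For illustration, here is a rough sketch of what such a dynamic update via the saved_objects API could look like. It assumes a classic index pattern where scripted fields live in the JSON-encoded fields attribute of the index-pattern saved object (the layout differs in newer Kibana versions, so check yours); the Kibana URL, index-pattern id and field name are placeholders, and authentication is left out:

import json
import requests

KIBANA = "http://localhost:5601"          # placeholder Kibana URL; add auth as needed
INDEX_PATTERN_ID = "my-index-pattern-id"  # placeholder saved-object id
FIELD_NAME = "myfield_label"              # hypothetical scripted field name
HEADERS = {"kbn-xsrf": "true", "Content-Type": "application/json"}

def build_script(mapping):
    # Render the Painless source from the current id -> label map.
    entries = ", ".join(f"'{k}':'{v}'" for k, v in mapping.items())
    return (
        f"Map map = [{entries}]; "
        "if (doc['myfield.keyword'].size() == 0) { return 'no_label'; } "
        "def k = doc['myfield.keyword'].value; "
        "return map.containsKey(k) ? map[k] : 'no_label';"
    )

def update_scripted_field(mapping):
    url = f"{KIBANA}/api/saved_objects/index-pattern/{INDEX_PATTERN_ID}"
    # Fetch the current index-pattern saved object and its field list.
    obj = requests.get(url, headers=HEADERS).json()
    fields = json.loads(obj["attributes"].get("fields", "[]"))
    # Drop any previous version of the scripted field, then append the new one.
    fields = [f for f in fields if f.get("name") != FIELD_NAME]
    fields.append({
        "name": FIELD_NAME,
        "type": "string",
        "scripted": True,
        "lang": "painless",
        "script": build_script(mapping),
        "searchable": True,
        "aggregatable": True,
    })
    # Write the updated field list back to the saved object.
    body = {"attributes": {"fields": json.dumps(fields)}}
    requests.put(url, headers=HEADERS, data=json.dumps(body)).raise_for_status()

update_scripted_field({"id1": "label1", "id2": "label2"})

A scheduler (cron or similar) could run this whenever the id-to-label table changes in the relational DB.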
With the field formatter approach, the mapping takes place in the browser client and is saved in the index pattern. So when Kibana shows a visualization, it will download the index pattern saved object containing the whole mapping and apply it before displaying.
Performance-wise I think ~5k pairs are still manageable; you will probably have to increase the server.maxPayloadBytes setting in kibana.yml to be able to save the index pattern. If this number grows to 100k, we should think about a solution within Elasticsearch.
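For comparison, a minimal sketch of the field formatter route: Kibana's "static lookup" formatter, stored per field in the index pattern's fieldFormatMap attribute. The attribute and parameter names below match what 7.x-era Kibana stores, but verify them against your version; the URL, index-pattern id and field name are again placeholders, and authentication is left out:

import json
import requests

KIBANA = "http://localhost:5601"          # placeholder Kibana URL; add auth as needed
INDEX_PATTERN_ID = "my-index-pattern-id"  # placeholder saved-object id
HEADERS = {"kbn-xsrf": "true", "Content-Type": "application/json"}

def set_static_lookup(field_name, mapping):
    # Build a "static lookup" formatter definition for the given field.
    formatter = {
        "id": "static_lookup",
        "params": {
            "lookupEntries": [{"key": k, "value": v} for k, v in mapping.items()],
            "unknownKeyValue": "no_label",  # shown when an id is missing from the map
        },
    }
    # fieldFormatMap is stored as a JSON string keyed by field name.
    # Note: this overwrites the whole map; GET and merge first if other
    # fields already have formatters configured.
    body = {"attributes": {"fieldFormatMap": json.dumps({field_name: formatter})}}
    url = f"{KIBANA}/api/saved_objects/index-pattern/{INDEX_PATTERN_ID}"
    requests.put(url, headers=HEADERS, data=json.dumps(body)).raise_for_status()

set_static_lookup("myfield.keyword", {"id1": "label1", "id2": "label2"})

If the serialized lookup table grows large, this PUT is where the server.maxPayloadBytes limit mentioned above would kick in.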
The solution with the scripted field, dynamically updated with, say, a scheduler, will perform the mapping within the ES server, right? With this approach, would you still see a problem working with 5k value pairs? Or 50k value pairs? It feels a bit hacky to me: that would be a very long script...
In your answer, when you refer to 100k, is it key-value pairs or server.maxPayloadBytes?
I wouldn't recommend the script approach because the script is not persisted within Elasticsearch, but sent to the server with each individual request made from Kibana. For a lot of key-value pairs this would hurt performance significantly, because possibly megabytes of key-value pairs would have to be uploaded with every request each visualization issues. Field formatters seem like the best option for your use case.