I saw the Elasticsearch explanation of how to do a "JOIN" in ES, but I am confused and I think I have a complicated case, so I would be happy for more help.
First I will lay out my problem:
I have about 5 logs that represent 5 stages in a process (in SQL terms, this is like having 5 tables). All the logs share the same UID. All the logs contain the same parameters, but the parameters that are irrelevant for a given stage are either "0" or an empty string.
Ok, well the only way to use the joins described in that link is to restructure your documents so that everything is based around the UID. We refer to that as entity-centric indexing.
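For illustration, an entity-centric document keyed on the UID might look something like this (all field names here are hypothetical, just to show the shape):

```json
{
  "uid": "abc-123",
  "stage1": { "status": "started", "duration_ms": 40 },
  "stage2": { "status": "validated", "duration_ms": 12 },
  "stage3": { "status": "processed", "duration_ms": 95 },
  "stage4": { "status": "stored", "duration_ms": 7 },
  "stage5": { "status": "done", "duration_ms": 3 }
}
```

Instead of five separate log documents per UID, you maintain one document per UID and fold each stage's fields into it, so queries across stages become simple queries on a single document.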
Otherwise you can use something like the aggregate filter in Logstash to group things before they are sent to Elasticsearch.
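As a rough sketch of what that could look like, assuming each event carries a `uid` field (the field name is an assumption), the aggregate filter can collect the five stage events into one combined event per UID:

```
filter {
  aggregate {
    # Group events that share the same UID
    task_id => "%{uid}"
    # Fold each incoming stage event into the shared map
    code => "
      map['stages'] ||= []
      map['stages'] << event.to_hash
    "
    # When no more events arrive for this UID within the timeout,
    # emit the accumulated map as a new event
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "uid"
    timeout => 120
  }
}
```

Note that with this sketch the original stage events still pass through the pipeline; only the aggregated event is additionally emitted on timeout, unless you explicitly drop the originals.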
There's no way to do that sort of join in Elasticsearch with your data structured the way it currently is.
The problem with the Logstash solution is that it erases all the other logs. This might be OK on my side, but it might be an issue. Does "entity-centric indexing" have the same problem?
Or, alternatively, is there a way to keep the 5 logs as they are created, so that the aggregation only adds an extra log?
You can use the split filter in Logstash to keep the originals while also producing an aggregated one. Entity-centric indexing will also mean you don't keep the original logs.
If you want both the originals and the aggregated one, then you will need two copies of the data in Elasticsearch.
Ok, to make sure I understood:
So basically my machines will send the 5 logs to Logstash, and in Logstash I will have a split filter that duplicates these logs, and on the duplicated copies I will run the aggregation?
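That flow might be sketched as follows, using the clone filter to duplicate the events (a common way to copy whole events, as an alternative to the split filter mentioned above; the `uid` field and index names are assumptions):

```
filter {
  # Duplicate every incoming log; the clone gets its type field set to "aggregated"
  clone {
    clones => ["aggregated"]
  }

  # Aggregate only the duplicated copies, keyed on the shared UID.
  # event.cancel drops each clone once it is folded into the map,
  # so only the combined event survives on this branch.
  if [type] == "aggregated" {
    aggregate {
      task_id => "%{uid}"
      code => "map['stages'] ||= []; map['stages'] << event.to_hash; event.cancel"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "uid"
      timeout => 120
    }
  }
}

output {
  # Route the originals and the aggregated event to separate indices
  if [type] == "aggregated" {
    elasticsearch { index => "process-aggregated" }
  } else {
    elasticsearch { index => "process-logs" }
  }
}
```

With this layout the five original logs reach one index untouched, and each UID additionally produces a single combined event in the other index, which matches the "two copies of the data" trade-off described above.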