My requirement is to join the data of these two indices on the 'at' field, to connect customer behavior data with test case data, and then view the result in Kibana.
I tried, but the result is not what I expected. The third index just has the data of the second index appended to the first, instead of being joined on a common field the way two tables are joined in SQL.
Sounds like a bug in your client code.
The pseudo code should be:
issue search on srcIndex1,srcIndex2 sorted by id
for all results
    if current doc id == last doc id
        add current fields to last doc
    else
        write last doc to new index (if there is one)
        last doc = current doc
write last doc to new index
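For what it's worth, here is a minimal Python sketch of that loop, assuming the search hits have already been fetched and sorted on the join field. The function name `merge_sorted` and the sample field names are made up for illustration, and the join field defaults to 'at' only because that is the field mentioned in this thread. Note that `update` overwrites same-named fields, so with several events sharing one 'at' value you would probably want to collect them into lists instead.

```python
def merge_sorted(docs, key="at"):
    """Merge consecutive docs that share the same join-field value.

    `docs` must already be sorted on `key` (hypothetical default: 'at').
    """
    last = None
    for doc in docs:
        if last is not None and doc[key] == last[key]:
            # Same join value as the previous doc: fold its fields in.
            last.update(doc)
        else:
            # New join value: emit the finished doc, start a new one.
            if last is not None:
                yield last
            last = dict(doc)
    # Emit the final doc, which the loop itself never writes out.
    if last is not None:
        yield last


# Illustrative data only; these field names are assumptions.
behavior = [{"at": 1, "event": "click"}, {"at": 2, "event": "scroll"}]
tests = [{"at": 1, "test_case": "TC-7"}]
hits = sorted(behavior + tests, key=lambda d: d["at"])
joined = list(merge_sorted(hits))
```

In a real script, each doc yielded here would be written to the new index rather than collected in a list.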
I don't fully understand the logic of the pseudo code and am a bit confused. At first, I thought that creating a third index would be enough, with nothing else needed.
Does the pseudo code mean data processing code in the Logstash configuration file?
Also, neither index has an id field.
The relationship between the two indices is m:m, not 1:1. That is, one behavior event could map to multiple test case events, and one test case event could map to multiple behavior events. Is this doable as well?
No, it means code as in Python, Perl, Java or whatever your preferred programming language is. I’m not sure this is something Logstash can do, but it should be a simple Python script, for example.
I believe you called it ‘at’?
The logic in my pseudo code should cater for that.
Let me double-check your solution.
Do you mean that I export the data from the two indices and use a programming language like Python to join the data on 'at', so that I have one file combining behavior data and test case data, and then use Logstash to parse that file and ingest it into Elasticsearch? Hmm, if so, why create the third index to union the two sets of data? The 'join' has already been done by Python at the start.
No, the Python script can write directly to your new index (using the ‘bulk’ API) rather than writing to a file.
Logstash is not required in this scenario - it’s normally used as a way of avoiding programming.
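As a rough illustration of the "write directly" part, the sketch below builds the newline-delimited JSON body that the bulk API expects: one action line followed by one source line per document, with a trailing newline. The helper name `bulk_body` and the index name `joined` are placeholders, not anything from this thread. Sending the body is left out; it would be POSTed to the cluster's `_bulk` endpoint with content type `application/x-ndjson`, or you could use a client library's bulk helper instead.

```python
import json


def bulk_body(docs, index_name):
    """Build an NDJSON bulk request body that indexes each doc into index_name."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch to index the next line's document.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative merged docs; the field names and index name are assumptions.
merged_docs = [{"at": 1, "event": "click", "test_case": "TC-7"}]
body = bulk_body(merged_docs, "joined")
```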