[Resolved]Join data cross two index based on common field

Hi Team,

In my local elk stack, below two index are created.

  1. kvaudit*
    • This one is index customer behavior data. Example data:

module=SCM fa=TS at=SCM.TS.MODIFY_SEARCH si=4C3D8709E51DDC4EE879A9E30729B512.mo-5692ea7ca ci=SCMStella cn=SCMStella cs=qacandrot_SCMStella. pi=dbPool1 ui=cgrant1 locale=en_US ktf1=[C,E,X,H,M]


  1. testcase*
    • This one is index test case. Example data:

caseid, classname,features,at,author,module

There is a common fields call 'at' in both index.

My requirement is to join the data of these 2 index based on at field to setup the connection between customer behavior data and test case, then view in kibana.

Is it achievable and any solution?

See here for creating a third index from the union of two existing indices.

Hi Mark,

I tried, but the result is not what I expected. In the third index, it just append the data of second index into the first index, instead of join based on common field like we join two tables in SQL.

Sounds like a bug in your client code.
The pseudo code should be:

Issue search on srcIndex1,srcIndex2 sorted by id
for all results
If current doc id== last doc id
add current fields to last doc
write last doc to new index
last doc = current doc

Hi Mark,

I am not very understood the logic of the pseudo code here and a bit confused. At the beginning, I thought that create a 3rd index is enough, and no need to anything.

  1. The pseudo code means data processing code in the logstash configuration file?
  2. And both 2 index has no id field
  3. the relationship between two indexes is m:m, instead of 1:1. that is 1 behavior event could map to multiple test case events, and 1 test case event could map to multiple behavior events. Is this can be doable as well?


Hi Cherie

No, it means code as in Python, Perl, Java or whatever is your preferred programming language. I’m not sure this is something Logstash can do but should be a simple python script for example.

I believe you called it ‘at’?

The logic in my pseudo code should cater for that.

Hi Mark,

Let me double confirm your solution.
Do you mean that i export the data from two index, and use any programming language like python to join these data based on 'at', then I have a file unioned behavior data and test case, then use logstash to parse the unioned file and ingest the data into elasiticseach? hmmm.. if so, why need to create the 3rd index to union two parts of data? because the 'join' already completed by python at the beginning.


No, the Python script can write directly to your new index (using the ‘bulk’ api) rather than writing to a file.
Logstash is not required in this scenario - it’s normally used as a way of avoiding programming.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.