[Resolved]Join data cross two index based on common field

cheriemilk · August 16, 2019, 8:49am

Hi Team,

In my local elk stack, below two index are created.

kvaudit*
- This one is index customer behavior data. Example data:
  `

module=SCM fa=TS at=SCM.TS.MODIFY_SEARCH si=4C3D8709E51DDC4EE879A9E30729B512.mo-5692ea7ca ci=SCMStella cn=SCMStella cs=qacandrot_SCMStella. pi=dbPool1 ui=cgrant1 locale=en_US ktf1=[C,E,X,H,M]

`

testcase*
- This one is index test case. Example data:

caseid, classname,features,at,author,module
ENT200021808,Verifythatmodifysearchbuttonworksasexpected,TS,SCM.TS.MODIFY_SEARCH,I348636,SCM

There is a common fields call 'at' in both index.

My requirement is to join the data of these 2 index based on at field to setup the connection between customer behavior data and test case, then view in kibana.

Is it achievable and any solution?

Mark_Harwood · August 16, 2019, 4:22pm

See here for creating a third index from the union of two existing indices.

cheriemilk · August 19, 2019, 3:21am

Hi Mark,

I tried, but the result is not what I expected. In the third index, it just append the data of second index into the first index, instead of join based on common field like we join two tables in SQL.

Mark_Harwood · August 19, 2019, 6:31am

Sounds like a bug in your client code.
The pseudo code should be:

Issue search on srcIndex1,srcIndex2 sorted by id
for all results
If current doc id== last doc id
add current fields to last doc
else
write last doc to new index
last doc = current doc

cheriemilk · August 20, 2019, 2:48am

Hi Mark,

I am not very understood the logic of the pseudo code here and a bit confused. At the beginning, I thought that create a 3rd index is enough, and no need to anything.

The pseudo code means data processing code in the logstash configuration file?
And both 2 index has no id field
the relationship between two indexes is m:m, instead of 1:1. that is 1 behavior event could map to multiple test case events, and 1 test case event could map to multiple behavior events. Is this can be doable as well?

Thanks,
Cherie

Mark_Harwood · August 20, 2019, 6:44am

Hi Cherie

No, it means code as in Python, Perl, Java or whatever is your preferred programming language. I’m not sure this is something Logstash can do but should be a simple python script for example.

I believe you called it ‘at’?

The logic in my pseudo code should cater for that.

cheriemilk · August 22, 2019, 2:41am

Hi Mark,

Let me double confirm your solution.
Do you mean that i export the data from two index, and use any programming language like python to join these data based on 'at', then I have a file unioned behavior data and test case, then use logstash to parse the unioned file and ingest the data into elasiticseach? hmmm.. if so, why need to create the 3rd index to union two parts of data? because the 'join' already completed by python at the beginning.

Thanks,
Cherie

Mark_Harwood · August 22, 2019, 6:59am

No, the Python script can write directly to your new index (using the ‘bulk’ api) rather than writing to a file.
Logstash is not required in this scenario - it’s normally used as a way of avoiding programming.

system · September 30, 2019, 6:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Joining Two Indexes with common field values Elasticsearch	9	3836	June 8, 2023
Feasibility in joining two indexes in elasticsearch Elasticsearch	5	353	October 14, 2020
Merge 2 types based on common field Elasticsearch	1	338	July 6, 2017
Merge 2 types based on common field Elasticsearch	3	883	July 6, 2017
How do I combine 2 index data in elasticsearch? Elasticsearch	22	13414	October 10, 2019

[Resolved]Join data cross two index based on common field

Related topics