How best to create a parent-child relationship

emanresu · July 25, 2018, 3:51pm

I have two real-time streams. One contains news articles and the other contains comments about the same articles. I'd like to create a parent-child relationship between each article and that article's comments. There is no common id. I'd like to use the headline, which exists in both streams, and match the two streams on it every 15 minutes. I am assuming that 15 minutes would be sufficient to handle the delay between the two streams. How would you go about doing this? Any ideas would be appreciated.

A typical article message, containing entity_name, source_name and headline, comes through Logstash and looks like this:

"Thomson Reuters Corp.","Japan Today","Trump claims victory after forcing NATO crisis talks"

Some typical comments, containing comment and headline, come through Logstash in a separate pipeline and look like this:

"We applaud Trumps claim ...", "Trump claims victory after forcing NATO crisis talks"

"Nato crisis is important...", "Trump claims victory after forcing NATO crisis talks"
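To make the intended match concrete, here is a minimal Python sketch that attaches the sample comments above to their article by headline. It only illustrates the idea for one batch of records, not how Logstash or Elasticsearch would do it. The names `parse`, `match_key` and `attach_comments` are made up for this example, and it assumes the headline text is identical in both streams apart from case and spacing.

```python
import csv
import io
from collections import defaultdict

# Sample records copied from the two streams shown above.
ARTICLES = '"Thomson Reuters Corp.","Japan Today","Trump claims victory after forcing NATO crisis talks"\n'
COMMENTS = (
    '"We applaud Trumps claim ...", "Trump claims victory after forcing NATO crisis talks"\n'
    '"Nato crisis is important...", "Trump claims victory after forcing NATO crisis talks"\n'
)


def parse(text, fields):
    """Parse quoted CSV lines into dicts (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, row)) for row in reader if row]


def match_key(headline):
    """Assumption: headlines differ at most in case and whitespace."""
    return " ".join(headline.lower().split())


def attach_comments(articles, comments):
    """Group the comments of one batch under the article with the same headline."""
    by_key = defaultdict(list)
    for c in comments:
        by_key[match_key(c["headline"])].append(c["comment"])
    return [
        dict(a, comments=by_key.get(match_key(a["headline"]), []))
        for a in articles
    ]


articles = parse(ARTICLES, ["entity_name", "source_name", "headline"])
comments = parse(COMMENTS, ["comment", "headline"])
joined = attach_comments(articles, comments)
```

Here `joined` holds one article record with both comments listed under it. A comment whose article has not arrived yet would simply stay unmatched until a later batch.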

Specifically:

1. Keep the indexes separate, or create a third index from the first two?
2. How to run 15-minute refresh cycles?
3. How can I conveniently create a hash of the headline to use for matching the two pieces?
4. If there is a better way/tool/data store, please advise.
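For the headline hash, one possible approach (a sketch, not a tested recommendation) is to normalize the headline and hash it, so both streams produce the same key for the same headline. The function name `headline_key` and the normalization rules (Unicode NFKC, case folding, collapsing whitespace) are assumptions made for this example. Logstash also ships a fingerprint filter that may be able to compute a similar value inside the pipeline.

```python
import hashlib
import unicodedata


def headline_key(headline: str) -> str:
    """Return a stable hex key for a headline (hypothetical helper).

    Assumption: two headlines are "the same" if they match after
    Unicode normalization, case folding and whitespace collapsing.
    """
    normalized = unicodedata.normalize("NFKC", headline)
    normalized = " ".join(normalized.casefold().split())
    # SHA-1 is used here only as a compact matching key, not for security.
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


article_key = headline_key("Trump claims victory after forcing NATO crisis talks")
comment_key = headline_key("  trump claims victory after forcing NATO   crisis talks ")
```

With this normalization `article_key` and `comment_key` are equal, so the key could be stored on both the article and the comment documents and used for matching. Headlines that differ in wording or punctuation would still produce different keys.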

Update: There seems to be a big pivot from denormalized to normalized data structures, explained in Removal of Types: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

So my guess is that the answer to 1 is to keep the articles and comments in separate indexes.

system · August 22, 2018, 3:59pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
