Extract edges between terms of a field based on another field[s]

mvadood · August 7, 2016, 6:38am

Let me illustrate my question with an example. Assume there is a type named "post". Each post has a userid field that identifies its author. This is how a post is stored:

{
	"_index": "test",
	"_type": "post",
	"_id": "10098259467307",
	"_score": 1,
	"_source": {
		"text": "user 1 message",
		"userid": 1,
		"id": 10098259467307
	}
}
Is there a way to extract the relationship between the users based on the co-occurrence of terms within the "text" field of posts which they have authored? For example, if there are two posts containing the word "elastic", what I would like to see is an edge between the posts' authors.
StackOverflow Link

Mark_Harwood · August 7, 2016, 12:14pm

This demo is based on stack overflow posts and graphs the tags and people involved in them: https://m.youtube.com/watch?v=1QwmJ_FCMqU&feature=youtu.be

mvadood · August 10, 2016, 3:01pm

Thanks @Mark_Harwood
I watched the video but my question still remains.
In the stack overflow case for example, is this hypothetical graph achievable?
"A graph where the vertices stand for (just) the user names and a relationship between any two nodes summarizes the number of common tags they have contributed to"

Mark_Harwood · August 10, 2016, 3:31pm

I watched the video but my question still remains

Hopefully the video demonstrated that there's often a need to cherry-pick which terms you use as the glue between people - you typically shouldn't use them all. In the StackOverflow data, tags like Javascript and Java are super-common and are not a useful basis on which to link people. That is why I advocated a solution where only selected tags are made visible in the graph and are used to explain the connection between people.

Let's assume, then, that all tags are equally interesting, so you want to remove them from being visible in your graph in the way you describe. The important point to remember is that in Elastic Graph the vertices are always terms and the edges represent one or more docs. Clicking on an edge will show how many docs a pair of terms have in common. To present the model you describe you'd need tag-centric docs, e.g.

POST test/tag
{
	"tag": "elasticsearch",
	"contributors":["Shay", "Uri"]
}
POST test/tag
{
	"tag": "kibana",
	"contributors":["Rashid", "Uri"]
}
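
As a rough illustration of what an edge would count in this model, here is a minimal Python sketch that runs outside Elasticsearch. It assumes the two tag-centric docs above are held in memory; the `contributor_edges` helper is hypothetical and is not part of any Elastic API. It only mimics the idea that an edge between two terms summarizes the docs they share.

```python
from collections import Counter
from itertools import combinations

# Assumption: in-memory copies of the two tag-centric docs shown above.
tag_docs = [
    {"tag": "elasticsearch", "contributors": ["Shay", "Uri"]},
    {"tag": "kibana", "contributors": ["Rashid", "Uri"]},
]


def contributor_edges(docs):
    """Count, for each pair of contributors, how many docs list both of them."""
    edges = Counter()
    for doc in docs:
        # Sorting gives every unordered pair a single canonical key.
        for pair in combinations(sorted(set(doc["contributors"])), 2):
            edges[pair] += 1
    return edges


edges = contributor_edges(tag_docs)
for (left, right), shared_docs in sorted(edges.items()):
    print(left, "--", right, "shared tag docs:", shared_docs)
```

With this sample data Uri is linked to both Shay and Rashid, while Shay and Rashid have no edge because no single doc lists both of them.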

There are a few concerns with this approach:

You probably need to write code to re-orient your source data in this fashion (see "entity centric indexing")
Some tags e.g. Java or Javascript could have prohibitively large lists of contributors
The commonplace (boring) tags like "Java" or "Javascript" will link everyone together in the diagram
The interesting (rare) tags will not be emphasised (2 people sharing a rare topic counts for the same as 2 people sharing a common one)
The specialisms a person has (repeated posts tagged with neo4j) will not be emphasised in this model.
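
The re-orientation mentioned in the first concern could be sketched roughly as follows. This is a minimal Python example under stated assumptions: the post records and their field names (`user`, `tags`) are hypothetical, and the `to_tag_docs` helper is illustrative only. A real pipeline would read the posts from the source index and write the resulting docs back to Elasticsearch.

```python
from collections import defaultdict

# Assumption: post-centric records with hypothetical "user" and "tags" fields.
posts = [
    {"user": "Shay", "tags": ["elasticsearch"]},
    {"user": "Uri", "tags": ["elasticsearch", "kibana"]},
    {"user": "Uri", "tags": ["kibana"]},
    {"user": "Rashid", "tags": ["kibana"]},
]


def to_tag_docs(posts):
    """Pivot post-centric records into one doc per tag listing its contributors."""
    contributors = defaultdict(set)
    for post in posts:
        for tag in post["tags"]:
            # A set keeps each user once per tag, however often they posted.
            contributors[tag].add(post["user"])
    return [
        {"tag": tag, "contributors": sorted(users)}
        for tag, users in sorted(contributors.items())
    ]


tag_docs = to_tag_docs(posts)
for doc in tag_docs:
    print(doc)
```

Note that Uri's two kibana posts collapse into a single entry in the contributors list, which is the loss of emphasis on specialisms described in the last concern.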

So you can do it this way, but I'm not sure it makes sense.

mvadood · August 10, 2016, 6:59pm

I get it now
Many thanks
