Extract edges between terms of a field based on another field[s]

(Milad Vadoodparast) #1

I illustrate my question with an example. Assume a type named "post". Each post has a userid field which indicates its author. This is how a post is stored:

{
  "_index": "test",
  "_type": "post",
  "_id": "10098259467307",
  "_score": 1,
  "_source": {
    "text": "user 1 message",
    "userid": 1,
    "id": 10098259467307
  }
}

Is there a way to extract the relationship between the users based on the co-occurrence of terms within the "text" field of posts which they have authored? For example, if there are two posts containing the word "elastic", what I would like to see is an edge between the posts' authors.

(Mark Harwood) #2

This demo is based on stack overflow posts and graphs the tags and people involved in them: https://m.youtube.com/watch?v=1QwmJ_FCMqU&feature=youtu.be

(Milad Vadoodparast) #3

Thanks @Mark_Harwood
I watched the video but my question still remains.
In the stack overflow case for example, is this hypothetical graph achievable?
"A graph where the vertices stand for (just) the user names and a relationship between any two nodes summarizes the number of common tags they have contributed to"

(Mark Harwood) #4

I watched the video but my question still remains

Hopefully the video demonstrated that there's often a need to cherry-pick which terms you use as the glue between people - you typically shouldn't use them all. In the StackOverflow data, tags like Javascript and Java are super-common and are not a useful basis on which to link people. That is why I advocated a solution where only selected tags are made visible in the graph and are used to explain the connection between people.

Let's assume then that all tags are equally interesting, so you want to remove them from being visible in your graph in the way you describe. The important point to remember is that in Elastic Graph the vertices are always terms and the edges represent one or more docs. Clicking on an edge will show how many docs a pair of terms have in common. To present the model you describe you'd need tag-centric docs, e.g.:

POST test/tag
{
	"tag": "elasticsearch",
	"contributors": ["Shay", "Uri"]
}
POST test/tag
{
	"tag": "kibana",
	"contributors": ["Rashid", "Uri"]
}

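As a rough illustration of what producing such tag-centric docs could involve, here is a minimal sketch in plain Python with no Elasticsearch client. The input field names "user" and "tags" and the helper names to_tag_docs and user_edges are hypothetical, chosen only for this example. It groups contributors per tag and then counts, for each pair of users, how many tag docs they share, which is the number an edge between two contributors would summarize.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical post-centric input; the field names "user" and "tags" are assumptions.
posts = [
    {"user": "Shay", "tags": ["elasticsearch"]},
    {"user": "Uri", "tags": ["elasticsearch", "kibana"]},
    {"user": "Rashid", "tags": ["kibana"]},
]

def to_tag_docs(posts):
    """Re-orient post-centric docs into one doc per tag listing its contributors."""
    contributors = defaultdict(set)
    for post in posts:
        for tag in post["tags"]:
            contributors[tag].add(post["user"])
    return [
        {"tag": tag, "contributors": sorted(users)}
        for tag, users in sorted(contributors.items())
    ]

def user_edges(tag_docs):
    """Count, for each pair of users, how many tag docs list both of them."""
    edges = defaultdict(int)
    for doc in tag_docs:
        for a, b in combinations(sorted(doc["contributors"]), 2):
            edges[(a, b)] += 1
    return dict(edges)

tag_docs = to_tag_docs(posts)
edges = user_edges(tag_docs)
```

With the sample data, tag_docs matches the two example docs above, and edges holds one shared tag each for the pairs (Shay, Uri) and (Rashid, Uri).
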
There are a few concerns with this approach:

1) You probably need to write code to re-orient your source data in this fashion (see "entity centric indexing")
2) Some tags e.g. Java or Javascript could have prohibitively large lists of contributors
3) The common-place (boring) tags like "Java" or "Javascript" will link everyone together in the diagram
4) The interesting (rare) tags will not be emphasised (2 people sharing a rare topic counts for the same as 2 people sharing a common one)
5) The specialisms a person has (repeated posts tagged with neo4j) will not be emphasised in this model.
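
Points 3 and 4 can be made concrete with a small sketch. This is not how Elastic Graph scores connections; it is only a hypothetical illustration, assuming a simple log-based rarity weight and made-up user names. A tag that everyone has contributed to adds nothing to an edge, while a rare shared tag adds more.

```python
import math
from collections import defaultdict
from itertools import combinations

# Made-up tag-centric docs: "java" is shared by everyone, "neo4j" by two people.
tag_docs = [
    {"tag": "java", "contributors": ["Ann", "Bob", "Cat", "Dan"]},
    {"tag": "neo4j", "contributors": ["Ann", "Bob"]},
]

def weighted_edges(tag_docs, total_users):
    """Sum a rarity weight per shared tag for every pair of contributors.

    The weight log(total_users / contributors_of_tag) is an assumption for
    illustration: it is 0 for a tag everyone shares and grows as tags get rarer.
    """
    edges = defaultdict(float)
    for doc in tag_docs:
        users = sorted(set(doc["contributors"]))
        weight = math.log(total_users / len(users))
        for pair in combinations(users, 2):
            edges[pair] += weight
    return dict(edges)

edges = weighted_edges(tag_docs, total_users=4)
```

Here the pair (Ann, Bob) ends up with a positive weight from the shared "neo4j" tag, while pairs linked only through "java" get a weight of zero. A plain count of shared tags, as in the tag-centric model, would instead treat both tags the same.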

So you can do it this way, but I'm not sure it makes sense.

(Milad Vadoodparast) #5

I get it now
Many thanks
