Graph Limitations with Nested Datatypes

graph

(Mitchell Waldman) #1

Hello,

I read in one of the earlier posts that graph is capable of querying nested datatypes from the seed query, but it is unable to network outwards using nested datatypes. I took Mark's advice from that post and used copy_to to bring my nested data into the root of the document, and it worked like a charm. My question is will future versions of the graph plugin bring in the ability to network outwards via nested data? If it isn't on the roadmap, why not?
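
For anyone reading along, here is a minimal sketch of the kind of mapping I mean, written as a Python dict. The field names ("entities", "person", "all_people") are made up for illustration, and the syntax assumes a recent Elasticsearch version where "keyword" fields exist and copy_to from a nested field into a root-level field is accepted; older versions differ.

```python
import json

# Hypothetical mapping: each nested "entities.person" value is also
# copied into the root-level "all_people" field, so that graph can
# use a root-level field for exploring connections.
mapping = {
    "mappings": {
        "properties": {
            # Root-level target field (assumed name).
            "all_people": {"type": "keyword"},
            "entities": {
                "type": "nested",
                "properties": {
                    # copy_to names the root-level target field.
                    "person": {"type": "keyword", "copy_to": "all_people"},
                },
            },
        }
    }
}

# Body that would be sent when creating the index.
print(json.dumps(mapping, indent=2))
```

Documents indexed before the mapping change are not affected, which is why a reindex is needed afterwards.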

Cheers,
Mitch


(Mark Harwood) #2

I don't have particular concerns about performance, because "nested" is built for speed, but the hardest challenge is expanding the query DSL and UI configurations in ways that don't confuse the hell out of people :)
If links are drawn by co-occurrence of terms in a doc, what does that mean when we have nested and root docs (or perhaps multiple levels of nesting)? Can a term in the root doc be linked to a term in a nested doc, or are we only looking to draw out links between two terms in the same nested doc? Do we link with sibling nested docs under the same parent? Etc.

We could remove some of the questions and configuration choices if we cater for only one scenario but I'm not sure there is a common scenario emerging. Perhaps we can start with some more detail on your use case?

Cheers
Mark


(Mitchell Waldman) #3

Thanks for the response, Mark!

Well, IMO, the "Advanced Mode" UI is suggesting the former. The advanced mode allows you to pick any field you would like to be graphed (nested or not). So imagine the graph UI in this state:

  • I have a node selected that represents a root-level field
  • I have several nested fields toggled on in the "Fields" bar up top

When I hit "+", it seems to me that that should translate to, "Show me the significant connections between this root level field, and the nested fields I have toggled on." But, unfortunately, nothing actually happens when you hit "+".

My understanding of the problem is probably a bit simplistic, so forgive me if I'm far off target, but I imagine that in a case like the one I just mentioned, the graph should absolutely draw connections between the root-level doc and the nested doc.

We have highly structured and nested documents in our ES cluster at work relating to pulled news articles, tweets, and such. So for instance, a graph use case I'd like to see would be, "Given articles that mention 'Obama', what are the most prominent key phrases or concepts that relate to him? Who else is commonly mentioned alongside Obama?" After taking your advice from that earlier post and reindexing some of the nested details into the root of the doc, I was able to explore those kinds of relationships.
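
To make that concrete, here is a rough sketch of the request body I would expect to send to the graph explore API for the "Obama" question, built as a Python dict. The field names ("body", "all_people", "all_key_phrases") are hypothetical root-level fields of the sort produced by copy_to, and the sample size is an arbitrary choice; the endpoint path varies between versions, so only the body is shown.

```python
import json


def build_explore_request(seed_term, seed_field, vertex_fields, size=5):
    """Build a graph explore request body (a sketch, not a full client).

    seed_field and vertex_fields are assumed to be root-level fields.
    """
    vertices = [{"field": field, "size": size} for field in vertex_fields]
    return {
        # Seed query: the documents that mention the term of interest.
        "query": {"match": {seed_field: seed_term}},
        # Favour significant terms over merely popular ones.
        "controls": {"use_significance": True, "sample_size": 2000},
        # First hop: terms to pull out of the matching documents.
        "vertices": vertices,
        # Second hop: terms connected to the first-hop terms.
        "connections": {"vertices": vertices},
    }


request_body = build_explore_request(
    "Obama", "body", ["all_people", "all_key_phrases"]
)
print(json.dumps(request_body, indent=2))
```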

Thanks for being so on top of these questions!

Cheers,
Mitch


(Mark Harwood) #4

[quote="mitchswaldman, post:3, topic:53067"]
After taking your advice from that earlier post, and reindexing some of the nested details into the root of the doc, I was able to explore those kinds of relationships
[/quote]

The "copy_to" technique is also useful in simplifying other forms of content. Field names in source data often encapsulate a role, e.g. "from" and "to" on emails. It can be useful to copy values from these fields to a common "emailAddresses" field for the purposes of graph analysis; otherwise to:foo@bar.com and cc:foo@bar.com would appear as two different nodes in the graph.
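
A small Python sketch of why this matters, under the simplifying assumption that a graph node is identified by its (field, value) pair. The sample emails and the helper names are invented for illustration, and the merge function only imitates the effect of copy_to at index time (the real copy_to does not change the stored source).

```python
emails = [
    {"from": "alice@example.com", "to": ["foo@bar.com"], "cc": []},
    {"from": "bob@example.com", "to": ["alice@example.com"], "cc": ["foo@bar.com"]},
]

ROLE_FIELDS = ["from", "to", "cc"]


def as_list(value):
    """Treat a single string as a one-element list."""
    return [value] if isinstance(value, str) else list(value)


def graph_nodes(docs, fields):
    """Collect distinct (field, value) pairs, one per graph node."""
    return {(f, v) for doc in docs for f in fields for v in as_list(doc.get(f, []))}


def with_common_field(doc, source_fields, target):
    """Imitate copy_to: gather all source values into one target field."""
    merged = [v for f in source_fields for v in as_list(doc.get(f, []))]
    return {**doc, target: merged}


separate = graph_nodes(emails, ROLE_FIELDS)
merged_docs = [with_common_field(d, ROLE_FIELDS, "emailAddresses") for d in emails]
combined = graph_nodes(merged_docs, ["emailAddresses"])

foo_separate = sorted(n for n in separate if n[1] == "foo@bar.com")
foo_combined = sorted(n for n in combined if n[1] == "foo@bar.com")
print(foo_separate)  # two nodes: one under "cc", one under "to"
print(foo_combined)  # one node under "emailAddresses"
```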

At some point we may add extra configuration that would allow you to define concepts/entities e.g. "EmailAddress" and then describe how that is physically mapped to one or more fields in one or more (potentially nested?) objects in one or more indices.

For the moment we have chosen to keep things simple and based our linking only on values in root-level docs in the same index.


(Mark Harwood) #5

I have a sense that this won't always be the case. After all, nested docs were invented specifically to prevent the muddling of terms from different entities. Take the example of a (root) person and their (nested) flight histories. The from/to pairings in each flight shouldn't be jumbled such that we suggest that the many people who depart from "London Heathrow" also tend to have "London Heathrow" as a destination.
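
A toy Python illustration of that muddling, with an invented person document. It compares the from/to pairings visible when flights are kept as separate nested objects against the pairings implied once the values are flattened into root-level lists; it is a model of the idea only, not of how Elasticsearch stores anything.

```python
from itertools import product

# Hypothetical root doc (a person) with nested flight histories.
person = {
    "name": "traveller-1",
    "flights": [
        {"from": "London Heathrow", "to": "New York JFK"},
        {"from": "Paris CDG", "to": "London Heathrow"},
    ],
}


def nested_pairs(doc):
    """Pairings that respect flight boundaries (one pair per flight)."""
    return {(f["from"], f["to"]) for f in doc["flights"]}


def flattened_pairs(doc):
    """Pairings implied when all values sit in root-level lists.

    Every "from" value co-occurs with every "to" value, because the
    link back to an individual flight has been lost.
    """
    froms = [f["from"] for f in doc["flights"]]
    tos = [f["to"] for f in doc["flights"]]
    return set(product(froms, tos))


bogus = ("London Heathrow", "London Heathrow")
print(bogus in nested_pairs(person))     # False
print(bogus in flattened_pairs(person))  # True
```

The flattened view invents a Heathrow-to-Heathrow pairing that no actual flight contains.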

