Hi -
Perhaps this is answered somewhere that I could not find, but is it possible to include labels on edges? Seeing the relationship, and the intensity of the relationship, between nodes is great, but having a label to identify what that particular relationship is would be even better, especially for seeing multiple types of relationships in one graph.
Thanks!
Tom
A single edge can represent many documents that associate two vertices. You can view examples of these docs by following these steps (the third step uses a field choice from my Panama Papers example).
The current release of Graph shows up to the first 10 docs that contain vertex term A OR term B; the top-most docs are typically the high-scoring ones that contain both.
In the next release this will only show docs that contain pairs of selected terms.
In later releases we will offer drill-down to any choice of Kibana summary/visualization including time lines etc.
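To make the difference between the "A OR B" and "pairs of selected terms" behaviours concrete, here is a minimal sketch of the two document queries in Elasticsearch query DSL, built as plain Python dicts. The field name `entity_name` and the helper function names are made up for illustration; this is not the code the Graph UI actually runs.

```python
import json

def term_clauses(field, terms):
    """One term clause per selected vertex value."""
    return [{"term": {field: t}} for t in terms]

def either_term_query(field, terms):
    # Matches docs containing ANY of the terms; docs that contain
    # several of them will typically score highest.
    return {"bool": {"should": term_clauses(field, terms),
                     "minimum_should_match": 1}}

def all_terms_query(field, terms):
    # Matches only docs containing EVERY selected term.
    return {"bool": {"must": term_clauses(field, terms)}}

# "entity_name" is a hypothetical field name.
body = {"size": 10,
        "query": all_terms_query("entity_name", ["term A", "term B"])}
print(json.dumps(body, indent=2))
```

The resulting body could be sent to the ordinary `_search` endpoint of the index being explored.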
Cheers
Mark
Hi again -
Thanks for the quick response!
This is helpful and a good start for me, but perhaps I'm missing the boat here (I usually assume that's the case...). What I'm looking for is something like RDR, which the Blazegraph pages describe as follows: 'RDR is a simple extension to RDF that allows you to add edge properties as well' (https://wiki.blazegraph.com/wiki/index.php/Concepts).
I realize this may be something that is doable in code but not in Kibana. If that's the case, can you please supply a URL for that? Part of my problem may be in translating terms from other graph databases to Elasticsearch's.
Thanks again!
T
No worries. Ours is generally based on using a different approach - a "bottom-up" rather than "top-down" means of associating data. Wisdom of crowds emergent structures Vs curated content.
Modelling your data in a traditional graph database is often an act of censorship. Concrete edges are only created for the relationships in the source data that are assumed ahead of time to be useful. Many of the relationships that exist in the source document are not modelled (for instance, theoretically every single word in an email could be connected to the author). However, a search index acts as a way of automatically maintaining the associations between all of the values in all of the fields contained in the same document. It also knows every other document that contains any of these terms and maintains the frequency of every value: every IP address, word, number etc. Using this fully connected set of values we can cherry-pick the connections using statistical approaches at query time in order to build a graph on-the-fly of only the meaningfully associated values. The documents act as the glue that strengthens connections between terms.
These are a form of "emergent graph". One example comes from the Enron emails: a search for project "Jedi" produced a connection to a 7-digit number that appeared in the text of several emails. It was the bank account number used for this off-balance-sheet company and, of course, much more significant than the everyday English terms used in these emails, which we did not return. This is an example of "bottom up" connections.
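As a rough illustration of "documents as glue", the toy sketch below builds a tiny in-memory inverted index and looks up which documents connect two values. The sample documents, field names and function names are all invented for the example; a real search index is far more sophisticated than this.

```python
from collections import defaultdict

# Invented sample documents; the field names are hypothetical.
docs = [
    {"author": "alice", "words": ["jedi", "1234567", "meeting"]},
    {"author": "bob", "words": ["jedi", "1234567"]},
    {"author": "alice", "words": ["lunch", "meeting"]},
]

def build_postings(docs):
    """Map each (field, value) pair to the set of doc ids containing it."""
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                postings[(field, v)].add(doc_id)
    return postings

def shared_docs(postings, a, b):
    """Doc ids that contain both values, i.e. the 'glue' behind an edge."""
    return postings.get(a, set()) & postings.get(b, set())

postings = build_postings(docs)
glue = shared_docs(postings, ("words", "jedi"), ("words", "1234567"))
print(sorted(glue))  # -> [0, 1]
```

Nothing was declared up front as an "edge" here: any two values in any two fields can be connected at query time simply by intersecting their document sets.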
Of course you can also use the Graph API to follow simple "edge" connections recorded as "A knows B" style documents, but this is a subset of the features we employ for summarising richer documents. Right now we don't have special UI logic for the special case where one edge = one document, but we may look at adding this in future.
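For the "A knows B" case, a request body for the Graph explore API might look roughly like the sketch below. The field names `from` and `to` are hypothetical, and the option names (`controls.use_significance`, `vertices`, `connections`, `min_doc_count`) are as I recall them from the Graph explore API documentation, so please check them against the docs for your version. The endpoint path also differs between releases.

```python
import json

def knows_explore_body(person, from_field="from", to_field="to"):
    """Explore request for docs shaped like {"from": "A", "to": "B"}.

    Field names are hypothetical. Option names are assumed from the
    Graph explore API docs and should be verified for your version.
    """
    return {
        # Seed query: every "person knows X" document.
        "query": {"term": {from_field: person}},
        # One document = one edge, so statistical significance is of
        # little use; switch it off.
        "controls": {"use_significance": False},
        # A single document is enough evidence for a connection.
        "vertices": [{"field": from_field, "min_doc_count": 1}],
        "connections": {
            "vertices": [{"field": to_field, "min_doc_count": 1}]
        },
    }

body = knows_explore_body("A")
print(json.dumps(body, indent=2))
```

Lowering `min_doc_count` to 1 matters here because a single "A knows B" document would otherwise fall below the default certainty threshold.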
Cheers
Mark
Hmmm... Interesting.
Last (stupid) question: between your reply below and the Kibana screen cap sent in your first message it seems, then, that there would be a list of relations that could be reviewed and chosen based on weight, fields, etc. Is that the case? E.g. 85% relevance between node1:somefield and node2:someOtherfield. That would be pretty cool.
Sorry about all these basic types of questions, I'm evaluating ESGraph against Blazegraph, OrientDB and a couple of others.
T
There are no stupid questions.
It is often the case that there are way more connections in data than are useful to display at once. We tackle this problem using two main strategies:
- Interesting connections first - we use relevance ranking techniques to rank associations so you can focus on the most interesting connections first.
- Summarised connections - each edge is a summary of potentially many documents that assert a connection between vertices.
Let me illustrate these points using an example from the London meetup.com RSVP data.
Here we have a query for "elasticsearch". Graph shows the London elastic group and returns only the most relevant attendees first, based on the strength of their connection to this group:
Note that it did not return "Ravi", who has RSVPed to every elasticsearch meetup. This is because Ravi signs up to every meetup of every group in London. He is an example of what we call "commonly common", and most datasets have examples of these "supernodes". Most people going to the supermarket buy milk. Most people have listened to the Beatles. Most people access Google each day, etc. (see the Zipf mystery [1] for more on this phenomenon). The elastic Graph ranking algorithms can tune out the "commonly common" and focus on the useful connections in the data.
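To give a feel for how a "commonly common" attendee can be tuned out, here is a toy significance score that compares how often someone appears in the group's RSVPs (foreground) with how often they appear in all RSVPs (background). The data and the scoring formula are assumptions chosen for illustration only; this is not the actual Graph ranking algorithm.

```python
from collections import Counter

# Invented RSVP records: (person, group), one per event attended.
rsvps = [
    ("mark", "elastic"), ("mark", "elastic"),
    ("colin", "elastic"), ("colin", "elastic"),
    ("ravi", "elastic"), ("ravi", "elastic"),
    ("ravi", "python"), ("ravi", "python"),
    ("ravi", "java"), ("ravi", "java"),
    ("ravi", "golang"), ("ravi", "golang"),
    ("ann", "python"), ("bob", "java"),
]

def significance(rsvps, group):
    """Score each attendee of `group` by foreground vs background share."""
    background = Counter(person for person, _ in rsvps)
    foreground = Counter(person for person, g in rsvps if g == group)
    fg_total = sum(foreground.values())
    bg_total = len(rsvps)
    scores = {}
    for person, fg_count in foreground.items():
        fg = fg_count / fg_total            # share of this group's RSVPs
        bg = background[person] / bg_total  # share of all RSVPs
        # Illustrative score: absolute uplift times relative uplift,
        # zero when the person is no more common here than elsewhere.
        scores[person] = (fg - bg) * (fg / bg) if fg > bg else 0.0
    return scores

scores = significance(rsvps, "elastic")
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores["ravi"])
```

In this made-up data Ravi attends the elastic meetups exactly as often as Mark and Colin, yet scores zero because he is even more common everywhere else.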
So each edge is a summary of one or more documents that represent a useful connection. If we want to drill down on these documents, we can do much more than we do today (showing a field from the first 10 documents in a table). The backend elasticsearch aggregation framework can be used to summarise document selections along many dimensions, e.g. time, geo or numeric, and Kibana uses this API to draw timelines, maps, pie charts etc. All the Graph plugin has to do is provide a simple query which selects the documents to be drawn. We are working on making this connection between the Graph UI and the rest of the Kibana visualizations happen, to give you these drill-downs. This prototype shows a drill-down on a single vertex using a dashboard with timelines:
In this dashboard, selecting multiple vertices would show a stacked bar chart which would help show the dates Colin and I attended a meetup together. If these were banking transactions we could show the volumes of money being traded over time etc. We're not there yet with these UI drill-downs but this should give you an indication of the potential richness of data behind each edge and some indication of what the existing elastic APIs could provide to your apps.
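As a rough idea of what such a drill-down could look like at the API level, the sketch below builds a search body that selects the documents behind some vertices and summarises them over time with a `date_histogram`, split per person with a `terms` sub-aggregation. The field names `member_name` and `rsvp_time` are hypothetical, and this is not what the prototype dashboard actually sends.

```python
import json

def timeline_body(person_field, people, date_field, interval="month"):
    """Search body for a per-person timeline of the selected vertices."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not hits
        "query": {
            "bool": {
                "should": [{"term": {person_field: p}} for p in people],
                "minimum_should_match": 1,
            }
        },
        "aggs": {
            "over_time": {
                # Recent versions use "calendar_interval";
                # older releases used "interval" instead.
                "date_histogram": {"field": date_field,
                                   "calendar_interval": interval},
                "aggs": {
                    # One series per selected person, for a stacked chart.
                    "per_person": {
                        "terms": {"field": person_field, "include": people}
                    }
                },
            }
        },
    }

# "member_name" and "rsvp_time" are hypothetical field names.
body = timeline_body("member_name", ["Colin", "Mark"], "rsvp_time")
print(json.dumps(body, indent=2))
```

Buckets where both people have a non-zero count would correspond to the periods in which they attended meetups together.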
Cheers
Mark
[1] https://www.youtube.com/watch?v=fCn8zs912OE