Data format for graph

graph

(A B) #1

Hi, new user here, so excuse the basic question. How do I set up data to appear in graph format? Is there an example dataset you provide for better understanding?

For example, I am looking at the following dataset - https://github.com/swissleaks/swiss_leaks_data/tree/master/ICIJ_20140123 and trying to visualize it as a graph, but I am unable to. I see nodes, but no edges linking them. Also, clicking on a node does not provide details about the node.

Thanks.


(Mark Harwood) #2

Hi Abasu,

If you're planning on using the Kibana Graph UI (and Kibana in general) then the typical advice is to index information so that the tokens used to represent the things you want to report on are:

  1. Unambiguous
  2. Readable

So, as an example:

  • Email addresses, hashtags or domain names work great without any changes
  • Bank account numbers could ideally do with the customer name appended
  • Customer names are not reliably unique and could ideally do with customer IDs attached

This is part of the general preparation of content for analysis. Computers want unique IDs but people want to read labels, and if your indexed strings serve both purposes, you avoid the cost of expensive joins at search time that can otherwise limit scalability.
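As a rough illustration of that idea, here is a minimal Python sketch that merges a readable label and a unique ID into a single string before indexing. The field names (`customer_id`, `customer_name`, `customer`) and the `label (id)` layout are assumptions made up for this example; they are not taken from the dataset or from the blog post scripts.

```python
def make_token(label, unique_id):
    """Combine a readable label with a unique ID into one indexed string."""
    # Collapse stray whitespace so the same label always renders the same way.
    clean_label = " ".join(str(label).split())
    return f"{clean_label} ({unique_id})"


def label_record(record, id_field, name_field, out_field):
    """Return a copy of the record with an extra label+ID field added."""
    labelled = dict(record)
    labelled[out_field] = make_token(record[name_field], record[id_field])
    return labelled


# Hypothetical rows: two different customers who share a name.
rows = [
    {"customer_id": "1001", "customer_name": "John Smith"},
    {"customer_id": "1002", "customer_name": "John  Smith"},
]

docs = [label_record(r, "customer_id", "customer_name", "customer") for r in rows]

for doc in docs:
    print(doc["customer"])
```

Each resulting string is still readable in the Graph UI, but two customers with the same name no longer collapse into one node.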

So, the data you reference has a lot of IDs but lacks labels. Since it comes from the same people who provided the OffshoreLeaks and PanamaPapers datasets, I imagine it has a similar format and needs a similar labelling treatment. The PanamaPapers blog post [1] I wrote contains scripts to load this sort of data and index it appropriately.

This blog post also describes the settings you need to turn on for this "forensics" type of work, as the default settings are more tuned for "wisdom of crowds" scenarios, where edges only appear if enough docs/people assert there is a strong-enough relationship to draw out.
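For anyone calling the Graph explore API directly rather than using the Kibana UI, the sketch below shows how I would expect a "forensics" style request body to look: significance filtering turned off and a single document being enough to produce a vertex. The helper name, the field names and the sample size are assumptions for illustration only; check the Graph explore API documentation for your version before relying on the exact options.

```python
import json


def build_explore_request(query_field, query_value, vertex_fields, sample_size=2000):
    """Build a Graph explore request body tuned for 'forensics' style work."""

    def vertex_specs():
        # min_doc_count=1 lets a single document introduce a vertex,
        # instead of requiring several docs to back up each term.
        return [
            {"field": f, "min_doc_count": 1, "shard_min_doc_count": 1}
            for f in vertex_fields
        ]

    return {
        "query": {"match": {query_field: query_value}},
        "controls": {
            # Keep every connected term, not only the statistically significant ones.
            "use_significance": False,
            "sample_size": sample_size,
        },
        "vertices": vertex_specs(),
        "connections": {"vertices": vertex_specs()},
    }


# Hypothetical field names; substitute the label+ID fields you indexed.
body = build_explore_request("entity", "example", ["entity", "officer"])
print(json.dumps(body, indent=2))
```

The printed JSON is only the request body; sending it to the cluster is left out so the sketch stays independent of any particular client library.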

Hope this helps.

[1] https://www.elastic.co/blog/using-elastic-graph-and-kibana-to-analyze-panama-papers


(A B) #3

Thanks for the link. However, it does not provide a link to the exact data used or describe any manipulations done to it.


(Mark Harwood) #4

The link is http://bit.ly/espanama

