I have a situation where I already know the relationships between the vertices that I want to explore. To provide some context:
We are a large government organization. We have defined all our applications in a service catalog. Each application has multiple components, most of which run on different computers, i.e. each application is highly distributed across many systems.
I'm using ELK to gather performance data from every system. This will be used for capacity planning and eventual chargeback to clients.
Since I know the topology of every distributed application across all systems, I was hoping to use Graph and define all the vertices myself. Each vertex would be a computer that is used in the topology for a specific application.
Users would be able to see all the systems that make up an application and would be able to drill down on each vertex to get CPU, memory or disk usage.
Here is my approach:
I would store all of these topologies in a special ELK index.
Users would choose these the same way they would choose dashboards.
Does using Graph this way make sense?
What is the best way to do this?
So each computer would require a unique ID held in a field of your docs.
What would the connections represent? A single document that asserts computer A talks to computer B? Or are you trying to summarise millions of log records that record actual communication events between machines?
Yes, you are correct. Each vertex would represent a computer that would be uniquely identified by its IP address. We have thousands of computers (mostly VMs) running Windows, Red Hat Linux, IBM mainframe systems, etc.
The connections are an initial view of the application (i.e. all the systems that make up that specific application). Once this graph is displayed, any vertex can be selected in order to drill down on that node's specific contribution to a total metric.
For example, the percentage of CPU MIPS that this node uses, the amount of traffic through a node, etc.
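To make the data model concrete, here is a minimal Python sketch of what the topology documents could look like, assuming one document per application/host membership. The field names (application, host_ip, doc_id) and the sample values are hypothetical, not something ELK or Graph prescribes.

```python
def topology_docs(application, host_ips):
    """Build one document per (application, host) membership.

    The field names are illustrative assumptions; any field that holds a
    unique value per computer (here the IP address) could serve as the vertex.
    """
    return [
        {
            "doc_id": f"{application}:{ip}",  # hypothetical stable ID, avoids duplicates
            "application": application,
            "host_ip": ip,
        }
        for ip in sorted(set(host_ips))  # de-duplicate repeated hosts
    ]


docs = topology_docs("payroll", ["10.0.0.12", "10.0.0.11", "10.0.0.12"])
for doc in docs:
    print(doc)
```

With documents shaped like this, the application value and the host_ip values would be the terms that end up connected in the graph.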
This is likely too much information to be usefully shown in a single workspace.
Some visualizations are inherently scalable e.g. date histograms - regardless of the time-scale and data volumes you can always pick an appropriate zoom level to bucket the information on X and Y axes to show a useful overview.
Relational visualizations like Venn diagrams or Graph visualizations cannot scale to cope with increases in what is being shown. Too many entities and relationships just lead to clutter.
One option is to save the 50 graphs, having loaded each of them with the appropriate vertices. This, by design, is a preserved snapshot of vertices and connections which may not necessarily represent the current state of your documents in the index. The saved workspaces can be searched and found from the Graph UI's "Open" menu.
The other option is to save a single blank-template workspace with the fields/icons you want to use and open this workspace using a URL that contains a ?query= parameter whose value is one of:
a) a plain Lucene-syntax query string
b) an Elasticsearch query expressed using JSON
c) a full graph-explore request expressed using JSON.
In each case this will query your index and initialize your workspace with the matching vertices.
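As a rough illustration of the three variants, the Python sketch below builds such links. WORKSPACE_URL is a placeholder (copy the real address from your browser after saving the template workspace), and the field names application and host_ip are assumptions about your index, not anything Graph requires.

```python
import json
from urllib.parse import quote

# Placeholder only: replace with the address of your saved template workspace.
WORKSPACE_URL = "https://kibana.example.org/TEMPLATE_WORKSPACE"


def workspace_link(query):
    """Return the workspace URL with a URL-encoded ?query= parameter.

    `query` is either a Lucene query string or a dict that is sent as JSON.
    """
    text = query if isinstance(query, str) else json.dumps(query, separators=(",", ":"))
    return f"{WORKSPACE_URL}?query={quote(text, safe='')}"


# a) plain Lucene-syntax query string
lucene_link = workspace_link('application:"payroll"')

# b) Elasticsearch query expressed as JSON
es_link = workspace_link({"term": {"application": "payroll"}})

# c) graph-explore request as JSON (seed query, vertices, connections)
explore_link = workspace_link({
    "query": {"term": {"application": "payroll"}},
    "vertices": [{"field": "application"}],
    "connections": {"vertices": [{"field": "host_ip"}]},
})

print(lucene_link)
```

One link per application could then be stored wherever users pick their dashboards from.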
You can save a number of Elasticsearch documents, each holding the appropriate hyperlink to open the workspace. See this demo, where I create an "alerts" index with examples of such URLs and show how they are used: https://youtu.be/liMhiiyQ9co?t=17m4s
Mark,
Thank you for your help on this. Right now I'm in the architecture phase and wanted to ensure that Graph would do the job. From our discussion it looks like it will. Once we get into implementation, I'm certain you will be hearing from me.