Parallel connection hop paths


(Robert Foreman) #1

The Explore API supports nesting connections objects to define a hop path, but it doesn't seem possible to supply an array of connection objects. An array would let me define a tree whose branches have different child node types and expand independently of each other. Is this already supported in some other way?

(Mark Harwood) #2

The connections field essentially marks another hop stage in the exploration path. At that point, all terms from all fields in the prior hop are used as source search terms to discover more target terms in matching documents. The target terms you are trying to discover can come from several different fields, which are listed in the vertices section nested under the connections field. So in these example docs the path looks at click logs, starting from query.raw:midi -> products:* -> query.raw:*. There's no reason why the final leaf nodes in that vertices section couldn't also include another field, e.g. department, to add department connections alongside the common queries for the midi products discovered along the way.
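To make the hop structure concrete, here is a minimal sketch of a request body for that click-log path, built as a Python dict. The index name clicklogs and the field names query.raw, product and department are assumptions modelled on the example docs; only the overall shape reflects the Explore API: a top-level query and vertices, plus a nested connections object with its own vertices list.

```python
import json


def click_log_explore_body(start_query: str) -> dict:
    """Build a two-stage Explore body: query.raw -> product -> (query.raw, department).

    Field names are assumptions based on a click-log example; adjust to your mapping.
    """
    return {
        # Documents matching this query seed the exploration.
        "query": {"match": {"query.raw": start_query}},
        # First hop: product terms found in those documents.
        "vertices": [{"field": "product"}],
        # Second hop: all product terms become search terms, and the target
        # terms may come from several fields at once.
        "connections": {
            "vertices": [
                {"field": "query.raw"},
                {"field": "department"},  # hypothetical extra leaf field
            ]
        },
    }


if __name__ == "__main__":
    # Body you would send with: POST clicklogs/_graph/explore
    print(json.dumps(click_log_explore_body("midi"), indent=2))
```

Adding a field to the leaf vertices list widens that hop without adding a new hop stage.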

(Robert Foreman) #3

Thanks Mark,

I see how to add multiple vertices fields at the different hop stages, but I want to build a graph where the intermediate hop stages diverge, because I'm only interested in connections between certain node types.

For example, we are storing data from a set of tables in a flattened index, with multiple documents expressing every combination of data from the child tables. Imagine these columns:

Person | Child | ChildActivity | Car | Model | Manufacturer | TrafficViolationType | Ticketing Officer

Storing data like this lets me build a dashboard that shows the relationships between ChildActivities and car Manufacturers (what cars do soccer moms buy?), TrafficViolationTypes and Models (are certain cars more likely to speed?), and ChildActivities and TrafficViolationTypes (do football parents get parking tickets?).

Now I'd like to build a graph representation of the person: query by person ID and build a tree with the Person at the center and three branches:

  1. (multiple) Children -> ChildActivity
  2. (multiple) Car -> Model -> Manufacturer
  3. (multiple) TrafficViolationType -> Ticketing Officer

In this graph, I don't want these three node types linked to each other: ChildActivity, Manufacturer and Ticketing Officer.

If I only had two branches, I could do this by querying by PersonID and directing the hop path as Manufacturer -> Model -> Car -> PersonID -> Child -> ChildActivity.

That would give me the two-branched tree I want, since the middle hop will only match the one PersonID from the query. But I'd also like to include the TrafficViolationType -> Ticketing Officer branch off the person. If I could provide an array of connections under the PersonID connection, rather than a single connection, that seems like it would work. I don't know if there's another approach I'm missing.
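For reference, a linear hop path like the two-branch chain above maps onto nested connections objects, one per hop. This is a minimal sketch, assuming the table's column names are the indexed field names and PersonID holds a keyword ID; the value "p-42" and the nest_hops helper are hypothetical. Each level holds exactly one connections object, which is the single-path shape the thread is trying to get beyond.

```python
import json


def nest_hops(fields: list) -> dict:
    """Turn an ordered hop path into nested Explore 'vertices'/'connections' objects.

    fields[0] becomes the top-level vertices; each later field becomes one
    more nested 'connections' level.
    """
    if not fields:
        raise ValueError("need at least one field")
    # Build from the innermost hop outwards.
    hop = {"vertices": [{"field": fields[-1]}]}
    for field in reversed(fields[:-1]):
        hop = {"vertices": [{"field": field}], "connections": hop}
    return hop


path = ["Manufacturer", "Model", "Car", "PersonID", "Child", "ChildActivity"]
# Hypothetical person ID; the query seeds the documents for the first hop.
body = {"query": {"term": {"PersonID": "p-42"}}, **nest_hops(path)}

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```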


(Mark Harwood) #4

Remember that connections will only be drawn between terms (e.g. carModel and carManufacturer) if they co-exist in at least one document. If your documents never mix certain terms (e.g. ChildActivity and TicketingOfficer), a connection between them will never be returned in the results. That being the case, I expect you could expand each hop with all the fields listed in the vertices section, and only the connections that make sense (i.e. those with non-zero term co-occurrence across fields) would be returned.
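The co-occurrence rule can be illustrated with a simplified model: two terms are linkable only if they appear together in at least one document. This sketch is not how Elasticsearch scores connections (the real API also applies relevance weighting and sampling); it only shows why sparse documents that never mix ChildActivity and TicketingOfficer cannot produce an edge between them. The documents, values and the single-hop body listing every branch field are invented for illustration.

```python
from itertools import combinations

BRANCH_FIELDS = ["Child", "ChildActivity", "Car", "Model", "Manufacturer",
                 "TrafficViolationType", "TicketingOfficer"]

# One hop from the person, listing every branch field as a target vertex field.
body = {
    "query": {"term": {"PersonID": "p-42"}},  # hypothetical ID
    "vertices": [{"field": "PersonID"}],
    "connections": {"vertices": [{"field": f} for f in BRANCH_FIELDS]},
}


def co_occurring_pairs(docs: list) -> set:
    """Return every pair of (field, value) terms that share at least one document."""
    pairs = set()
    for doc in docs:
        for a, b in combinations(sorted(doc.items()), 2):
            pairs.add(frozenset((a, b)))
    return pairs


# Sparse documents: each one carries only one branch of the person's data.
docs = [
    {"PersonID": "p-42", "Child": "ch-1", "ChildActivity": "soccer"},
    {"PersonID": "p-42", "Car": "c-1", "Model": "Civic", "Manufacturer": "Honda"},
    {"PersonID": "p-42", "TrafficViolationType": "parking", "TicketingOfficer": "o-7"},
]

pairs = co_occurring_pairs(docs)
```

In this toy model, ("ChildActivity", "soccer") pairs with PersonID and Child but never with TicketingOfficer, because no document holds both.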

Failing that, your client will have to make multiple requests.
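With multiple requests, the client has to stitch the results together. In an Explore response, vertices is a list of objects with field, term, weight and depth, and each entry in connections refers to its endpoints by source and target indices into that list. Merging therefore means de-duplicating vertices and remapping those indices. A minimal sketch: the two sample responses are made up, and sending one request per branch is left to whatever HTTP client you use.

```python
def merge_explore_responses(responses: list) -> dict:
    """Merge several Explore API responses into one vertices/connections graph.

    Vertices are de-duplicated on (field, term) and connection indices are
    remapped to the merged list. The first occurrence of each vertex and edge
    is kept as-is; weights from different requests are not re-normalised.
    """
    merged_vertices = []
    index_of = {}  # (field, term) -> position in merged_vertices
    merged_connections = []
    seen_edges = set()

    for resp in responses:
        # Map this response's local vertex positions to merged positions.
        local_to_merged = []
        for v in resp.get("vertices", []):
            key = (v["field"], v["term"])
            if key not in index_of:
                index_of[key] = len(merged_vertices)
                merged_vertices.append(v)
            local_to_merged.append(index_of[key])

        for c in resp.get("connections", []):
            src = local_to_merged[c["source"]]
            dst = local_to_merged[c["target"]]
            if (src, dst) in seen_edges:
                continue
            seen_edges.add((src, dst))
            merged_connections.append({**c, "source": src, "target": dst})

    return {"vertices": merged_vertices, "connections": merged_connections}


# Two hypothetical per-branch responses that share the person vertex.
children_branch = {
    "vertices": [
        {"field": "PersonID", "term": "p-42", "weight": 1.0, "depth": 0},
        {"field": "ChildActivity", "term": "soccer", "weight": 0.5, "depth": 1},
    ],
    "connections": [{"source": 0, "target": 1, "weight": 0.5, "doc_count": 3}],
}
tickets_branch = {
    "vertices": [
        {"field": "PersonID", "term": "p-42", "weight": 1.0, "depth": 0},
        {"field": "TicketingOfficer", "term": "o-7", "weight": 0.4, "depth": 1},
    ],
    "connections": [{"source": 0, "target": 1, "weight": 0.4, "doc_count": 2}],
}

graph = merge_explore_responses([children_branch, tickets_branch])
```

The merged graph keeps a single person vertex with two independent branches, which is the tree shape asked for, at the cost of one round trip per branch.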

(Robert Foreman) #5

Thanks for the reply.

I understand: if we store the data in sparsely populated documents, each holding either the ticket-related data or the child-related data, the graph should come out the way I've described.

But (and maybe we're doing something very wrong) we're planning to store all combinations of ChildActivities and Tickets, so that users can filter by one and see aggregations of the other.

This does mean that if we have 10 child activities and 10 tickets, we will have 100 documents, and it gets worse with each additional dimension of data. But this is what I mean by flattening the data for use in Kibana. I don't know how else we could let users filter by ticket type and see aggregation metrics such as the count of parents with children in a certain sport who got that ticket. If there's a more efficient document storage pattern that still allows this type of investigation, please let me know.
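The growth described here is a Cartesian product: one flattened document per combination of values. A quick sketch of the arithmetic, with made-up values and a hypothetical third dimension (five cars) to show how each new dimension multiplies the count.

```python
from itertools import product

activities = [f"activity-{i}" for i in range(10)]
tickets = [f"ticket-{i}" for i in range(10)]
cars = [f"car-{i}" for i in range(5)]  # hypothetical extra dimension

# One flattened document per combination of child activity and ticket.
two_dims = [
    {"PersonID": "p-42", "ChildActivity": a, "TrafficViolationType": t}
    for a, t in product(activities, tickets)
]

# Adding cars multiplies the count again.
three_dims = [
    {"PersonID": "p-42", "ChildActivity": a, "TrafficViolationType": t, "Car": c}
    for a, t, c in product(activities, tickets, cars)
]

print(len(two_dims), len(three_dims))  # 100 500
```

A sparse layout would hold 10 + 10 + 5 = 25 documents for the same person, but a filter on ticket type would then only match ticket documents, which carry no ChildActivity to aggregate on.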

Maybe I need to store the data both ways, and use one index for the graph and one for aggregation visualizations?

(Mark Harwood) #6

I'm not sure I understand enough about your data or the business problem you're trying to solve to give a conclusive answer. It's not unreasonable, though, that you might need a different dataset for regular slice-and-dice in Kibana than for walking connections in the data. After all, a typical graph database would require you to break your original business documents into many node -> relationship -> node edges representing only the connections you find useful to explore in a graph.
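One way to read this suggestion is a second, graph-only index whose documents each describe a single edge along one branch, so unrelated branches never share a document. A minimal sketch, assuming a nested person record with the shape shown; the record, the edge_docs helper and the owner field are hypothetical. The owner field is meant for scoping with a query and is never listed as a vertices field, so under the co-occurrence rule it would not create edges itself. Child and car values are assumed to be unique IDs.

```python
def edge_docs(person: dict) -> list:
    """Split one nested person record into sparse, single-edge documents."""
    pid = person["PersonID"]
    docs = []

    def edge(**fields):
        # 'owner' is a hypothetical scoping field, not a vertex field.
        docs.append({"owner": pid, **fields})

    for child in person.get("children", []):
        edge(PersonID=pid, Child=child["id"])
        for activity in child.get("activities", []):
            edge(Child=child["id"], ChildActivity=activity)
    for car in person.get("cars", []):
        edge(PersonID=pid, Car=car["id"])
        edge(Car=car["id"], Model=car["model"])
        edge(Model=car["model"], Manufacturer=car["manufacturer"])
    for ticket in person.get("tickets", []):
        edge(PersonID=pid, TrafficViolationType=ticket["type"])
        edge(TrafficViolationType=ticket["type"], TicketingOfficer=ticket["officer"])
    return docs


record = {
    "PersonID": "p-42",
    "children": [{"id": "ch-1", "activities": ["soccer"]}],
    "cars": [{"id": "c-1", "model": "Civic", "manufacturer": "Honda"}],
    "tickets": [{"type": "parking", "officer": "o-7"}],
}
graph_docs = edge_docs(record)
```

Each dict in graph_docs would be indexed as its own document in the graph index, while the flattened index stays as it is for Kibana aggregations.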
