Elasticsearch graph query not working

graph

(James Crone) #1

I am new to Elasticsearch Graph. I have installed it successfully, but when I try to choose an index name, fields, etc., no data is returned.
Here's my query:
{
  "query": {
    "query_string": {
      "default_field": "_all",
      "query": "male"
    }
  },
  "controls": {
    "use_significance": true,
    "sample_size": 100,
    "timeout": 100000
  },
  "connections": {
    "vertices": [
      {
        "field": "gender",
        "size": 5,
        "min_doc_count": 3
      },
      {
        "field": "persona_fname",
        "size": 5,
        "min_doc_count": 3
      },
      {
        "field": "persona_lname",
        "size": 5,
        "min_doc_count": 3
      }
    ]
  },
  "vertices": [
    {
      "field": "gender",
      "size": 5,
      "min_doc_count": 3
    },
    {
      "field": "persona_fname",
      "size": 5,
      "min_doc_count": 3
    },
    {
      "field": "persona_lname",
      "size": 5,
      "min_doc_count": 3
    }
  ]
}

If I change the query to the following and post it with Postman:

{
  "query": {
    "match": {
      "gender": "male."
    }
  },
  "controls": {
    "use_significance": true,
    "sample_size": 100,
    "timeout": 100000
  },
  "connections": {
    "vertices": [
      {
        "field": "gender",
        "size": 5,
        "min_doc_count": 3
      },
      {
        "field": "persona_fname",
        "size": 5,
        "min_doc_count": 3
      },
      {
        "field": "persona_lname",
        "size": 5,
        "min_doc_count": 3
      }
    ]
  },
  "vertices": [
    {
      "field": "gender",
      "size": 5,
      "min_doc_count": 3
    },
    {
      "field": "persona_fname",
      "size": 5,
      "min_doc_count": 3
    },
    {
      "field": "persona_lname",
      "size": 5,
      "min_doc_count": 3
    }
  ]
}

it returns data with vertices and weights. What should I change in the Graph settings, or what am I missing?


(Mark Harwood) #2

Looking at the first example query, I'm not sure what problem you're trying to solve using Graph. Gender, first name and last name don't look like useful fields to draw in a network. It's important to start with an idea of what might be a useful thing to do with your data. I'll give a real case of something useful at the end of this post, but for now let's break down what is happening in your example.

I presume you have one doc per person with gender, fname and lname.
Breaking your request down, the steps are:

  1. Query for all males.
  2. Find significant values in these fields:
    a) gender - clearly "male" will be the only gender you'd expect to find in a query for males
    b) first name - I'd expect to see "dave", "john" etc. here as significantly associated with males
    c) last name - individual surnames are not aligned to the male gender, so any surnames selected here (e.g. smith) will be wholly insignificant
  3. For the values found in 2), find significant others. We would expect a query for male, dave, john, smith etc. to match docs that were mostly males (although with a handful of common surnames and a big sample size, it will match many females too). This is not a particularly well-focused set of docs in which to look for significant connections, so the results, if any, are going to be spurious.
    A small sample will consist entirely of johns and daves, so there are no new first names to find. The genders will all be male (which we already knew from 2a), and any last names are equally going to be weak connections.
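To make steps 2a to 2c concrete, here is a toy Python sketch of how a significance score separates first names from surnames. It compares how often a term appears among the males (foreground) with how often it appears overall (background), using a JLH-style formula, which is the default heuristic of Elasticsearch's significant_terms aggregation. Treat it as an illustration of the idea, not Graph's exact internal scoring; the people in it are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: one doc per person with gender, first and last name.
people = [
    {"gender": "male", "fname": "dave", "lname": "smith"},
    {"gender": "male", "fname": "john", "lname": "smith"},
    {"gender": "male", "fname": "dave", "lname": "jones"},
    {"gender": "male", "fname": "john", "lname": "brown"},
    {"gender": "female", "fname": "jane", "lname": "smith"},
    {"gender": "female", "fname": "mary", "lname": "jones"},
    {"gender": "female", "fname": "jane", "lname": "brown"},
    {"gender": "female", "fname": "mary", "lname": "smith"},
]


def jlh(fg_count, fg_total, bg_count, bg_total):
    """JLH-style score: (fg% - bg%) * (fg% / bg%)."""
    fg = fg_count / fg_total
    bg = bg_count / bg_total
    return (fg - bg) * (fg / bg)


def significant_terms(docs, field, query_field, query_value):
    """Score each value of `field` in the docs matching query_field == query_value
    against its frequency across all docs (the background)."""
    foreground = [d for d in docs if d[query_field] == query_value]
    fg_counts = Counter(d[field] for d in foreground)
    bg_counts = Counter(d[field] for d in docs)
    scores = [
        (term, jlh(n, len(foreground), bg_counts[term], len(docs)))
        for term, n in fg_counts.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for field in ("gender", "fname", "lname"):
    print(field, significant_terms(people, field, "gender", "male"))
# gender [('male', 1.0)]
# fname [('dave', 0.5), ('john', 0.5)]
# lname [('smith', 0.0), ('jones', 0.0), ('brown', 0.0)]
```

Gender scores highest but tells us nothing new, first names score well, and surnames score zero because they are as common among males as in the whole population.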

A more practical example based on this sort of data is creating a forename thesaurus, which can be used to discover name abbreviations and common typos. This is something I did using billions of a bank's records; I was able to derive a name thesaurus using this simple file structure as input:

customerID   Name
534242131    Bob
534242131    Robert
657464534    Alice
657464534    Sue

So the data format is the bank's unique ID for a person plus every recorded name that person had ever used in interactions with the bank. Each person can then be represented with a JSON doc like this:

{
   "id": 657464534,
   "names": ["Alice", "Sue"]
}
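A minimal Python sketch of turning the customerID/Name rows into one doc per person, as described above. It assumes the rows have already been read into (customerID, name) pairs; the function name is hypothetical.

```python
from collections import defaultdict

# Input rows as in the customerID / Name file above.
rows = [
    ("534242131", "Bob"),
    ("534242131", "Robert"),
    ("657464534", "Alice"),
    ("657464534", "Sue"),
]


def to_person_docs(rows):
    """Group (customerID, name) rows into one doc per person with a names array."""
    names_by_id = defaultdict(list)
    for customer_id, name in rows:
        if name not in names_by_id[customer_id]:  # keep each distinct name once
            names_by_id[customer_id].append(name)
    return [{"id": int(cid), "names": names} for cid, names in names_by_id.items()]


print(to_person_docs(rows))
# [{'id': 534242131, 'names': ['Bob', 'Robert']}, {'id': 657464534, 'names': ['Alice', 'Sue']}]
```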

Now, not many people who called themselves Alice also called themselves Sue, so this is not an example of a significant connection between names. However, when you look at enough of these records, Graph will draw out that "Bob" and "Robert" are indeed strongly connected. This is reinforced through many examples (or as many as min_doc_count requires as weight of evidence). The resulting weighted graph is pretty interesting (small example below):

The weights of the associations (not shown) tell us, for example, that "janes" is much more likely to be a typo for "james" than for "jane". This is a behavioural side-effect of the m and n keys being next to each other on the keyboard.
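A simplified Python sketch of the weight-of-evidence idea: count how many person docs contain each pair of names, drop pairs seen in fewer than min_doc_count docs, and weight a -> b by the fraction of docs containing a that also contain b. This uses plain co-occurrence frequency as the weight, not the significance scoring Graph actually applies, and the docs are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical person docs: Bob/Robert and james/janes recur, Alice/Sue occurs once.
docs = (
    [{"names": ["Bob", "Robert"]}] * 3
    + [{"names": ["Robert"]}]
    + [{"names": ["Alice", "Sue"]}]
    + [{"names": ["james", "janes"]}] * 3
    + [{"names": ["jane", "janes"]}]
)


def name_connections(docs, min_doc_count=3):
    """Return {(a, b): weight} for name pairs seen together in >= min_doc_count docs.

    The weight of a -> b is the fraction of docs containing a that also contain b.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        names = sorted(set(doc["names"]))
        term_counts.update(names)
        pair_counts.update(combinations(names, 2))
    connections = {}
    for (a, b), n in pair_counts.items():
        if n < min_doc_count:  # not enough weight of evidence
            continue
        connections[(a, b)] = n / term_counts[a]
        connections[(b, a)] = n / term_counts[b]
    return connections


strong = name_connections(docs)
print(("Bob", "Robert") in strong, ("Alice", "Sue") in strong)  # True False
all_links = name_connections(docs, min_doc_count=1)
print(all_links[("janes", "james")], all_links[("janes", "jane")])  # 0.75 0.25
```

With the default threshold only the repeated pairs survive; lowering min_doc_count to 1 lets the one-off Alice/Sue link back in, and the weights show "janes" leaning towards "james".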

This sort of data analysis is much more the sort of thing that Graph is tuned to help with. The default configuration tries to identify Bob->Robert connections and tune out the Alice->Sue noise, so if you just want to explore all connections, follow the settings suggestions in [1].
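As a rough sketch of what "explore all connections" might look like as an explore request rather than through the Graph UI: switch off use_significance, lower min_doc_count from its default of 3 to 1, and raise sample_size above its default of 100. The value 2000 is an arbitrary illustration, and the helper name is hypothetical; check [1] and the explore API docs for your version before relying on it. The printed body would be POSTed to the index's Graph explore endpoint (for example `<index>/_xpack/graph/_explore` on 5.x).

```python
import json


def explore_all_connections(query, fields, size=5, sample_size=2000):
    """Build a Graph explore request body that keeps weak, insignificant links."""
    def vertex(field):
        # min_doc_count=1: accept a term seen in a single doc (the default is 3)
        return {"field": field, "size": size, "min_doc_count": 1}

    return {
        "query": query,
        "controls": {
            "use_significance": False,  # do not filter terms by significance
            "sample_size": sample_size,  # assumption: larger than the default 100
        },
        "vertices": [vertex(f) for f in fields],
        "connections": {"vertices": [vertex(f) for f in fields]},
    }


body = explore_all_connections(
    {"match": {"gender": "male"}},
    ["gender", "persona_fname", "persona_lname"],
)
print(json.dumps(body, indent=2))
```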

Hope this helps

Mark

[1] https://www.elastic.co/guide/en/graph/current/graph-troubleshooting.html#_why_are_results_missing


(James Crone) #3

Thank you for your reply. I understand your example.
Actually, my problem is related to this question:

If users bought this type of gardening gloves, what other products might they be interested in?

Here is my index data:

{"tran_date": "2011-12-13","amount": 3540.92,"venue": "St.101 Wales","voucher": "NX6RMQ","voucher_value": 9,"points": 5,"data_source": "user","discount": 6,"payment_by": "Card","tax_type": "vat","tax_value": "12.0","currency": "$","category": "flight","product_name": "Ipod","sub_product_name": "Ipod","product_des": "asdf","product_type": "Ipod","product_unit_price": 3540.92,"product_qty": "1"}
{"tran_date": "2012-03-01","amount": 637.6,"venue": "St.101 Wales","voucher": "3MZLYD","voucher_value": 9,"points": 4,"data_source": "user","discount": 5,"payment_by": "Card","tax_type": "vat","tax_value": "12.0","currency": "$","category": "flight","product_name": "Ipod","sub_product_name": "Ipod","product_des": "asdf","product_type": "Ipod","product_unit_price": 637.6,"product_qty": "1"}

I want to create a graph to predict products.
For example: if user 'A' purchases an iPod, and in future I add laptops, how can I predict that user 'A' will want to purchase one because he is interested in electronics like the iPod? Is Graph the right way to go for this problem? If yes, how can I achieve it?


(Mark Harwood) #4

You can start from either end (laptop or User A) and find the other.
Let's assume you have a buyer-centric index, like the doc I used for figuring out which first names are strongly related, but instead of person names you have an array of SKUs (product codes) that each user has purchased. I see you also have product types and categories; these could also be stored in each user's purchase history.
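A minimal Python sketch of rolling the posted transaction docs up into one purchase-history doc per buyer. The posted data has no buyer field, so `user_id` is an assumed addition; `product_name` stands in for a SKU, and all the transaction values here are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions trimmed to the relevant fields; `user_id` is assumed.
transactions = [
    {"user_id": "A", "product_name": "Ipod", "product_type": "Ipod", "category": "electronics"},
    {"user_id": "A", "product_name": "Beats headphones", "product_type": "Headphones", "category": "electronics"},
    {"user_id": "B", "product_name": "Ipod", "product_type": "Ipod", "category": "electronics"},
    {"user_id": "A", "product_name": "Ipod", "product_type": "Ipod", "category": "electronics"},
]


def to_buyer_docs(transactions):
    """Roll transaction docs up into one purchase-history doc per buyer."""
    buyers = defaultdict(lambda: {"products": set(), "product_types": set(), "categories": set()})
    for t in transactions:
        history = buyers[t["user_id"]]
        history["products"].add(t["product_name"])
        history["product_types"].add(t["product_type"])
        history["categories"].add(t["category"])
    # Sorted lists so each doc is plain, JSON-serialisable data.
    return [
        {"user_id": uid, **{key: sorted(values) for key, values in history.items()}}
        for uid, history in buyers.items()
    ]


for doc in to_buyer_docs(transactions):
    print(doc)
```

Repeat purchases collapse into a single entry, so each buyer doc records what was bought, not how often, which is the shape the name-thesaurus docs had.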
Using these documents you can then draw out the strong connections, e.g. people who buy iPods have a tendency to buy Beats headphones. This is the same principle as people who call themselves Robert also tending to call themselves Bob.
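A hedged sketch of explore request bodies for starting from either end, assuming buyer docs as described above with hypothetical keyword fields `products` and `user_id` (a term query needs an exact, non-analysed match). The helper name and field names are illustrative, not an established API; the body layout follows the same query/controls/vertices/connections shape as the requests earlier in this thread.

```python
import json


def explore_products(seed_query, first_hop_min_doc_count=3, size=5):
    """Graph explore body: seed some buyer docs, then find linked products."""
    first_hop = {"field": "products", "size": size, "min_doc_count": first_hop_min_doc_count}
    second_hop = {"field": "products", "size": size, "min_doc_count": 3}
    return {
        "query": seed_query,  # selects the buyer docs to start from
        "controls": {"use_significance": True, "sample_size": 100},
        "vertices": [first_hop],  # products associated with the seeded buyers
        "connections": {"vertices": [second_hop]},  # products linked to those products
    }


# Start from the product end: buyers whose history contains an iPod.
from_ipod = explore_products({"term": {"products": "Ipod"}})
# Start from the user end: only one buyer doc matches, so the first hop
# must accept terms seen in a single doc.
from_user_a = explore_products({"term": {"user_id": "A"}}, first_hop_min_doc_count=1)
print(json.dumps(from_ipod, indent=2))
```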

