Nested query to find a parent with multiple matching nested documents


(Corey Nolet) #1

My data has the following format:

{
  "id": "entityId",
  "type": "entityType",
  "tuples": [
    {
      "key": "name",
      "value": "myentityName",
      "type": "string"
    },
    {
      "key": "url",
      "value": "http://myentityName/uri",
      "type": "uri"
    }
  ],
  "_timestamp": "2013090211"
}

What I need is the ability to query, in this format, for all entities that
have both "tuples.key=name && tuples.value=myentityName" and
"tuples.key=url && tuples.value=http://myentityName/uri". I haven't been
able to find a good example in the nested query API documentation; the
examples I've seen only match a single nested document. I'm using
Elasticsearch as a real-time cache where the entities are stored for an hour
before they get pushed to an archive in HBase. The document format above
lends itself well to the key/value indexes I've established in HBase, and
it'd be nice if I could keep the same document format for both databases.

Thanks in advance for help!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b223a4f2-03d6-4bc3-87cd-06b023840116%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

Did you try a BoolQuery with 2 nested queries inside?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(tmanta7) #3

Thanks for your help David.

Ok so I simplified my expression:
{
  "facets": {
    "terms": {
      "terms": {
        "field": "kv.value",
        "size": 10,
        "order": "count",
        "exclude": []
      },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "nested": {
                  "path": "kv",
                  "query": {
                    "bool": {
                      "must": [
                        { "text": { "kv.key": "state" } },
                        { "text": { "kv.value": "designed" } }
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

But it still does not work :confused:

To summarize, the goal is to build a facet on nested fields (using Kibana).



(David Pilato) #4

I was more thinking of something like:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "path_to_nested_doc",
            "query": {}
          }
        },
        {
          "nested": {
            "path": "path_to_nested_doc",
            "query": {}
          }
        }
      ]
    }
  }
}

The first nested query is for tuples.key=name && tuples.value=myentityName.
The second nested query is for tuples.key=url && tuples.value=http://myentityName/uri.
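The skeleton above can also be filled in programmatically. The following is only a minimal Python sketch: it assumes the nested path is `tuples` and uses `match` clauses inside each nested query, and the helper names `nested_pair` and `all_pairs_query` are hypothetical, not part of any client library.

```python
import json


def nested_pair(path, key, value):
    """Build one nested clause: a single nested doc must carry both key and value."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"match": {f"{path}.key": key}},
                        {"match": {f"{path}.value": value}},
                    ]
                }
            },
        }
    }


def all_pairs_query(path, pairs):
    """The parent must contain a matching nested doc for every (key, value) pair."""
    return {"query": {"bool": {"must": [nested_pair(path, k, v) for k, v in pairs]}}}


# Assumed path and values, taken from the example document in the first post.
body = all_pairs_query(
    "tuples",
    [("name", "myentityName"), ("url", "http://myentityName/uri")],
)
print(json.dumps(body, indent=2))
```

Each (key, value) pair gets its own nested clause because a single nested clause is evaluated against one nested document at a time, so both pairs could never be satisfied inside the same clause.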

Hope this helps

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



(Corey Nolet) #5

Thanks for your reply. I'm trying this query but still not finding anything
unfortunately. :-/

curl -X GET 'http://mediaserver:9200/testbucket/couchbaseDocument/_search?pretty=true' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "tuples",
            "query": {
              "bool": {
                "must": [
                  { "term": { "tuples.key": "name" } },
                  { "term": { "tuples.value": "myentityName" } }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "tuples",
            "query": {
              "bool": {
                "must": [
                  { "term": { "tuples.key": "url" } },
                  { "term": { "tuples.value": "http://myentityName/uri" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}'



(Corey Nolet) #6

Ok. This worked!

curl -X GET 'http://mediaserver:9200/testbucket/couchbaseDocument/_search?pretty=true' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "doc.tuples",
            "query": {
              "bool": {
                "must": [
                  { "match": { "doc.tuples.key": "name" } },
                  { "match": { "doc.tuples.value": "hub1" } }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "doc.tuples",
            "query": {
              "bool": {
                "must": [
                  { "match": { "doc.tuples.key": "url" } },
                  { "match": { "doc.tuples.value": "http://url" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}'

Now I have two more questions. One of them is due to my ignorance of Lucene.

  1. I'm coming from Key/Value land (HBase, Accumulo) and adding more AND
    statements almost always means scanning more indexes and generally has more
    performance implications than a single term/match query. What is the
    performance implication of performing this AND nested query? I expect my
    elasticsearch cluster to have between 2M and 3M documents in it at any time
    where each document ranges from 1K to 4K (documents will be getting updated
    VERY frequently so it won't be uncommon to have revisions in the tens of
    thousands).

  2. Why didn't the term query work but the match query did?



(David Pilato) #7

  1. I'm coming from Key/Value land (HBase, Accumulo) and adding more AND statements almost always means scanning more indexes and generally has more performance implications than a single term/match query. What is the performance implication of performing this AND nested query? I expect my elasticsearch cluster to have between 2M and 3M documents in it at any time where each document ranges from 1K to 4K (documents will be getting updated VERY frequently so it won't be uncommon to have revisions in the tens of thousands).

3M docs is not that many in terms of document count. That said, it depends on the document size (and so the index size).

Having many updates comes at the price of merging segments. However many revisions you have, Elasticsearch only keeps the latest version. The merge process expunges deleted docs (that is, old versions).

I would say: test it! It's easy to import millions of docs and play with that even on a laptop.
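One way to try this out is to generate synthetic entities in the bulk API's newline-delimited JSON format. This is only a rough Python sketch: the index and type names are borrowed from the curl commands earlier in the thread, while the function names, document ids, and generated values are made up for illustration. The `_type` field reflects the 2013-era API; later Elasticsearch releases removed mapping types, so it would need to be dropped there.

```python
import json


def bulk_lines(n, index="testbucket", doc_type="couchbaseDocument"):
    """Yield an action line followed by a source line for each of n synthetic entities."""
    for i in range(n):
        # Action metadata line: tells the bulk API where to index the next line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": f"entity-{i}"}}
        # Source line: same shape as the example document in the first post.
        source = {
            "id": f"entity-{i}",
            "type": "entityType",
            "tuples": [
                {"key": "name", "value": f"entity{i}", "type": "string"},
                {"key": "url", "value": f"http://entity{i}/uri", "type": "uri"},
            ],
        }
        yield json.dumps(action)
        yield json.dumps(source)


def bulk_body(n):
    """Join the lines into one payload; the bulk API expects a trailing newline."""
    return "\n".join(bulk_lines(n)) + "\n"


payload = bulk_body(3)
print(payload)
```

The resulting payload could be sent to the `_bulk` endpoint in batches. For the nested queries in this thread to work, the `tuples` field would also have to be mapped as a `nested` type before indexing.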

  2. Why didn't the term query work but the match query did?

A TermQuery is not analyzed. That means your query text is compared exactly against the inverted index, and the inverted index is built after the analysis process.

Basically, a field containing "Hello WORLD!" becomes "hello", "world" in the inverted index. If you search with a TermQuery for "Hello", nothing matches because "Hello" != "hello".

A MatchQuery is analyzed. If you search for "Hello", your query becomes "hello". hello == hello, so it matches.
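The difference can be illustrated with a toy model in Python. This is not how Elasticsearch or Lucene is implemented: the `analyze` function below is a crude stand-in for a real analyzer (it only splits on non-alphanumeric characters and lowercases), and all function names are made up for this sketch.

```python
import re


def analyze(text):
    """Toy analyzer: split on non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def build_index(field_value):
    """The inverted index holds the analyzed tokens, not the original string."""
    return set(analyze(field_value))


def term_query(index, text):
    """Not analyzed: the query text is looked up exactly as given."""
    return text in index


def match_query(index, text):
    """Analyzed: the query text goes through the analyzer, then any token may match."""
    return any(tok in index for tok in analyze(text))


index = build_index("Hello WORLD!")
print(term_query(index, "Hello"))   # False: "Hello" is not a token in the index
print(match_query(index, "Hello"))  # True: the query is analyzed to "hello"

# The value from the first post behaves the same way under this toy analyzer.
entity_index = build_index("myentityName")
print(term_query(entity_index, "myentityName"))   # False
print(match_query(entity_index, "myentityName"))  # True
```

Under these assumptions, the term queries on "myentityName" found nothing because the indexed token was lowercased, while the match queries were lowercased the same way before the lookup.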

Makes sense?

--

David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr



(Corey Nolet) #8

Makes sense. I'm using Couchbase/ES as my front-line cache to be able to
supply the real-time segment of analytics/query for any given current hour.
After the hour, the 2-3M documents get pushed into the HBase/Accumulo
instance for archival and further (more batch) analytics. I'm hoping, with
the es-hadoop integration, to bridge the gap in the APIs between the
real-time data and the archived batch processing. Anyway, I'm not sure what
the upper bound of Couchbase/ES is, but my current
Couchbase dataset fluctuates between 6GB and 10GB of memory at any given
time. It's also constantly being updated and those updates get alerted (so
I'm always doing get/set instead of just replace).

Thanks much for your help!


