Filters vs Queries

Hi,
Apologies in advance if my questions are stupid. I come from a SQL
background and am trying to learn Elasticsearch.

  1. I read on the Elasticsearch site that filters have a performance
    advantage over queries. Why wouldn't you always use filters, then? What
    are some searches that you can do with queries but not with filters? What
    exactly is the difference between queries and filters? I tried googling
    but didn't find anything. Any info or link would be greatly appreciated.

  2. I tried playing with queries and filters. Query 1 works, but Query 2
    returned an error. Aren't the two queries functionally equivalent? What
    is wrong with the second query?

Query 1:
{
  "query": {
    "bool": {
      "must": [
        {"term": {"story_date": 20121010}},
        {"match": {"_all": "Some Text"}}
      ]
    }
  },
  "from": 0,
  "size": 2
}

Query 2:
{
  "filtered": {
    "query": {
      "match": {"_all": "Some Text"}
    },
    "filter": {
      "term": {"story_date": 20121010}
    }
  },
  "from": 0,
  "size": 2
}

--

1/ When you query, Elasticsearch computes scores to identify which documents are more relevant than others.
If you use a MatchQuery or a QueryString such as "elasticsearch rss river plugin", some documents may contain all of the terms while others contain only the term "rss", but they will all match.
So a query is used to compute scores.
A filter reduces the set of documents on which the query runs. There is no scoring here: ES simply ignores every document that does not pass the filter.
That's why filtering is faster than querying, and you should use it whenever scoring makes no sense for your use case.
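
The difference between scoring and filtering can be illustrated with a toy model in Python. This is only a sketch: the sample documents, the `score` function (a plain count of matching terms) and the `passes_filter` helper are made up for illustration, and real Elasticsearch relevance scoring is considerably more involved.

```python
# Toy illustration of "query = scoring" versus "filter = yes/no".
# The documents and both helpers are hypothetical; this is not how
# Elasticsearch computes relevance internally.

DOCS = {
    1: {"text": "elasticsearch rss river plugin", "story_date": 20121010},
    2: {"text": "an rss reader", "story_date": 20121010},
    3: {"text": "elasticsearch rss river plugin", "story_date": 20121011},
}


def score(doc, query_text):
    """Query side: count how many query terms occur in the document."""
    doc_terms = set(doc["text"].split())
    return sum(1 for term in query_text.split() if term in doc_terms)


def passes_filter(doc, field, value):
    """Filter side: a plain yes/no test, no score involved."""
    return doc.get(field) == value


def search(query_text, field, value):
    """Filter first to shrink the candidate set, then score what is left."""
    candidates = {i: d for i, d in DOCS.items() if passes_filter(d, field, value)}
    scored = [(i, score(d, query_text)) for i, d in candidates.items()]
    hits = [(i, s) for i, s in scored if s > 0]
    # Highest score first, like a relevance-ranked result list.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


results = search("elasticsearch rss river plugin", "story_date", 20121010)
print(results)
```

Document 3 never reaches the scoring step because the filter rejects it, while document 2 still matches on the single term "rss" but ranks below document 1.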

2/ I will try to answer later.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

2/ I think you have to put your filtered query inside a top-level "query" node.
See:

$ curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "some query string here"
        }
      },
      "filter": {
        "term": {"user": "kimchy"}
      }
    }
  }
}'
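
Applied to Query 2 from the original post, the same wrapping can be sketched in Python. The `filtered_search` helper is a hypothetical name introduced only for this example; it just builds the request body as a dict, on the assumption that "from" and "size" stay at the top level next to "query", as they do in the working Query 1.

```python
import json


def filtered_search(query, filter_, from_=0, size=10):
    """Wrap a query and a filter in a top-level "query" node.

    "from" and "size" stay at the top level of the request body,
    next to "query", not inside it.
    """
    return {
        "query": {
            "filtered": {
                "query": query,
                "filter": filter_,
            }
        },
        "from": from_,
        "size": size,
    }


# Query 2 from the original post, with the missing "query" wrapper added.
body = filtered_search(
    query={"match": {"_all": "Some Text"}},
    filter_={"term": {"story_date": 20121010}},
    size=2,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` body of a curl call like the one above. This applies to the Elasticsearch releases discussed in this thread; later major versions replaced the "filtered" query with a "bool" query that takes a "filter" clause.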

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Hi,

I am using Query 2 in my app. The mapping and the query are listed
below.

I am getting more documents than actually match the query!

Mapping:

"MyIndexType" : {
  "type" : "object",
  "properties" : {
    "notes_emails" : {
      "type" : "object",
      "properties" : {
        "_id" : {"type" : "integer", "store" : "yes"},
        "text" : {"type" : "string", "store" : "yes", "index" : "not_analyzed", "term_vector" : "with_positions_offsets"},
        "subject" : {"type" : "string", "store" : "yes", "index" : "not_analyzed", "term_vector" : "with_positions_offsets"},
        "creation_date" : {"type" : "date", "store" : "yes", "index" : "not_analyzed"},
        "modification_date" : {"type" : "date", "store" : "yes", "index" : "not_analyzed"},
        "created_by" : {"type" : "integer", "store" : "yes", "index" : "not_analyzed"},
        "modified_by" : {"type" : "integer", "store" : "yes", "index" : "not_analyzed"},
        "activity_type" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
        "reference_type" : {"type" : "integer", "store" : "yes", "index" : "not_analyzed"},
        "type" : {"type" : "string", "store" : "yes"}
      }
    }
  }
}

Query:

{
  "from": 1,
  "size": 10,
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {"notes_emails.reference_type": "10"}
            }
          ]
        }
      },
      "filter": {
        "term": {
          "notes_emails.reference_type": "10"
        }
      }
    }
  }
}

And here are the results I am getting:

"hits": [
{
"_index": "site462",
"_type": "contact_notes",
"_id": "1340",
"_score": 1,
"_source": {
"notes_emails": {
"_id": 117,
"text": "my test note",
"creation_date": "2012-03-16T09:40:31.725Z",
"modification_date": "2012-03-16T09:40:31.725Z",
"created_by": 342,
"modified_by": 342,
"reference_type": 10,
"type": "ActivityContact"
}
}
},
{
"_index": "site462",
"_type": "contact_notes",
"_id": "1706",
"_score": 1,
"_source": {
"notes_emails": {
"_id": 329,
"text": "another test note",
"creation_date": "2012-07-04T21:57:57.665Z",
"modification_date": "2012-07-04T22:16:18.644Z",
"created_by": 363,
"modified_by": 363,
"reference_type": 10,
"type": "ActivityContact"
}
}
},
{
"_index": "site462",
"_type": "contact_notes",
"_id": "1703",
"_score": 1,
"_source": {
"notes_emails": {
"_id": 37,
"text": "test note",
"creation_date": "2011-12-28T07:29:07.715Z",
"modification_date": "2012-07-03T11:08:53.699Z",
"created_by": 342,
"modified_by": 342,
"reference_type": 70,
"type": "ActivityContact"
}
}
},
{
"_index": "site462",
"_type": "contact_notes",
"_id": "1341",
"_score": 1,
"_source": {
"notes_emails": {
"_id": 26,
"text": "test d",
"creation_date": "2011-12-05T13:21:57.668Z",
"modification_date": "2011-12-05T13:21:57.668Z",
"created_by": 342,
"modified_by": 342,
"type": "ActivityContact"
}
}
}
]

I am finding it hard to understand why the last two results are returned.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

> Hi
>
> I am using Query 2 in my app. The mapping and the query are
> listed below.
>
> I am getting more documents than actually match the query!

OK, I've spent some time looking at this. The first problem is that you
are providing just the snippets that you think are important, but we're
missing the full recreation. The information you provide is not where
the problem is, so I can't tell you why it is failing.

Also, if you provide a full recreation, then it makes it easier for us
to retry your code to find the problem.

That said, let me point out some issues:

> Mapping:
>
> "MyIndexType" : {
>   "type" : "object",
>   "properties" : {
>     "notes_emails" : {
>       "type" : "object",
>       "properties" : {
>         "_id" : {"type" : "integer", "store" : "yes"},
>         "text" : {"type" : "string", "store" : "yes", "index" : "not_analyzed", "term_vector" : "with_positions_offsets"},
>         "subject" : {"type" : "string", "store" : "yes", "index" : "not_analyzed", "term_vector" : "with_positions_offsets"},
>         "creation_date" : {"type" : "date", "store" : "yes", "index" : "not_analyzed"},
>         "modification_date" : {"type" : "date", "store" : "yes", "index" : "not_analyzed"},
>         "created_by" : {"type" : "integer", "store" : "yes", "index" : "not_analyzed"},
>         "modified_by" : {"type" : "integer", "store" : "yes", "index" : "not_analyzed"},
>         "activity_type" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
>         "reference_type" : {"type" : "integer", "store" : "yes", "index" : "not_analyzed"},
>         "type" : {"type" : "string", "store" : "yes"}
>       }
>     }
>   }
> }

  1. You do not need to set "store" to "yes" on all your fields. You almost
    never want to store fields separately, as they are already available via
    the _source field.

  2. Your "text" and "subject" fields are set to "not_analyzed", which
    means that you can't run full-text search on their content.

  3. You don't need to set "index": "not_analyzed" for non-string fields.
    Only string fields can be analyzed.

  4. Your "type" field SHOULD probably be not_analyzed, as it looks like
    an enum rather than full text.
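
The four points above could be applied roughly as follows. This is a sketch rather than the mapping from the gist: it is written as a Python dict so it can be printed as JSON, the field names are taken from the mapping quoted above, and the constant names `ANALYZED_TEXT` and `EXACT_STRING` are made up for the example. Keeping "term_vector" on the two full-text fields is an assumption.

```python
import json

# Sketch of the mapping with the four suggestions applied:
# no "store", analyzed "text"/"subject", no "index" on non-string
# fields, and a not_analyzed "type" field.
ANALYZED_TEXT = {"type": "string", "term_vector": "with_positions_offsets"}
EXACT_STRING = {"type": "string", "index": "not_analyzed"}

fields = {
    "_id": {"type": "integer"},
    "text": dict(ANALYZED_TEXT),      # analyzed, so full-text search works
    "subject": dict(ANALYZED_TEXT),
    "creation_date": {"type": "date"},
    "modification_date": {"type": "date"},
    "created_by": {"type": "integer"},
    "modified_by": {"type": "integer"},
    "activity_type": dict(EXACT_STRING),
    "reference_type": {"type": "integer"},
    "type": dict(EXACT_STRING),       # enum-like value, kept as a single term
}

mapping = {
    "MyIndexType": {
        "type": "object",
        "properties": {
            "notes_emails": {"type": "object", "properties": fields}
        },
    }
}

print(json.dumps(mapping, indent=2))
```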

> Query:
>
> {
>   "from": 1,
>   "size": 10,
>   "query": {
>     "filtered": {
>       "query": {
>         "bool": {
>           "must": [
>             {
>               "term": {"notes_emails.reference_type": "10"}
>             }
>           ]
>         }
>       },
>       "filter": {
>         "term": {
>           "notes_emails.reference_type": "10"
>         }
>       }
>     }
>   }
> }

  1. Your filter and query clauses are redundant - they do the same thing.
    Also, no need to wrap the single "term" query in a bool/must. Use bool
    to combine multiple clauses.

  2. Using "from: 1" means that you are skipping the first result.
    Results are numbered from zero.
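
A simplified request along the lines of these two points might look like the sketch below. It is not the query from the gist: the `page_request` helper is a hypothetical name, keeping the filter (rather than the query) is an assumption based on the fact that an exact value match needs no scoring, and the explicit "match_all" query is added only to make the intent obvious.

```python
def page_request(filter_clause, page=0, size=10):
    """Build a search body for a zero-based page number.

    Page 0 starts at "from": 0, so the first result is not skipped.
    """
    if page < 0 or size < 1:
        raise ValueError("page must be >= 0 and size must be >= 1")
    return {
        "from": page * size,
        "size": size,
        "query": {
            "filtered": {
                # No scoring is needed, so match everything and
                # let the filter do the selection.
                "query": {"match_all": {}},
                "filter": filter_clause,
            }
        },
    }


reference_filter = {"term": {"notes_emails.reference_type": "10"}}
first_page = page_request(reference_filter)
second_page = page_request(reference_filter, page=1)
print(first_page["from"], second_page["from"])
```

The single "term" filter replaces both the redundant bool/must query and the duplicate filter, and the offset is derived from the page number so that the first page always starts at zero.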

> And here are the results I am getting:

These are not the results from the above query. I can't tell you where
you are going wrong, because I'm missing all the details.

I have created a gist using the info above (plus the changes I
recommend) to show you a working example. It shows that your existing
query works, but I also show two other example queries:

Clint
