Strange inconsistency in searching using booleans


(James Cook-3) #1

I just recently upgraded from 0.18 to 0.19.2 and noticed this new behavior
creep into my codebase, although I would have to consider it a bug.

I have a mapping file with quite a few different properties, but here is a
subset containing some boolean values:

$ curl -XGET 'http://localhost:9311/nep/ventures/_mapping?pretty=true'
{
  "ventures" : {
    "_id" : {
      "index" : "not_analyzed"
    },
    "properties" : {
      "active" : {
        "type" : "boolean",
        "index" : "not_analyzed"
      },
      "idea" : {
        "type" : "boolean",
        "index" : "not_analyzed"
      },
      "serviceProvider" : {
        "type" : "boolean",
        "index" : "not_analyzed"
      }
    }
  }
}

After inserting a few dozen sample records, I perform some queries:
{
  "query" : {
    "bool" : {
      "must" : [
        { "field" : { "active" : true } },
        { "field" : { "idea" : false } },
        { "field" : { "serviceProvider" : true } }
      ]
    }
  }
}

Any time I specify { "field" : { "idea" : false } } I get 0 results, but
when I change this to a term search, I see my expected results. None of the
other boolean fields require this stipulation and work the same way whether
I specify term or field. In 0.18, a term search was not required for the
'idea' property.
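
For comparison, the two variants described above can be sketched side by side. This is only an illustration in Python that builds the request bodies; the helper name `bool_must` is made up for this sketch, and the resulting JSON would be sent to the index's `_search` endpoint.

```python
import json

def bool_must(clause_type, conditions):
    """Build a bool query whose 'must' clauses all use the given clause type."""
    return {
        "query": {
            "bool": {
                "must": [{clause_type: {name: value}} for name, value in conditions]
            }
        }
    }

# Field names and values are the ones from the query in this post.
conditions = [("active", True), ("idea", False), ("serviceProvider", True)]

field_body = bool_must("field", conditions)  # the variant reported to return 0 hits
term_body = bool_must("term", conditions)    # the variant reported to return the expected hits

print(json.dumps(term_body, indent=2))
```

The only difference between the two bodies is the clause type, which makes it easy to run both against the same index and compare hit counts.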

Is this expected behavior, or is something else interfering with my
mapping/queries?


(Shay Banon) #2

Can you post a full recreation? This seems to work:
https://gist.github.com/2551823.


(James Cook-3) #3

Unfortunately, it is a much more complicated structure than I have
represented here and there are lots of different types. Perhaps I can get
some advice regarding a couple aspects of the general problem?

  1. If I have properties in other types also called 'idea' can I expect
    strange behavior if they are mapped in a manner different than the property
    in this type? (Are property mappings specific to 'type'?)
  2. When querying or filtering by a boolean value, should one use a field
    or a term?
  3. I suppose one should almost never do a query for a boolean value as
    it most likely would not impact score unless it was in a 'should' clause.
    Should boolean conditions most often be included in a filter and very
    rarely included in a query?

Thanks.


(Clinton Gormley) #4

Hiya James

This works for me too. Are you sure you're not using the 'idea' field in
a different type with a different mapping?

On Sun, 2012-04-29 at 20:30 -0400, James Cook wrote:

> Unfortunately, it is a much more complicated structure than I have
> represented here and there are lots of different types. Perhaps I can
> get some advice regarding a couple aspects of the general problem?
>
>  1. If I have properties in other types also called 'idea' can I
>     expect strange behavior if they are mapped in a manner
>     different than the property in this type? (Are property
>     mappings specific to 'type'?)

Right - you can get weird results. Property mappings can interfere with
each other. I'm unsure of the exact mechanism, but if you name the
field unambiguously, then it should do the right thing, e.g. by including
the type name in the field name: 'mytype.idea'

Otherwise, it can resolve the name 'idea' to the wrong mapping.
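
As an illustration of the type-qualified naming suggested above, here is a minimal sketch in Python. It assumes the type is called 'ventures' as in the mapping from the first post; the helper names `qualified` and `field_clause` are invented for this example.

```python
import json

def qualified(type_name, field_name):
    """Prefix a field name with its type name, e.g. 'ventures.idea'."""
    return f"{type_name}.{field_name}"

def field_clause(type_name, field_name, value):
    """Build a 'field' query clause against the type-qualified field name."""
    return {"field": {qualified(type_name, field_name): value}}

# Same three conditions as the original query, but with qualified names.
body = {
    "query": {
        "bool": {
            "must": [
                field_clause("ventures", "active", True),
                field_clause("ventures", "idea", False),
                field_clause("ventures", "serviceProvider", True),
            ]
        }
    }
}

print(json.dumps(body, indent=2))
```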

>  2. When querying or filtering by a boolean value, should one use
>     a field or a term?

You should be able to use either. 'field' sees that the property is
mapped as boolean, and does the right thing.

>  3. I suppose one should almost never do a query for a boolean
>     value as it most likely would not impact score unless it was
>     in a 'should' clause. Should boolean conditions most often be
>     included in a filter and very rarely included in a query?

As you say, it depends whether you want them to affect the score or not.
That's the general rule. Sometimes, depending on your query and how
often you use the exact combination of values in that query, it may be
more efficient to use a bool query rather than several filters. But
whether it is better or not depends very much on what you're doing, and
you should test how each performs before you make a decision on that.

Also, see the 'bool' execution mode of the 'terms' filter, which (again,
depending on your exact query) may be more efficient still:
http://www.elasticsearch.org/guide/reference/query-dsl/terms-filter.html
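
A rough sketch of what a 'terms' filter with the 'bool' execution mode might look like, built in Python. The field name `stage` and its values are hypothetical, and the placement of the `execution` key inside the `terms` object reflects my reading of the linked page, so check it against the documentation for your version.

```python
import json

def terms_filter(field, values, execution="bool"):
    """Build a 'terms' filter; 'execution' selects how the terms are combined."""
    return {"terms": {field: list(values), "execution": execution}}

# 'stage' and its values are made-up examples, not fields from this thread.
body = {
    "query": {
        "constant_score": {
            "filter": terms_filter("stage", ["seed", "growth"])
        }
    }
}

print(json.dumps(body, indent=2))
```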

clint


(James Cook-3) #5

Thanks Clinton, I'm writing some basic unit tests to get a better
understanding of term vs. field. Is there any general-purpose guidance you
can add on when one would use a field search/filter versus a term
search/filter? Is a term search meant to be run only against properties
that are not analyzed? I am just not clear on what the difference is between
the two.

-- jim


(Clinton Gormley) #6

Hi Jim

First point: it's a field QUERY, not filter.

A 'term' filter or query does exact matching only. There is no analysis.

A 'text' query analyzes the search keywords, using the search_analyzer
which is defined for a particular field.

A 'field' query is similar to a text query, except it also takes the
Lucene Query Parser Syntax into account:
http://lucene.apache.org/core/3_6_0/queryparsersyntax.html

If you have a field which is marked as 'not_analyzed' or analyzer:
'keyword' then the 'text' or 'field' queries will take that into account
and will be the equivalent of a term query. However, the 'field' query's
lucene syntax may interfere with things, so best to only use that where
you really want it.
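
To make the three clause types concrete, here is a small Python sketch that builds each one for the same value. The field `slug` and the value are hypothetical, and `has_query_syntax` is only a rough heuristic written for this example to flag values that a 'field' query might interpret as query parser syntax.

```python
import json

# Characters with special meaning in the Lucene query parser syntax.
# '&' and '|' are included singly as a heuristic for the '&&' and '||' operators.
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\&|')

def has_query_syntax(value):
    """Heuristic: True if the value contains characters the query parser may interpret."""
    return any(ch in LUCENE_SPECIAL for ch in str(value))

def clause(kind, field_name, value):
    """Build a single 'term', 'text' or 'field' query clause."""
    return {kind: {field_name: value}}

field_name, value = "slug", "nep:ventures"  # hypothetical field and value

for kind in ("term", "text", "field"):
    print(json.dumps(clause(kind, field_name, value)))

# A value containing ':' is one where a 'field' query may behave differently
# from 'term' or 'text', because ':' is part of the query parser syntax.
print(has_query_syntax(value))
```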

So let's say that you decide to go with 'term' clauses. The next question is:
should they be filters or queries? To answer that, ask yourself:

    should the results of this clause be boolean (ie either include
    or exclude results)  --> filter
    
    or 
    
    should they affect the scoring (the more terms that match, the
    higher the relevance) --> query
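
The decision above could be sketched like this in Python. The function name `build_search` is invented for the example, and it assumes the `filtered` query and `bool` filter of the query DSL are available in your version; treat it as a starting point rather than a recommended structure.

```python
import json

def build_search(conditions, affects_score):
    """Place 'term' clauses in a query (scored) or in a filter (include/exclude only)."""
    terms = [{"term": {name: value}} for name, value in conditions]
    if affects_score:
        # Scored: the more 'should' clauses match, the higher the relevance.
        return {"query": {"bool": {"should": terms}}}
    # Unscored: every document either passes the filter or is excluded.
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"must": terms}},
            }
        }
    }

conditions = [("active", True), ("idea", False), ("serviceProvider", True)]

print(json.dumps(build_search(conditions, affects_score=False), indent=2))
```

With `affects_score=False` all three conditions act purely as include/exclude criteria, which matches the boolean fields discussed in this thread.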

hth

clint


(James Cook-3) #7

Great stuff. I rewrote most of my requests to use only filters. I need to
sort on a field, but I almost never need to leverage scoring.

Thanks a lot!

