Terms Filter Help


(Brandon Hilkert) #1

I'm indexing user records. Within a user's record, there are 2 arrays
holding connection IDs like this:

fb_connections: ["1213", "12312", ...]
linkedin_connections: ["asdfas", "asdfasdf", ...]

I'm trying to use a filter to include only the people I'm connected to in
the result set. This works fine for the "fb_connections" query, but not for
"linkedin_connections". A "query_string" query wrapped in a filter brings
back the results, but a "terms" filter does not. Here are the tests:

This does NOT work:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "linkedin_id": [
            "KfotObi8Rj"
          ]
        }
      }
    }
  }
}

This works:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "query": {
          "query_string": {
            "query": "KfotObi8Rj"
          }
        }
      }
    }
  }
}

Facebook user IDs are large numbers, which we store as strings
("231234234345"), and LinkedIn IDs are true strings, sometimes with
symbols ("KfotObi8Rj").

Is there a reason that a terms filter wouldn't work with this type of data?


(Tanguy) #2

I tried to reproduce this, and the terms filter works as expected. Maybe you
specified the wrong field name (linkedin_id vs. linkedin_connections)?

This works:
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "linkedin_connections": [
            "asdfas"
          ]
        }
      }
    }
  }
}

-- Tanguy
Twitter: @tlrx


(Brandon Hilkert) #3

The field "linkedin_id" holds the user's own LinkedIn ID. The array
"linkedin_connections" contains that user's LinkedIn connections. So I
want to send the array of connections in as the "terms" for the linkedin_id
field.

Actually, come to think of it... does the "terms" filter have to work on an
array field?

Since linkedin_id is a string field, I want to keep only the results that
have one of the elements of the "linkedin_connections" array in the
"linkedin_id" field. Does this require a "query" filter?

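As a rough sketch of the request described above (Python is used purely for illustration; the helper name is hypothetical and the sample IDs are the ones quoted in this thread), the caller's own connections array becomes the list of values passed to a "terms" filter on the single-valued linkedin_id field. A terms filter keeps a document when its field contains any one of the listed terms, so the field being filtered should not need to be an array itself:

```python
import json


def build_connections_query(connection_ids):
    """Build a filtered query that keeps only users whose linkedin_id
    is one of the given connection IDs (hypothetical helper)."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {"linkedin_id": list(connection_ids)}},
            }
        }
    }


# In practice these values would come from the current user's
# linkedin_connections array.
body = build_connections_query(["KfotObi8Rj", "axXd7QNb8n"])
print(json.dumps(body, indent=2))
```

Whether the filter then matches depends on how linkedin_id was indexed, which is what the rest of the thread works through.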
On Wednesday, May 30, 2012 10:38:39 AM UTC-4, Tanguy wrote:

I tried to reproduce, term filter works as expected. Maybe you specified a
wrong field name linkedin_id/linkedin_connections?

This works:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"linkedin_connections": [
"asdfas"
]
}
}
}
}
}

-- Tanguy
Twitter: @tlrx

Le mercredi 30 mai 2012 16:26:39 UTC+2, Brandon Hilkert a écrit :

I'm indexing user records. Within a user's record, there are 2 arrays
holding connection IDs like this:

fb_connections: ["1213", 12312", ...]
linkedin_connections: ["asdfas", "asdfasdf,...]

I'm attempting to use a filter, to only include the people that I'm
connected to in the result set. This works fine for the "fb_connections"
query, but not for "linkedin_connections". Using a "query_string" filter,
brings back the results, but not terms. Here are the tests:

This does NOT work:

{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"terms": {
"linkedin_id": [
"KfotObi8Rj"
]
}
}
}
}
}

This works:
*
*
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"query": {
"query_string": {
"query": "KfotObi8Rj"
}
}
}
}
}
}

Facebook user IDs are large numbers, which we store as strings
("231234234345"), and linked in IDs are true strings, sometimes with
symbols ("KfotObi8Rj").

Is there a reason that a terms filter wouldn't work with this type of
data?


(Clinton Gormley) #4

Just make sure your linkedin_id field is set to { "index": "not_analyzed" }

c
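
For illustration only, here is a minimal sketch of a mapping body carrying that setting, written as a Python dict and printed as JSON. The type name "user" is an assumption and does not appear in the thread; only the "index" setting on linkedin_id comes from the advice above.

```python
import json

# Hypothetical mapping for a type named "user" (the type name is assumed).
# "index": "not_analyzed" stores the value as a single exact term, so its
# original casing is preserved.
user_mapping = {
    "user": {
        "properties": {
            "linkedin_id": {"type": "string", "index": "not_analyzed"},
        }
    }
}

print(json.dumps(user_mapping, indent=2))
```

An existing field's index setting generally cannot be changed in place, so the documents would most likely need to be reindexed after the mapping is corrected.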


(Brandon Hilkert) #5

With the field marked as "not_analyzed", are you suggesting I use the "terms"
filter or the "query" filter?


(Brandon Hilkert) #6

Turns out it's lowercasing everything, so searching for the ID in its
original casing didn't return any results.

The query below searches for both the original casing and the lowercased
version, and only the lowercased version gets the result. Is there a way to
turn this off?

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "terms": {
          "linkedin_id": [
            "axXd7QNb8n",
            "axxd7qnb8n"
          ]
        }
      }
    }
  }
}
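
The behavior above can be pictured with a toy model in Python. This is a deliberately simplified sketch, not how Elasticsearch is implemented: the function names are made up, and the "analyzer" here only splits on whitespace and lowercases. The point is that an analyzed field stores the lowercased token, while a terms filter compares its values to the stored terms exactly, without analyzing them. A query_string query does analyze its input, which would explain why it matched.

```python
def toy_analyze(value):
    """Very rough stand-in for a default analyzer: split on whitespace, lowercase."""
    return [token.lower() for token in value.split()]


def index_field(value, analyzed=True):
    """Return the set of terms that would be stored for one field value."""
    return set(toy_analyze(value)) if analyzed else {value}


def terms_filter_matches(indexed_terms, wanted):
    """Exact comparison against stored terms; the wanted values are not analyzed."""
    return any(term in indexed_terms for term in wanted)


analyzed_terms = index_field("axXd7QNb8n", analyzed=True)
exact_terms = index_field("axXd7QNb8n", analyzed=False)

print(terms_filter_matches(analyzed_terms, ["axXd7QNb8n"]))  # False
print(terms_filter_matches(analyzed_terms, ["axxd7qnb8n"]))  # True
print(terms_filter_matches(exact_terms, ["axXd7QNb8n"]))     # True
```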


(Clinton Gormley) #7

I repeat :)

Just make sure your linkedin_id field is set to { "index": "not_analyzed" }

c


(Brandon Hilkert) #8

Got it. Thanks!

Is there a way to check in ES whether the field is analyzed?

I'm using the Tire Ruby gem and passing in :index => :not_analyzed, but I'm
not confident the setting is actually being applied, because I'm not getting
the results I expect from my program.


(Clinton Gormley) #9

http://www.elasticsearch.org/guide/reference/api/admin-indices-get-mapping.html

c


(Brandon Hilkert) #10

Did that. If the mapping shows nothing about the analyzer, should I assume
the field is using the default, i.e. analyzed?

}
fb_user_id: {
  type: string
}
linkedin_id: {
  type: string
}


(Clinton Gormley) #11

Yes, that's correct. Anything that isn't explicitly set in the mapping
takes its default value, and strings are analyzed by default.

clint
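
That rule can be turned into a small check, sketched here in Python. The helper name is hypothetical, and the sample entries are shaped like the mapping excerpt posted earlier in the thread; a real check would read them from the get-mapping response.

```python
def is_analyzed(field_mapping):
    """Report whether a string field is analyzed, given its mapping entry.

    A missing "index" key means the default, which for strings is "analyzed".
    """
    return field_mapping.get("index", "analyzed") == "analyzed"


# Field entries shaped like the mapping output quoted in this thread.
properties = {
    "fb_user_id": {"type": "string"},
    "linkedin_id": {"type": "string"},
}
for name, field in properties.items():
    print(name, is_analyzed(field))

# What the entry should look like once the setting has been applied.
print(is_analyzed({"type": "string", "index": "not_analyzed"}))
```

If linkedin_id still reports as analyzed after the mapping has been defined through the client library, the setting probably never reached the index.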

