Cookbook Entry - Case insensitive array search


(James Cook-3) #1

This is a genuine request for help from the group, but I am writing it as
a cookbook entry in the hope that it could eventually make its way into a
cookbook section of the elasticsearch website. Such a section could be the
basis for user documentation that supplements the existing reference
documentation. I would like the group to do a few things:

  1. Correct any bad descriptions. Be picky about terminology.
  2. Offer alternative solutions.
  3. Offer up any other comments.

Once the example is accurate and alternatives are vetted, this example
could then be added to a pull request for a cookbook section of the site.
Overview

A JSON document may include an array of strings that we need to query
against. The array might be a list of tags assigned to a blog post or, as
in our example, a list of roles assigned to a user. We want to supply
multiple roles and get a hit if any of the supplied roles matches one of
the roles assigned to the user. Oh yeah, we also want the search to ignore
case.

Setup

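Because we want case insensitivity in our search, we have to define a
custom analyzer for the roles field. The custom analyzer,
"lowercase_keyword", is defined in the elasticsearch.yml file. It combines
the keyword tokenizer, which emits each role string as a single token, with
the lowercase token filter, which lowercases that token.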

index :
    analysis :
        analyzer :
            lowercase_keyword :
                type : custom
                tokenizer : keyword
                filter : [lowercase]
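
To make the effect of this analyzer concrete, here is a minimal Python sketch that mimics it outside of elasticsearch. The function names are made up for illustration, and the code only models the behaviour I expect from the config above (the whole value kept as one token, then lowercased). It is not how elasticsearch implements analysis.

```python
def lowercase_keyword(value):
    """Mimic the custom analyzer: keyword tokenizer + lowercase filter."""
    tokens = [value]  # keyword tokenizer: the whole input is a single token
    return [token.lower() for token in tokens]  # lowercase filter


def analyze_field(values):
    """Analyze every element of a string array (a multi-valued field)."""
    terms = []
    for value in values:
        terms.extend(lowercase_keyword(value))
    return terms


print(analyze_field(["ROLE_WIFE", "role_user"]))
# prints ['role_wife', 'role_user']
```

Note that a value containing spaces would still come out as one term, because the keyword tokenizer does not split on whitespace.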

Create the index and supply the mapping in the same request:

curl -XPOST 'localhost:9200/cookbook' -d '{
  "mappings" : {
    "profiles" : {
      "properties" : {
        "username" : { "type" : "string", "index" : "not_analyzed" },
        "roles" : { "type" : "string", "index" : "analyzed", "analyzer" : "lowercase_keyword" }
      }
    }
  }
}'

Test Data

curl -XPUT 'localhost:9200/cookbook/profiles/1' -d '
{ "username" : "fred", "roles" : ["ROLE_ADMIN"] }'
curl -XPUT 'localhost:9200/cookbook/profiles/2' -d '
{ "username" : "wilma", "roles" : ["ROLE_WIFE"] }'
curl -XPUT 'localhost:9200/cookbook/profiles/3' -d '
{ "username" : "barney", "roles" : ["ROLE_USER"] }'
curl -XPUT 'localhost:9200/cookbook/profiles/4' -d '
{ "username" : "betty", "roles" : ["ROLE_WIFE","role_user"] }'

Querying

Because the search must be case insensitive, it has to be a query that
analyzes its input, so that our criteria pass through the same
lowercase_keyword analyzer with which our documents were indexed.

Example 1
curl -XPOST 'localhost:9200/cookbook/profiles/_search' -d '
{
  "query": {
    "field": {
      "roles": "role_wife"
    }
  }
}'

This handles a query for a single role. To match any of several roles,
combine them with the OR operator in the query string.

Example 2
curl -XPOST 'localhost:9200/cookbook/profiles/_search' -d '
{
  "query": {
    "field": {
      "roles": "role_admin OR role_USER"
    }
  }
}'
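
As a sanity check on which profiles Example 2 should return, here is a small in-memory Python simulation over the test data. It is only a sketch: `analyze` and `search_any_role` are hypothetical helpers, the naive split on " OR " stands in for the real query string parser, and the lowercasing models the assumed behaviour of the lowercase_keyword analyzer.

```python
# Test data from the cookbook entry: username -> roles
PROFILES = {
    "fred": ["ROLE_ADMIN"],
    "wilma": ["ROLE_WIFE"],
    "barney": ["ROLE_USER"],
    "betty": ["ROLE_WIFE", "role_user"],
}


def analyze(value):
    # Assumed analyzer behaviour: one token per value, lowercased.
    return value.lower()


def search_any_role(query_string):
    """Return usernames holding at least one of the OR-separated roles."""
    wanted = {analyze(part) for part in query_string.split(" OR ")}
    return sorted(
        user
        for user, roles in PROFILES.items()
        if wanted & {analyze(role) for role in roles}
    )


print(search_any_role("role_admin OR role_USER"))
# prints ['barney', 'betty', 'fred']
```

If the real search returns a different set of users, the analyzer is probably not being applied to the roles field at index time or at query time.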

Filtering

If we permit ourselves some pre-processing of the search criteria, we can
get the same results with a plain filter. The following filter returns the
same hits as Example 2 above. However, a term filter does not analyze its
input, so the search criteria must be lowercased by the client in order to
match the terms that the lowercase_keyword analyzer produced at index time.

Example 3
curl -XPOST 'localhost:9200/cookbook/profiles/_search' -d '
{
  "query": {
    "constant_score": {
      "filter": {
        "or": [
          { "term": { "roles" : "role_admin" } },
          { "term": { "roles" : "role_user" } }
        ]
      }
    }
  }
}'
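
The pre-processing step mentioned above could be done on the client along these lines. This is a minimal Python sketch; `build_roles_filter` is a hypothetical helper name, and it simply lowercases each role before building the same request body as Example 3.

```python
import json


def build_roles_filter(roles):
    """Build the Example 3 request body, lowercasing each role client-side."""
    term_filters = [{"term": {"roles": role.lower()}} for role in roles]
    return {"query": {"constant_score": {"filter": {"or": term_filters}}}}


body = build_roles_filter(["ROLE_ADMIN", "role_USER"])
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the -d payload of the search request, so callers may pass roles in any case.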


(Shay Banon) #2

Looks great! I would also add a sample of how to configure the analyzer as
part of the create index request.
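
A sketch of what such a request body might look like, written as a small Python script that prints the JSON. The analyzer and mapping are copied from the cookbook entry. The "settings" / "analysis" nesting reflects my understanding of the create index API and should be checked against the reference documentation for the version in use. `build_create_index_body` is a hypothetical helper name.

```python
import json


def build_create_index_body():
    """Create-index body carrying both the analyzer settings and the mapping."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Same definition as the elasticsearch.yml version.
                    "lowercase_keyword": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "profiles": {
                "properties": {
                    "username": {"type": "string", "index": "not_analyzed"},
                    "roles": {
                        "type": "string",
                        "index": "analyzed",
                        "analyzer": "lowercase_keyword",
                    },
                }
            }
        },
    }


print(json.dumps(build_create_index_body(), indent=2))
```

The output could then be piped to curl -XPOST 'localhost:9200/cookbook' -d @- so that no change to elasticsearch.yml is needed.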

On Mon, May 21, 2012 at 5:25 AM, James Cook jcook@pykl.com wrote:

[Quoted text of post #1 trimmed.]


(James Cook-3) #3

Perhaps someone with more experience updating the site could issue a pull
request for a cookbook section? Perhaps these could be some sections?

  • Configuration
    • OS
    • Java
    • ElasticSearch
  • Mapping
  • Index
  • Query and Filter
  • Facets
  • Percolator
  • Integration
    • Amazon
    • RabbitMQ
    • MongoDB
    • Rivers

Then it's just a matter of working our way backward through the group
messages to add a cookbook entry for each Q&A type posting. Heck, if you
just search for Clinton's posts (http://goo.gl/TJHcq) you'll end up with
hundreds of good cookbook candidates.

Jim Cook
jcook@pykl.com

Pykl Studios http://pykl.com/
1000 Creekside Plaza
Gahanna, OH 43230
phone +1 614 398 3636
tollfree +1 855 FOR PYKL
skype jcook.pykl
gtalk jcook@pykl.com

On Wed, May 23, 2012 at 5:07 PM, Shay Banon kimchy@gmail.com wrote:

[Quoted text of posts #1 and #2 trimmed.]

