Retaining case in a faceted search


(csh-2) #1

Is there a way to do faceted searches using the Search API and
maintain case? For example...

curl -X POST "http://localhost:9200/automobiles/automobile/_search?pretty=true&q=make:B*" \
  -d '{"size" : "0", "facets" : {"make" : { "terms" : {"field" : "make"} }}}'

...returns...

{
  ...
  "facets" : {
    "make" : {
      ...
      "terms" : [ {
        "term" : "bmw",
        "count" : 1654
      }, {
        "term" : "buick",
        "count" : 362
      }, {
        ...
      } ]
    }
  }
}

...but I want to retain the case ("BMW", "Buick").

Thanks in advance, Chuck


(Ivan Brusic) #2

Hi Chuck,

When faceting on string fields, the fields should be either not
analyzed (preferred) or tokenized with a KeywordTokenizer. In your
case, the default analyzer is lowercasing the terms at index time.
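
To make the difference concrete, here is a rough local imitation that needs no running node. It is only a sketch: the two function names are made up for illustration, and splitting on spaces plus lowercasing is a simplification of what the default analyzer really does.

```shell
# Rough stand-in for the default analyzer (simplified assumption):
# split the value on spaces and lowercase every token.
analyze_default() {
  printf '%s\n' "$1" | tr -s ' ' '\n' | tr '[:upper:]' '[:lower:]'
}

# Stand-in for a not_analyzed / keyword-tokenized field:
# the whole value is kept as a single, unchanged term.
analyze_keyword() {
  printf '%s\n' "$1"
}

analyze_default "Aston Martin"   # two lowercase terms, one per line
analyze_keyword "Aston Martin"   # one term, case and spacing intact
```

A terms facet counts indexed terms, so it can only return what the second function would produce if the field is indexed that way.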

--
Ivan


(csh-2) #3

Thanks for the quick response, Ivan! Will look into how to do this
(don't tell me :-)), as I am an ES newbie...


(Shay Banon) #4

Just in case you did not find out how: you need to explicitly define the mapping for that field and set index to not_analyzed. The simplest way is to set the mapping in the create index API when you create the index.
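
A minimal sketch of such a create index request follows. It assumes the index and type names from the original question (automobiles / automobile), a node on localhost:9200, and the string mapping syntax of the Elasticsearch releases current at the time of this thread. The MAPPING variable and the create_index function are illustrative names only.

```shell
# Request body: map "make" as a string that is indexed verbatim (not analyzed),
# so facet terms keep their original case ("BMW", "Buick").
MAPPING='{
  "mappings" : {
    "automobile" : {
      "properties" : {
        "make" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'

# Create the index with that mapping. Assumes a node on localhost:9200 and
# that the index does not exist yet; documents must be indexed afterwards.
create_index() {
  curl -s -XPUT "http://localhost:9200/automobiles?pretty=true" -d "$MAPPING"
}
```

Call create_index once before loading data; an existing index would have to be deleted and repopulated for the new mapping to take effect.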


(csh-2) #5

I'm not quite getting the results I expect: I think I'm indexing the
way you suggested...

curl -XPUT "localhost:9200/cars?pretty=true" \
  -d '{"index" : {"analysis" : {"analyzer" : {"default" : {"type" : "keyword"}}}}}'

...because after populating ES, the following query returns the
field values with their case fully retained:

curl -X POST "http://localhost:9200/cars/car/_search?pretty=true&q=make:*" \
  -d '{"size" : "0", "facets" : {"make" : { "terms" : {"field" : "make"} }}}'

I can even do a query now in which I ask for all "makes" that end in
"n"...

curl -X POST "http://localhost:9200/cars/car/_search?pretty=true&q=make:*n" \
  -d '{"size" : "0", "facets" : {"make" : { "terms" : {"field" : "make"} }}}'

...and I get the right result:

{
  ...
  "facets" : {
    "make" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 3,
      "other" : 0,
      "terms" : [ {
        "term" : "Aston Martin",
        "count" : 2
      }, {
        "term" : "Nissan",
        "count" : 1
      } ]
    }
  }
}

However, if I ask for all "makes" that start with "a" or
"A" (q=make:A* or q=make:a*), I get no results (there should be
several--at least one as shown in the above example):

curl -X POST "http://localhost:9200/cars/car/_search?pretty=true&q=make:a*" \
  -d '{"size" : "0", "facets" : {"make" : { "terms" : {"field" : "make"} }}}'

Is that a bug, or is there something I'm missing?

thanks in advance, Chuck


(csh-2) #6

Got it! I need to use an explicit wildcard query:

curl -X POST "http://localhost:9200/cars/car/_search?pretty=true" \
  -d '{"size" : "0", "query": {"wildcard" : { "make" : "A*" }}, "facets" : {"make" : { "terms" : {"field" : "make"} }}}'

And, as expected, the wildcard is case-sensitive...
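
A likely explanation for the earlier q=make:A* result, offered as an understanding of the query parser rather than something confirmed in this thread: the q= parameter is parsed as a query_string query, which by default lowercases wildcard terms, so A* becomes a* and no longer matches terms indexed with their case intact. *n is unaffected because it is already lowercase. Assuming the installed version supports the lowercase_expanded_terms option of query_string, an alternative could be sketched like this (QUERY and search_makes are illustrative names):

```shell
# query_string with wildcard lowercasing switched off, so "A*" is matched
# as typed against the case-preserving terms in the "make" field.
# lowercase_expanded_terms is assumed to be available in the version in use.
QUERY='{
  "size" : 0,
  "query" : {
    "query_string" : {
      "query" : "make:A*",
      "lowercase_expanded_terms" : false
    }
  },
  "facets" : { "make" : { "terms" : { "field" : "make" } } }
}'

# Run the search against the cars index used above (node on localhost:9200).
search_makes() {
  curl -s -X POST "http://localhost:9200/cars/car/_search?pretty=true" -d "$QUERY"
}
```

The explicit wildcard query above remains the simpler choice; this variant mainly matters when the rest of the query_string syntax is needed.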

thanks, Chuck


(Shay Banon) #7

Note that what you are doing is applying the keyword analyzer to all text fields; I am not sure that's what you really want. Only use it on fields that you want to facet on, possibly with a multi field mapping.
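
A sketch of what a multi field mapping for this case might look like, assuming a freshly created cars index with type car, a node on localhost:9200, and the multi_field mapping type of the releases current at the time. The sub-field name "untouched" and the shell names are arbitrary choices for illustration.

```shell
# "make" stays analyzed for normal full-text search, while the extra
# sub-field "make.untouched" (hypothetical name) is indexed verbatim
# and is the one to facet on.
MULTI_FIELD_MAPPING='{
  "mappings" : {
    "car" : {
      "properties" : {
        "make" : {
          "type" : "multi_field",
          "fields" : {
            "make" : { "type" : "string", "index" : "analyzed" },
            "untouched" : { "type" : "string", "index" : "not_analyzed" }
          }
        }
      }
    }
  }
}'

# Facet on the verbatim sub-field so terms come back as "BMW", "Buick", ...
FACET_QUERY='{
  "size" : 0,
  "query" : { "match_all" : {} },
  "facets" : { "make" : { "terms" : { "field" : "make.untouched" } } }
}'

create_cars_index() {
  curl -s -XPUT "http://localhost:9200/cars?pretty=true" -d "$MULTI_FIELD_MAPPING"
}

facet_makes() {
  curl -s -X POST "http://localhost:9200/cars/car/_search?pretty=true" -d "$FACET_QUERY"
}
```

With this layout the index-wide default analyzer can be left alone, and only the facet field pays the cost of keeping whole values as terms.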

