Is it possible to apply a lowercase filter to the _index and _id fields?


(ppearcy) #1

I like that the internal _index and _id fields are available to search
on. However, I am moving from a domain where we could do
case-insensitive searches on these fields. To get the same behavior, I
am trying to add an analyzer with a lowercase filter to these fields,
but with no luck.

After creating the index, I have tried the following:

curl -XPUT 'http://localhost:9201/twitter/tweet/_mapping' -d '
{
    "tweet" : {
        "_index" : { "enabled" : true, "analyzer" : "lowercase" }
    }
}
'

curl -XPUT 'http://localhost:9201/twitter/tweet/_mapping' -d '
{
    "tweet" : {
        "_index" : { "enabled" : true },
        "properties" : {
            "_index" : { "type" : "string", "analyzer" : "sortable_tokenizer" }
        }
    }
}
'

Neither seems to work.

If analysis is not available on the _id/_index fields, it's not a big
deal; I just need to disable the ES built-in ones and add my own.

Thanks,
Paul


(Shay Banon) #2

Hi, yes, those fields are not analyzed and you can't control that
(intentionally, since analysis could break the value up into more than a
single term).
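
To illustrate the concern about a value being broken into more than one term, here is a toy sketch in Python. It does not use Elasticsearch or Lucene; `toy_word_analyzer` and `toy_keyword_lowercase` are hypothetical stand-ins that only mimic, roughly, a word-splitting analyzer and a "keep the whole value, just lowercase it" analyzer.

```python
import re


def toy_word_analyzer(value):
    """Toy stand-in for a word-splitting analyzer (not real Lucene behavior):
    lowercase the value, then split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", value.lower())


def toy_keyword_lowercase(value):
    """Toy stand-in for a keyword-style analyzer: the whole value stays one term."""
    return [value.lower()]


if __name__ == "__main__":
    doc_id = "Tweet-1"
    print(toy_word_analyzer(doc_id))      # ['tweet', '1']
    print(toy_keyword_lowercase(doc_id))  # ['tweet-1']
```

With the word-splitting version, an id such as "Tweet-1" no longer exists as a single term, so an exact lookup on it could not match reliably.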

(Replying to Paul's post of Fri, Sep 17, 2010.)


(ppearcy) #3

Cool, thx.

Will create my own versions.


(Shay Banon) #4

Why not lowercase beforehand, so you won't store extra data in the index?


(ppearcy) #5

I'd like to keep the searches on id and index case-insensitive to
avoid any confusion and to map as closely as possible to the system we
are replacing.

So, the two alternatives are:

  • Lowercase on both the indexing side and the search side
  • Add a lowercase keyword analyzer and have my own versions of these
    fields

If I then wanted to do case-sensitive searches, I would need to add
logic so that the search-side lowercasing targets only specific fields,
which gets a little ugly.
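
A minimal sketch of what that search-side logic might look like, in Python. The field names `id_lc` and `index_lc` are hypothetical (they are not Elasticsearch built-ins), and the function only prepares query values; it does not talk to a server.

```python
# Hypothetical names for the fields whose values were lowercased at index time.
LOWERCASED_FIELDS = frozenset({"id_lc", "index_lc"})


def normalize_terms(terms, lowercased_fields=LOWERCASED_FIELDS):
    """Lowercase query values only for fields that were lowercased when indexing.

    Values for every other field are passed through unchanged, so searches on
    those fields stay case-sensitive.
    """
    return {
        field: value.lower() if field in lowercased_fields else value
        for field, value in terms.items()
    }


if __name__ == "__main__":
    print(normalize_terms({"id_lc": "ABC-123", "user": "Kimchy"}))
    # {'id_lc': 'abc-123', 'user': 'Kimchy'}
```

The list of targeted fields has to be kept in sync with whatever is lowercased on the indexing side, which is the part that tends to get ugly.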

The extra _id field (no way to disable this and _type, right?) adds a
little extra bloat to the index, but I'll take it to get the search
case-insensitive.

Maybe it makes sense to allow a filter to be applied to the internal
queryable fields?

Thanks,
Paul


(Shay Banon) #6

The _id and _type fields are required. I understand what you are trying
to do; in this case, I suggest you go with adding two fields of your own
that hold the lowercased values.
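
One way to picture this suggestion is the Python sketch below, which adds the two lowercased copies to a document before it is sent for indexing and builds a matching term query body. The field names `index_lc` and `id_lc` are hypothetical, and the sketch assumes those fields are mapped so that each value stays a single term (for example, not analyzed); nothing here calls Elasticsearch itself.

```python
import json


def add_lowercased_fields(index_name, doc_id, source):
    """Return a copy of the document with lowercased copies of the index name and id."""
    doc = dict(source)
    doc["index_lc"] = index_name.lower()  # hypothetical field name
    doc["id_lc"] = doc_id.lower()         # hypothetical field name
    return doc


def term_query(field, value):
    """Build a term query body; the value is lowercased to match the stored copy."""
    return {"query": {"term": {field: value.lower()}}}


if __name__ == "__main__":
    doc = add_lowercased_fields("Twitter", "Tweet-1", {"user": "kimchy"})
    print(json.dumps(doc, sort_keys=True))
    print(json.dumps(term_query("id_lc", "TWEET-1")))
```

The built-in _id and _index fields are left untouched, so exact, case-sensitive lookups on them keep working alongside the case-insensitive copies.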

