Text search sees "index":"no" fields


(Chris Berkhout) #1

Hi,

I've got a document type, currently with all fields, except some IDs,
set to "index":"no" (to make them not searchable).
My mappings look like this: https://gist.github.com/18284b28a2a7a4090942

However, I'm getting text search matches based on fields like "title_en".

Any ideas why? Do I need to set something else to make a field not searchable?

Cheers,
Chris

PS. I do ultimately want to make certain text fields searchable, I
just need to control which ones.


(Clinton Gormley) #2

Hi Chris

I tried this out, and it seems that your matches are coming from the
_all field, rather than the field "title_en" itself.

You can change this behaviour by adding "include_in_all: false" to those
fields.

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
   "mappings" : {
      "document" : {
         "properties" : {
            "title_cn" : {
               "index" : "no",
               "type" : "string",
               "include_in_all" : false
            },
            "source_ids" : {
               "type" : "string"
            }
         }
      }
   }
}
'
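
When many fields are set to "index": "no", the same workaround could be applied to the whole mapping before sending it. The following is only a rough sketch: the helper name exclude_unindexed_from_all and the sample mapping are made up for illustration, and it only walks top-level properties (nested object fields are not handled).

```python
import copy
import json


def exclude_unindexed_from_all(mapping):
    """Return a copy of `mapping` in which every field with "index": "no"
    also carries "include_in_all": false, unless it already sets the option.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    patched = copy.deepcopy(mapping)
    for type_mapping in patched.get("mappings", {}).values():
        for field in type_mapping.get("properties", {}).values():
            if field.get("index") == "no":
                # Keep an explicit choice if the author already made one.
                field.setdefault("include_in_all", False)
    return patched


# Sample mapping in the shape of the curl example; "title_en" comes from
# the question, the remaining fields mirror the example above.
mapping = {
    "mappings": {
        "document": {
            "properties": {
                "title_en": {"type": "string", "index": "no"},
                "title_cn": {"type": "string", "index": "no"},
                "source_ids": {"type": "string"},
            }
        }
    }
}

patched = exclude_unindexed_from_all(mapping)
# The printed JSON can be used as the body of the PUT request.
print(json.dumps(patched, indent=2))
```

The original dict is left untouched, so the patched and unpatched mappings can be compared side by side.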

This smells like a bug to me. If you specify "index: no" for a field,
it should also disable "include_in_all" by default.

I've opened an issue:
https://github.com/elasticsearch/elasticsearch/issues/1087

clint


(Chris Berkhout) #3

Thanks Clint!

Sounds like the issue will get dealt with in the end, and there's an
easy workaround for now, so I'm very happy!

Cheers,
Chris


(Shay Banon) #4

Yeah, that's the behavior now: when you set "index" to "no", the field is still included in _all by default, unless you explicitly set include_in_all to false. There are cases where you want that; the question is what the default should be.

I agree that the proposed behavior is a more sensible default than what we have today, i.e., when "index" is set to "no", don't include the field in _all by default (unless it is explicitly set to be included). It's a backward-incompatible change. I am up for it, but let's hear what other people think...
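
To make the two defaults concrete, here is a small sketch of the decision as described in this thread. It is a toy model, not Elasticsearch code; the function name included_in_all and its flag are invented for illustration.

```python
def included_in_all(field, index_no_excludes_by_default):
    """Decide whether a field's value would be copied into _all.

    `field` is a mapping dict such as {"type": "string", "index": "no"}.
    `index_no_excludes_by_default` selects the policy being modelled:
      False -> the behaviour described as current in this thread
      True  -> the proposed default
    """
    if "include_in_all" in field:
        # An explicit setting wins under both policies.
        return field["include_in_all"]
    if index_no_excludes_by_default and field.get("index") == "no":
        return False
    return True


unindexed = {"type": "string", "index": "no"}
explicit = {"type": "string", "index": "no", "include_in_all": True}

for name, field in [("unindexed", unindexed), ("explicit", explicit)]:
    print(name, included_in_all(field, False), included_in_all(field, True))
# prints:
# unindexed True False
# explicit True True
```

Only the field without an explicit include_in_all changes between the two policies, which is why the change would affect existing mappings that rely on the current default.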


(Chris Berkhout) #5

I agree that changing the default probably makes sense.

However, I was originally going on this:
"index: Set to analyzed for the field to be indexed and searchable after
being broken down into token using an analyzer. not_analyzed means that its
still searchable, but does not go through any analysis process or broken
down into tokens. no means that it won’t be searchable at all. Defaults to
analyzed."
http://www.elasticsearch.org/guide/reference/mapping/core-types.html

So I'm a little surprised if the include_in_all default is the root issue.

If include_in_all is changed, won't it still be possible to search it by
specifying the field? I think it's worthwhile to be able to make a field
properly non-searchable.

Cheers,
Chris


(Shay Banon) #6

If you set index to no, then the field won't be searchable when you explicitly search against that field. The _all field works differently: it's basically an aggregation of all the other fields, which is then broken down into terms, and you can control which parts / fields of the JSON are included in _all.
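
The difference between searching a field directly and searching _all can be sketched with a toy inverted index. This is a simplified model of the behaviour described above, not how Elasticsearch is implemented: build_index and search are invented names, and "analysis" here is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_index(mapping, docs):
    """Build a toy inverted index: {field_name: {term: set(doc_ids)}}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        for name, value in doc.items():
            field = mapping.get(name, {})
            terms = value.lower().split()
            # "index": "no" keeps the field out of its own per-field index.
            if field.get("index") != "no":
                for term in terms:
                    index[name][term].add(doc_id)
            # _all is fed separately and is controlled only by
            # include_in_all, which defaults to true in this model.
            if field.get("include_in_all", True):
                for term in terms:
                    index["_all"][term].add(doc_id)
    return index


def search(index, term, field="_all"):
    """Return the sorted ids of documents whose `field` contains `term`."""
    return sorted(index[field][term.lower()])


docs = {"1": {"title_en": "Hello world", "source_ids": "abc"}}
default_mapping = {
    "title_en": {"type": "string", "index": "no"},
    "source_ids": {"type": "string"},
}
fixed_mapping = {
    "title_en": {"type": "string", "index": "no", "include_in_all": False},
    "source_ids": {"type": "string"},
}

idx = build_index(default_mapping, docs)
field_hits = search(idx, "hello", field="title_en")
all_hits = search(idx, "hello")
fixed_hits = search(build_index(fixed_mapping, docs), "hello")
print(field_hits, all_hits, fixed_hits)  # prints: [] ['1'] []
```

With the default mapping the term is not findable through title_en but still matches through _all; adding "include_in_all": false removes that second path as well.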


(Chris Berkhout) #7

Ah, I see...
