Stop words returning results?

Hi,

I think I need to understand how the "_all" field works when it comes to
analysis. I want to query against all fields that are indexed, not just
specific ones; this is for a generic free-text search against content.

I have created a stopword filter with just "the" and "and" in it.

I can query how the fields are analyzed and find that "the" is indeed
removed:

curl http://localhost:9200/content/_analyze?field=content -d "the"
{"tokens":[]}

But if I check how the _all field is analyzed:

curl http://localhost:9200/content/_analyze?field=_all -d "the"
{"tokens":[{"token":"the","start_offset":0,"end_offset":3,"type":"","position":1}]}

So when I query _all I get results that I don't want.

Here is the start of my mappings; as you can see, I did not set anything on
_all.

{"content":{"mappings":{"business_rates_calculator":{
"dynamic":"false",
"properties":{
"businessTypes":{"type":"string","index":"not_analyzed"},
"content":{"type":"string","analyzer":"custom_english"},
...

How do I set up _all so that it is composed of all the fields that are
mapped, each analyzed by the analyzer configured on the field? I assumed
this is how _all would work by default, but clearly not.

Thanks for your assistance.

Rupert

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0cee71aa-891b-4f11-9e3d-8510ac34a5fe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

_all has its own analyzer. If you do not set it, it will be the standard
analyzer by default.

Jörg
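
To make the reply above concrete, here is a minimal sketch of giving _all the same analyzer as the content field. It assumes Elasticsearch 1.x (the "string" / "not_analyzed" mapping syntax in the original post suggests this). The index name content_v2, the filter name custom_stop, and the tokenizer/filter chain of custom_english are assumptions, since the original index settings were not posted. The analyzer of an existing field generally cannot be changed in place, so the sketch creates a new index that the documents would need to be reindexed into.

```shell
# Assumed Elasticsearch base URL; override with ES_URL=... if needed.
ES_URL="${ES_URL:-http://localhost:9200}"

# Index settings plus mapping. "custom_stop" and the definition of
# "custom_english" are assumptions; only the analyzer name, the two stop
# words, and the field mappings come from the thread.
BODY=$(cat <<'JSON'
{
  "settings": {
    "analysis": {
      "filter": {
        "custom_stop": { "type": "stop", "stopwords": ["the", "and"] }
      },
      "analyzer": {
        "custom_english": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "custom_stop"]
        }
      }
    }
  },
  "mappings": {
    "business_rates_calculator": {
      "dynamic": "false",
      "_all": { "analyzer": "custom_english" },
      "properties": {
        "businessTypes": { "type": "string", "index": "not_analyzed" },
        "content": { "type": "string", "analyzer": "custom_english" }
      }
    }
  }
}
JSON
)

# Create the (hypothetical) new index whose _all field uses custom_english.
create_index() {
  curl -s -XPUT "$ES_URL/content_v2" -d "$BODY"
}

# Repeat the _analyze check from the original post against the new index.
check_all() {
  curl -s "$ES_URL/content_v2/_analyze?field=_all" -d "the"
}
```

With a node running, `create_index && check_all` should then report no tokens for "the", matching what the content field check returned. Note that values copied into _all from not_analyzed fields such as businessTypes would also pass through this one analyzer.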

Ok thanks.

Some of the fields are not_analyzed, and potentially some new fields could
have different analyzers. I guess what you are saying is that if I want to
use _all, I have to set it up with just one analyzer for all fields
included in it? I can't mix, say, english and not_analyzed?

I guess this is not such a problem really, since I don't yet have a use
case for needing to query against multiple analyzers.

Rupert

It is technically possible to combine analyzers for a single field; see the
combo analyzer plugin: https://github.com/yakaz/elasticsearch-analysis-combo/

Jörg
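
A plugin-free alternative, offered as a sketch rather than something suggested in the thread: query the mapped fields directly instead of _all, so that the query text is analyzed per field with each field's own analyzer. This assumes the multi_match query as available in Elasticsearch 1.x; the field list is taken from the mapping excerpt in the original post, and the query text is hypothetical.

```shell
# Assumed Elasticsearch base URL; override with ES_URL=... if needed.
ES_URL="${ES_URL:-http://localhost:9200}"

# multi_match runs the same query text against each listed field, using
# that field's analyzer. The query string here is a made-up example.
QUERY=$(cat <<'JSON'
{
  "query": {
    "multi_match": {
      "query": "the business rates",
      "fields": ["content", "businessTypes"]
    }
  }
}
JSON
)

# Search the existing "content" index from the original post.
search_fields() {
  curl -s -XPOST "$ES_URL/content/_search" -d "$QUERY"
}
```

Under these assumptions, "the" would be dropped when matching against content (via custom_english), while businessTypes, being not_analyzed, would be compared against the query text as a single exact term. New fields would have to be added to the "fields" list by hand, which is the trade-off against _all.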

Sweet, thanks for your help.
