Are certain fields excluded from being part of "_all" grouping?

pulkitsinghal · March 25, 2014, 3:53pm

I ran the following two queries on v0.90.5:

POST /my_index/product/_search
{"query":{"bool":{"must":[{"query_string":{"default_field":"_all","query":"cinna"}}]}}}

POST /my_index/product/_search
{"query":{"bool":{"must":[{"query_string":{"default_field":"name","query":"cinna"}}]}}}

The query with the "_all" field did not return any results but the one with
"name" field returned 365 results.

The "name" field is mapped like so:

        "name" : {
            "analyzer": "word_break",
            "type": "string"
        },

Would (or should) this prevent the field from being included in "_all"
searches? The "word_break" analyzer is defined in the index settings as
follows:

{
  "index": {
    "analysis": {
      "analyzer": {
        "word_break": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["word_delimiter", "lowercase", "custom_gram"]
        }
      },
      "filter": {
        "custom_gram": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 7
        }
      }
    }
  }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ed30456a-f86b-49bf-af8f-979c58e74fa4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

pulkitsinghal · March 25, 2014, 5:43pm

After reading up on "_all" a bit more, I now realize that it is not built
from the tokens that each field's own analyzer produces. It is built from the
fields' original "_source" values, which are analyzed separately. So of
course it won't work... boo hoo
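
A rough illustration of the mismatch, in plain Python rather than Elasticsearch itself: the "name" field is indexed through the "custom_gram" filter, so the fragment "cinna" exists as a token there, whereas "_all" is analyzed on its own (assumed here to use the default standard analyzer) and holds only whole words. The sample value "Cinnamon Roll" and all three helper functions are hypothetical, and tokenization is simplified to whitespace splitting.

```python
def ngrams(token, min_gram=2, max_gram=7):
    """Return every substring of token whose length is min_gram..max_gram."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def name_field_tokens(text):
    """Approximate the word_break analyzer: split, lowercase, then n-gram."""
    tokens = set()
    for word in text.lower().split():
        tokens |= ngrams(word)
    return tokens


def all_field_tokens(text):
    """Approximate a standard-analyzed _all field: whole lowercase words."""
    return set(text.lower().split())


sample = "Cinnamon Roll"  # hypothetical product name
print("cinna" in name_field_tokens(sample))  # fragment is an indexed n-gram
print("cinna" in all_field_tokens(sample))   # only whole words are indexed
```

Under these assumptions the fragment is found among the n-gram tokens but not among the whole-word tokens, which matches the behavior described in the first post.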


brian_yoder · March 25, 2014, 8:35pm

What I did with the high-performance query engine I built in 2001-2010 was
to OR the queries for individual fields, creating a query-time version of
the _all field. It was blindingly fast.

What I now do with Elasticsearch is to disable the _all field because of
the issues you've found (and also, it greatly increases build performance
to omit it) and then issue an OR-like query at run-time to include the
fields I want. The added benefit is that each field is queried using its
own index analyzer which is exactly what we both want.
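
For reference, disabling "_all" is a mapping-level setting in Elasticsearch versions of this era. A minimal sketch follows, written as a Python dict so it can be serialized and sent with any HTTP client. The type name "product" is borrowed from the first post, and the single "name" property is only a stand-in for whichever fields are actually mapped.

```python
import json

# Mapping for a hypothetical "product" type with the _all field switched off.
mapping = {
    "product": {
        "_all": {"enabled": False},
        "properties": {
            "name": {"type": "string", "analyzer": "word_break"},
        },
    }
}

# JSON body that would be sent when creating or updating the mapping.
body = json.dumps(mapping, indent=2)
print(body)
```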

For example, my Finnish analyzer includes the following char filter:

"finnish_char_mapper" : {
  "type" : "mapping",
  "mappings" : [ "Å=>O", "å=>o", "W=>V", "w=>v" ]
}
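
To see why a query for "osa" can match a name such as "Åsa" (as in the record shown later in this post), here is a small Python imitation of what a mapping char filter does before tokenization. It is only a sketch: apply_char_mappings is a hypothetical helper, not an Elasticsearch API, and it handles single-character rules only.

```python
import unicodedata


def apply_char_mappings(text, mappings):
    """Apply "source=>target" rules, imitating a mapping char filter."""
    table = {}
    for rule in mappings:
        # Normalize so that precomposed and decomposed forms compare equal.
        source, target = unicodedata.normalize("NFC", rule).split("=>")
        table[source] = target
    text = unicodedata.normalize("NFC", text)
    # Every rule here is one character long, so a per-character lookup works.
    return "".join(table.get(ch, ch) for ch in text)


finnish_mappings = ["Å=>O", "å=>o", "W=>V", "w=>v"]
mapped = apply_char_mappings("Åsa Virtanen", finnish_mappings)
print(mapped.lower())
```

After the character mapping and lowercasing, the first word of the name becomes "osa", which is the term the query searches for.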

Here is a subset of the field mappings for a test index that I created to
explore the wide variety of Elasticsearch functions. The cn (common name)
field is a multi-type (old ES naming convention); the fn (Finnish name) and
an (Arabic name) are also present to explore multi-type with regular fields
to ensure their analysis works either way (it did!):

"cn" : {
  "type" : "string",
  "analyzer" : "english_stemming_analyzer",
  "fields" : {
    "finnish" : {
      "type" : "string",
      "analyzer" : "finnish_stemming_analyzer"
    },
    "arabic" : {
      "type" : "string",
      "analyzer" : "arabic_stemming_Arabic_analyzer"
    },
    "raw" : {
      "type" : "string",
      "index" : "not_analyzed"
    }
  }
},

Here's the query. It's programmatically generated. Performance is good even
when similarly constructed queries are issued to indices that contain 97
million records across 5 shards:

{
  "bool" : {
    "must" : {
      "bool" : {
        "should" : [ {
          "match" : {
            "cn" : {
              "query" : "osa",
              "type" : "boolean"
            }
          }
        }, {
          "match" : {
            "cn.finnish" : {
              "query" : "osa",
              "type" : "boolean"
            }
          }
        }, {
          "match" : {
            "cn.arabic" : {
              "query" : "osa",
              "type" : "boolean"
            }
          }
        } ],
        "minimum_should_match" : "1"
      }
    }
  }
}
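
The post says the query is programmatically generated but does not show the generator. One possible way to build the same structure is sketched below in Python. The function name build_any_field_query is made up for illustration, and the output simply mirrors the JSON above (one "match" clause per field inside a "should" list).

```python
def build_any_field_query(fields, text):
    """Build a bool query that matches when any listed field matches text."""
    should = [
        {"match": {field: {"query": text, "type": "boolean"}}}
        for field in fields
    ]
    return {
        "bool": {
            "must": {
                "bool": {
                    "should": should,
                    "minimum_should_match": "1",
                }
            }
        }
    }


# Same fields and search text as the query shown above.
query = build_any_field_query(["cn", "cn.finnish", "cn.arabic"], "osa")
print(query)
```

Because each clause targets its own field or sub-field, each one is analyzed with that field's analyzer, which is the benefit described earlier in this post.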

And here is the one source record that matches (as expected); it's a tiny
type based on number of records, but a rich type based on the number of
different analyzers that are present within the mappings and data types
within those records:

{
  "uid" : 6 ,
  "cn" : [ "Åsa Virtanen" , "كايتلين" ] ,
  "fn" : [ "Åsa Virtanen" , "كايتلين" ] ,
  "an" : [ "Åsa Virtanen" , "كايتلين" ] ,
  "sex" : "F" ,
  "married" : true ,
  "date" : "1989-03-26T15:21:55Z" ,
  "location" : [ -116.910522 , 32.804101 ] ,
  "telno" : "6111234567" ,
  "text" : [ "Born in Helsinki" , "Pure Beauty" , "Lives in Granite Hills, CA" ]
}
