Are certain fields excluded from being part of "_all" grouping?


(pulkitsinghal) #1

I ran the following two queries on v0.90.5:

POST /my_index/product/_search
{"query":{"bool":{"must":[{"query_string":{"default_field":"_all","query":"cinna"}}]}}}

POST /my_index/product/_search
{"query":{"bool":{"must":[{"query_string":{"default_field":"name","query":"cinna"}}]}}}

The query on the "_all" field did not return any results, but the one on
the "name" field returned 365 results.

The "name" field is mapped like so:

        "name" : {
            "analyzer": "word_break",
            "type": "string"
        },

Would/should this prevent it from falling under the "_all" grouping in
searches?

{
    "index": {
        "analysis": {
            "analyzer": {
                "word_break": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["word_delimiter", "lowercase", "custom_gram"]
                }
            },
            "filter": {
                "custom_gram": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 7
                }
            }
        }
    }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ed30456a-f86b-49bf-af8f-979c58e74fa4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(pulkitsinghal) #2

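After reading up on '_all' a bit more, I now realize that it's not built
from the tokens each field's analyzer produces, but from the fields'
"_source" values instead, which are analyzed with _all's own analyzer! So of
course the ngrams from "word_break" never make it into "_all", and a partial
term like "cinna" won't match .. boo hoo :P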
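
A rough way to see the difference is the pure-Python model below. It is only a sketch, not Elasticsearch's actual analysis chain: the helper names (standard_tokens, ngram_tokens) and the sample product name are hypothetical, the word_delimiter filter is ignored, and "_all" is assumed to use the default standard analyzer.

```python
import re

def standard_tokens(text):
    """Rough stand-in for standard tokenizer + lowercase filter."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def ngram_tokens(text, min_gram=2, max_gram=7):
    """Rough stand-in for word_break: ngrams of each lowercased token."""
    grams = set()
    for token in standard_tokens(text):
        for size in range(min_gram, max_gram + 1):
            for start in range(len(token) - size + 1):
                grams.add(token[start:start + size])
    return grams

name = "Cinnamon Roll"  # hypothetical product name, not from the thread

name_terms = ngram_tokens(name)         # roughly what "name" would index
all_terms = set(standard_tokens(name))  # roughly what "_all" would index

print("cinna" in name_terms)  # True: "cinna" is a 5-gram of "cinnamon"
print("cinna" in all_terms)   # False: "_all" only holds whole words
```

Under these assumptions the partial term exists only in the ngrammed field, which is consistent with the "name" query matching and the "_all" query returning nothing.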

On Tuesday, March 25, 2014 10:53:10 AM UTC-5, pulkitsinghal wrote:



(Brian Yoder) #3

What I did with the high-performance query engine I built in 2001-2010 was
to OR the queries for individual fields, creating a query-time version of
the _all field. It was blindingly fast.

What I now do with Elasticsearch is to disable the _all field because of
the issues you've found (and also, it greatly increases build performance
to omit it) and then issue an OR-like query at run-time to include the
fields I want. The added benefit is that each field is queried using its
own index analyzer which is exactly what we both want.

For example, my Finnish analyzer includes the following char filter:

"finnish_char_mapper" : {
    "type" : "mapping",
    "mappings" : [ "Å=>O", "å=>o", "W=>V", "w=>v" ]
}
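
To illustrate what those rules do to the text before tokenizing, here is a small Python sketch. The function apply_char_mappings is hypothetical and only handles single-character sources; it imitates the idea of a mapping char filter and is not the Elasticsearch implementation.

```python
def apply_char_mappings(text, rules):
    """Apply 'source=>target' rules in the spirit of a mapping char filter."""
    table = {}
    for rule in rules:
        source, target = rule.split("=>")
        table[source] = target
    # This sketch only supports single-character sources.
    return "".join(table.get(ch, ch) for ch in text)

finnish_rules = ["Å=>O", "å=>o", "W=>V", "w=>v"]

mapped = apply_char_mappings("Åsa Virtanen", finnish_rules)
print(mapped)  # Osa Virtanen
```

Once the analyzer also lowercases the result, a query term such as "osa" has something to match against in a name like "Åsa".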

Here is a subset of the field mappings for a test index that I created to
explore the wide variety of Elasticsearch functions. The cn (common name)
field is a multi-type (old ES naming convention); the fn (Finnish name) and
an (Arabic name) fields are also present as regular fields, so I could check
that analysis works either way (it did!):

"cn" : {
    "type" : "string",
    "analyzer" : "english_stemming_analyzer",
    "fields" : {
        "finnish" : {
            "type" : "string",
            "analyzer" : "finnish_stemming_analyzer"
        },
        "arabic" : {
            "type" : "string",
            "analyzer" : "arabic_stemming_Arabic_analyzer"
        },
        "raw" : {
            "type" : "string",
            "index" : "not_analyzed"
        }
    }
},

Here's the query. It's programmatically generated. Performance is good even
when similarly constructed queries are issued to indices that contain 97
million records across 5 shards:

{
    "bool" : {
        "must" : {
            "bool" : {
                "should" : [ {
                    "match" : {
                        "cn" : {
                            "query" : "osa",
                            "type" : "boolean"
                        }
                    }
                }, {
                    "match" : {
                        "cn.finnish" : {
                            "query" : "osa",
                            "type" : "boolean"
                        }
                    }
                }, {
                    "match" : {
                        "cn.arabic" : {
                            "query" : "osa",
                            "type" : "boolean"
                        }
                    }
                } ],
                "minimum_should_match" : "1"
            }
        }
    }
}
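
Since the query is generated programmatically, a generator for this shape might look like the Python sketch below. The function name build_any_field_query is hypothetical and this is not the original generator; it just reproduces the structure shown above for a given list of fields.

```python
import json

def build_any_field_query(text, fields):
    """Build a bool/should query matching `text` in at least one field."""
    should = [
        {"match": {field: {"query": text, "type": "boolean"}}}
        for field in fields
    ]
    return {
        "bool": {
            "must": {
                "bool": {
                    "should": should,
                    "minimum_should_match": "1",
                }
            }
        }
    }

query = build_any_field_query("osa", ["cn", "cn.finnish", "cn.arabic"])
print(json.dumps(query, indent=2))
```

Each match clause is analyzed with its own field's analyzer, so adding a field to the list is all it takes to widen this query-time substitute for _all. The "type" : "boolean" entry mirrors the query in this thread and may need adjusting for other Elasticsearch versions.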

And here is the one source record that matches (as expected); it's a tiny
type based on number of records, but a rich type based on the number of
different analyzers that are present within the mappings and data types
within those records:

{
    "uid" : 6,
    "cn" : [ "Åsa Virtanen", "كايتلين" ],
    "fn" : [ "Åsa Virtanen", "كايتلين" ],
    "an" : [ "Åsa Virtanen", "كايتلين" ],
    "sex" : "F",
    "married" : true,
    "date" : "1989-03-26T15:21:55Z",
    "location" : [ -116.910522, 32.804101 ],
    "telno" : "6111234567",
    "text" : [ "Born in Helsinki", "Pure Beauty", "Lives in Granite Hills, CA" ]
}


