Query string query doesn't work when field has custom analyzer


#1

My intention is to apply a custom analyzer to the example path below, and then to find the document by the keyword "bar" using the Elasticsearch query string query:

The mapping file "test_map.txt":

{
    "settings" : {
        "analysis" : {
            "char_filter" : {
                "replace_slash_with_space" : {
                    "type"        : "pattern_replace",
                    "pattern"     : "\\\\",
                    "replacement" : " "
                }
            },
            "filter" : {
                "custom_path_filter" : {
                    "type"              : "pattern_capture",
                    "preserve_original" : true,
                    "patterns"          : [
                        "(([a-zA-Z]+)_([a-zA-Z]+))"
                    ]
                }
            },
            "analyzer" : {
                "custom_path_analyzer" : {
                    "type"        : "custom",
                    "char_filter" : ["replace_slash_with_space"],
                    "tokenizer"   : "whitespace",
                    "filter"      : ["custom_path_filter"]
                }
            }
        }
    },
    "mappings" : {
        "_default_" : {
            "properties" : {
                "Path" : {"type" : "string", "analyzer" : "custom_path_analyzer"}
            }
        }
    }
}

The data file "test.txt":

{"index": {"_type": "pathtype", "_id": "P1", "_index": "path_index"}}
{"Path": "C:\\example\\foo_bar\\test"}

Indexing data:

curl -XPOST http://localhost:9200/path_index -d @.\test_map.txt
curl -s -XPOST http://localhost:9200/_bulk --data-binary @test.txt

Tokens generated by using the custom_path_analyzer (note that foo_bar, foo and bar are tokens):

curl.exe -XGET http://localhost:9200/path_index/_analyze?analyzer=custom_path_analyzer -d 'C:\example\foo_bar\test'

{
    "tokens": [
        {
            "end_offset": 3,
            "position": 1,
            "start_offset": 0,
            "token": "'c:",
            "type": "word"
        },
        {
            "end_offset": 11,
            "position": 2,
            "start_offset": 4,
            "token": "example",
            "type": "word"
        },
        {
            "end_offset": 19,
            "position": 3,
            "start_offset": 12,
            "token": "foo_bar",
            "type": "word"
        },
        {
            "end_offset": 19,
            "position": 3,
            "start_offset": 12,
            "token": "foo",
            "type": "word"
        },
        {
            "end_offset": 19,
            "position": 3,
            "start_offset": 12,
            "token": "bar",
            "type": "word"
        },
        {
            "end_offset": 25,
            "position": 4,
            "start_offset": 20,
            "token": "test'",
            "type": "word"
        }
    ]
}

Failed Queries:

curl.exe -XGET http://localhost:9200/path_index/_search?q=bar
curl.exe -XGET http://localhost:9200/path_index/_search?q=foo

Successful Queries:

curl.exe -XGET http://localhost:9200/path_index/_search?q=Path:bar
curl.exe -XGET http://localhost:9200/path_index/_search?q=Path:foo
curl.exe -XGET http://localhost:9200/path_index/_search?q=foo_bar

Why doesn't curl.exe -XGET http://localhost:9200/path_index/_search?q=bar return any hits?

(Joshua Rich) #2

I believe this is because the query_string query (which is what a URI search uses) searches the _all field by default, and _all is analyzed with the default standard analyzer rather than your custom analyzer. The standard analyzer keeps foo_bar as a single token, which is why q=foo_bar matches while q=foo and q=bar do not. Searches that explicitly target the Path field (e.g. q=Path:bar) work correctly; for all other searches, all bets are off.
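A minimal sketch of searching the field explicitly, assuming the same path_index index and Path field as in the question: the body below, sent to http://localhost:9200/path_index/_search, points the query string query at Path instead of _all, so the query text "bar" should be analyzed with custom_path_analyzer and match the indexed "bar" token. For a URI search, the df parameter should do the same thing (e.g. ?q=bar&df=Path).

```json
{
    "query" : {
        "query_string" : {
            "query"         : "bar",
            "default_field" : "Path"
        }
    }
}
```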


#3

Hi Joshua_Rich,

Thanks for your quick reply. Appreciated.

So which search method (e.g. simple query, multi fields, etc.) should I use to achieve the following?

  1. User provides one or more keywords
  2. Then, search all available fields and return results when a match is found. The keywords must be analysed by the analyser defined for the respective field in the mappings file.

Cheers
Alex


(Joshua Rich) #4

You could probably do one of the following:

  • Explicitly specify the fields to search on.
  • Set the analyzer on the _all field to the same as the Path field.
  • Use copy_to to create a meta-field like _all that contains the tokens from all fields that use the same analyzer, then set this as the default field for searches like query_string, etc.
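A rough sketch of the copy_to option, assuming the same 1.x/2.x-era mapping syntax as test_map.txt; the field name all_paths is hypothetical. copy_to copies the original Path value (not its tokens) into all_paths, and all_paths is then analyzed with custom_path_analyzer, so a search that uses it as the default field (e.g. ?q=bar&df=all_paths) should match foo and bar. This mappings section would replace the one in test_map.txt, with the settings section left as it is.

```json
{
    "mappings" : {
        "_default_" : {
            "properties" : {
                "Path" : {
                    "type"     : "string",
                    "analyzer" : "custom_path_analyzer",
                    "copy_to"  : "all_paths"
                },
                "all_paths" : {
                    "type"     : "string",
                    "analyzer" : "custom_path_analyzer"
                }
            }
        }
    }
}
```

The _all option is a smaller change: the _default_ mapping would carry "_all" : {"analyzer" : "custom_path_analyzer"} next to "properties", at the cost of running every field that goes into _all through the path analyzer.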

#5

Hi Joshua,

I have thought about the 1st and 2nd options, and the 3rd seems interesting too. I think the best approach would be the first option with multi-fields.

Thanks again.
Alex
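For the explicit-fields approach, a multi_match query is one way to take the user's keywords and search several fields at once; each per-field query should be built with that field's own search analyzer, which covers requirement 2 above. A sketch, where Description is a hypothetical second field standing in for whatever other fields the mapping defines:

```json
{
    "query" : {
        "multi_match" : {
            "query"  : "bar",
            "fields" : ["Path", "Description"]
        }
    }
}
```

If users need query string syntax (AND/OR, wildcards, field prefixes), the query_string query also accepts a fields array in place of default_field.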

