Wildcard analyze does not work for "the"


(Maciej Wiercinski) #1

Hi,

I'm struggling to get a wildcard query running searching for a string "The
Times". As far as I understand the tokenizer should remove "The" as a stop
word while indexing the field, however it does not seem to get applied to
the wildcard, regardless of "analyze_wildcard" setting. I've tried changing
the mapping on "name" field to not_analyzed, however it didn't help.

Should I report it as a bug, or am I missing something?

Full example:

$ curl -XDELETE 127.0.0.1:9200/test_index?pretty
{
  "ok" : true,
  "acknowledged" : true
}

$ curl -XPUT 127.0.0.1:9200/test_index/test_type/1?pretty -d '{ "name": "The Times" }';
{
  "ok" : true,
  "_index" : "test_index",
  "_type" : "test_type",
  "_id" : "1",
  "_version" : 1
}

$ curl -XGET 127.0.0.1:9200/test_index/test_type/_search?pretty -d \
  '{"query":{"query_string":{"query":"the times*","default_operator":"AND", "analyze_wildcard": "true" }}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
        "name": "The Times"
      }
    } ]
  }
}

$ curl -XGET 127.0.0.1:9200/test_index/test_type/_search?pretty -d \
  '{"query":{"query_string":{"query":"the* times*","default_operator":"AND", "analyze_wildcard": "true" }}}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

$ curl -XGET 127.0.0.1:9200/test_index/test_type/_search?pretty -d \
  '{"query":{"query_string":{"query":"the times*","default_operator":"AND", "analyze_wildcard": "false" }}}'
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "1",
      "_score" : 1.0, "_source" : {
        "name": "The Times"
      }
    } ]
  }
}

$ curl -XGET 127.0.0.1:9200/test_index/test_type/_search?pretty -d \
  '{"query":{"query_string":{"query":"the* times*","default_operator":"AND", "analyze_wildcard": "false" }}}'
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Kind regards,
Maciej Wiercinski


(Clinton Gormley) #2

Hi Maciej

'the' is a stopword, and so is being removed. You can disable stopwords
with a custom analyzer.

clint
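
As a rough illustration of the custom-analyzer suggestion, the sketch below recreates test_index with a default analyzer that has no stop filter. It assumes the same local node as in the thread and that the index has been deleted first. Naming the analyzer "default" is a choice made here so that no per-field mapping is needed. Newer Elasticsearch versions may also require a -H 'Content-Type: application/json' header.

```shell
# Sketch only: create the index with a stopword-free default analyzer.
# Assumes a node on 127.0.0.1:9200 and that test_index does not exist yet.
curl -XPUT '127.0.0.1:9200/test_index?pretty' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}'
```

After re-indexing the document, "the" should be kept as an indexed term, so a query such as "the* times*" has something to match.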



(Maciej Wiercinski) #3

Hi Clinton

I do understand that "the" is a stopword, but I still reckon it's a
bug. If "the" is removed from the query "the AND times*" and that
search returns results, then "the* AND times*" should behave the same
way: analyze_wildcard should remove the "the*" part and make the
search equivalent to "times*".

Any thoughts?

Kind regards,
Maciej
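
The results in the first post can be reproduced with a toy model, sketched below in plain shell. It is not Elasticsearch code. It assumes that a plain term goes through the stop filter, while a wildcard pattern is matched as-is against the terms that were actually indexed. The stop list and the analyze and matches helpers are hypothetical names made up for this illustration.

```shell
#!/bin/sh
# Toy model of stopword analysis; NOT Elasticsearch code.
STOPWORDS="the a an and of"   # hypothetical, shortened stop list

# analyze TEXT: lowercase, split on non-letters, drop stopwords.
analyze() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alpha:]' '\n' |
  while read -r tok; do
    [ -z "$tok" ] && continue
    case " $STOPWORDS " in
      *" $tok "*) ;;                 # stopword: never reaches the index
      *) printf '%s\n' "$tok" ;;
    esac
  done
}

# matches PATTERN: succeed if any indexed term matches the glob pattern.
matches() {
  for term in $INDEX; do
    case "$term" in
      $1) return 0 ;;
    esac
  done
  return 1
}

# Indexing "The Times" leaves a single term, "times".
INDEX=$(analyze "The Times")

matches 'times*' && echo "times* -> hit"
matches 'the*'   || echo "the*   -> no hit"
```

In this model the plain term "the" analyzes to nothing, so "the times*" reduces to "times*" and finds the document, while "the*" is looked up against an index that no longer contains "the" and finds nothing.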

On Aug 19, 9:56 am, Clinton Gormley cl...@traveljury.com wrote:

Hi Maciej

'the' is a stopword, and so is being removed. You can disable stopwords
with a custom analyzer

clint


