I guess it is: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
analyze_wildcard
By default, wildcard terms in a query string are not analyzed. Setting this value to true makes a best effort to analyze them as well.
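So when you search for "khô*", the prefix "khô" is compared as-is against the terms in the inverted index, where "không" was stored as "khong" (ASCII-folded and lowercased). No indexed term starts with "khô", so nothing matches. "kho*" works because "khong" does start with "kho".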
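To make the mismatch concrete, here is a minimal pure-Java sketch that does not use the Elasticsearch client at all. It assumes the index holds the terms "khong", "co" and "gi" (what kidkid's _analyze call reports for "không có gì") and models a wildcard "xxx*" as a prefix test over those terms. The fold() helper is an approximation of asciifolding + lowercase built on java.text.Normalizer; it is a stand-in for illustration, not Lucene's ASCIIFoldingFilter.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;

public class WildcardPrefixDemo {
    // Approximation of "asciifolding + lowercase": decompose to NFD, drop the
    // combining marks, map the Vietnamese d-with-stroke explicitly (it has no
    // NFD decomposition), then lowercase. NOT Lucene's ASCIIFoldingFilter.
    static String fold(String s) {
        String nfd = Normalizer.normalize(s, Normalizer.Form.NFD);
        return nfd.replaceAll("\\p{M}", "")
                .replace('\u0111', 'd')   // đ
                .replace('\u0110', 'D')   // Đ
                .toLowerCase(Locale.ROOT);
    }

    // A wildcard query "prefix*" matches if any indexed term starts with the prefix.
    static boolean prefixMatches(List<String> indexTerms, String prefix) {
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed inverted-index terms for "không có gì".
        List<String> indexTerms = List.of("khong", "co", "gi");
        String rawPrefix = "kh\u00f4"; // "khô", taken from the query string "khô*"

        // Without analyze_wildcard: the prefix is used exactly as typed.
        System.out.println("unanalyzed prefix matches: " + prefixMatches(indexTerms, rawPrefix)); // false

        // With the prefix run through (an approximation of) the analyzer first.
        System.out.println("analyzed prefix matches: " + prefixMatches(indexTerms, fold(rawPrefix))); // true
    }
}
```

Setting analyzeWildcard(true) on the QueryStringQueryBuilder roughly corresponds to applying fold() to the prefix before the comparison, which is why it fixes the "khô*" case.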
BTW I think you should consider using MatchQuery instead of QueryStringQuery: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On 17 December 2013 at 09:24:46, kidkid (zkidkid@gmail.com) wrote:
Hi David,
I have figured out the problem.
Let's say the setup is the same as I described above.
Now I try this:
String query = "khô*";
QueryStringQueryBuilder queryString = QueryBuilders.queryString(query).defaultField("myfield");
I would expect to get "không có gì", but it actually returns nothing.
I have to set analyzeWildcard(true), and then it works fine.
My question is about the case where I don't set analyzeWildcard(true):
If I search kho*, it returns the "không có gì" document.
But if I search khô*, it returns nothing.
Is this reasonable?
On Sunday, December 15, 2013 8:54:53 PM UTC-8, David Pilato wrote:
Could you gist your Java code?
On 16 Dec. 2013 at 04:15, kidkid zki...@gmail.com wrote:
Hi,
I am using QueryString in the Java API and find that it behaves strangely compared with the same query string sent over REST.
Here are the steps to reproduce it.
First, add asciifolding to the analyzer's filters:
analyzer:
  default:
    tokenizer: standard
    filter: [asciifolding, lowercase]
Create your index and index some data containing Unicode words, e.g. "không có gì".
Search in the head plugin for "không" -> you get the document "không có gì".
Search in the Java API for "không" -> you get nothing.
Search in the Java API for "khong" -> you get the document.
At first I thought my index was not using the asciifolding and lowercase filters, so I tested it like this:
http://127.0.0.1:9200/myindex/_analyze?text=không%20có%20gì
Result:
{"tokens":[{"token":"khong","start_offset":0,"end_offset":5,"type":"","position":1},{"token":"co","start_offset":6,"end_offset":8,"type":"","position":2},{"token":"gi","start_offset":9,"end_offset":11,"type":"","position":3}]}
So there shouldn't be a problem with the filters.
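For readers without a running cluster, here is a rough pure-Java sketch of what the configured chain (standard tokenizer, then asciifolding, then lowercase) does to "không có gì". Both the tokenizer (a split on non-letter, non-mark, non-digit characters) and the folding (NFD decomposition with combining marks removed) are simplified stand-ins built on java.text.Normalizer, not the Lucene classes, so they only approximate the real analyzer.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerChainSketch {
    // Simplified stand-in for the "standard" tokenizer: split on runs of
    // characters that are not letters, combining marks or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Simplified stand-in for "asciifolding": strip combining marks after NFD
    // and map the d-with-stroke letters, which NFD does not decompose.
    static String asciiFold(String token) {
        String nfd = Normalizer.normalize(token, Normalizer.Form.NFD);
        return nfd.replaceAll("\\p{M}", "")
                .replace('\u0111', 'd')
                .replace('\u0110', 'D');
    }

    // Applies the filters in the configured order: asciifolding, then lowercase.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            out.add(asciiFold(token).toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        // "không có gì"
        System.out.println(analyze("kh\u00f4ng c\u00f3 g\u00ec")); // [khong, co, gi]
    }
}
```

The printed terms agree with the _analyze response, which is the point: the index side is fine, and the difference only appears when the query text is not put through the same chain.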
For now I work around it by applying ASCII folding and lowercasing myself with Lucene's ASCIIFoldingFilter, but I really want to know what's happening.
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f2db090e-50e2-490e-a6a4-d8376c859ac0%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.