Hello, I want to be able to search Japanese input as well as English. I don't want to use a plugin, because I only need partial search within Japanese input, so please do not suggest kuromoji.
ElasticSearch version: 5.6.1
The problem is that I want to use the simple analyzer for my index, and I think I achieved that with elasticsearch-dsl.
First problem (and also a question):
When I call blue.local:9200/contracts/_settings/, I cannot see simple listed as the analyzer in the index settings:
{
  "contracts": {
    "settings": {
      "index": {
        "creation_date": "1507127956748",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "ehfhOJ2OStqS7fd4wLLn1g",
        "version": {
          "created": "5060199"
        },
        "provided_name": "contracts"
      }
    }
  }
}
I believe this might be normal for generic analyzers. Right?
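For context: a built-in analyzer assigned to a field is normally recorded in the field mapping (blue.local:9200/contracts/_mapping) rather than in the index settings, which only list custom analysis definitions. Below is a minimal Python sketch for reading the analyzer of each field out of a mapping response. The helper name field_analyzers and the sample response are hypothetical (the sample is not taken from the real index), and the layout assumes the 5.x shape of index, then type, then properties.

```python
def field_analyzers(mapping_response):
    """Return {(index, doc_type, field): analyzer} for a _mapping response.

    A value of None means the field declares no explicit analyzer, so the
    index default applies (the standard analyzer unless it was overridden).
    """
    result = {}
    for index, body in mapping_response.items():
        for doc_type, type_mapping in body.get("mappings", {}).items():
            for field, spec in type_mapping.get("properties", {}).items():
                result[(index, doc_type, field)] = spec.get("analyzer")
    return result


# Hypothetical response used only for illustration.
sample = {
    "contracts": {
        "mappings": {
            "contract_document": {
                "properties": {
                    "client": {"type": "text"},
                    "name": {"type": "text", "analyzer": "simple"},
                }
            }
        }
    }
}

for key, analyzer in field_analyzers(sample).items():
    print(key, analyzer)
```

If the name field shows no analyzer in the real mapping, the simple analyzer was probably not applied when the index was created.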
Then I tested the simple analyzer by calling blue.local:9200/_analyze?analyzer=simple&text=地上権 and the result was:
{
  "tokens": [
    {
      "token": "地上権",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    }
  ]
}
When I was using the standard analyzer, every Japanese character was a separate token. Now it is not, and I think this is what I want.
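The difference between the two analyzers can be illustrated with a rough pure-Python approximation. This is only a sketch, not the actual Lucene implementation: simple_like mimics splitting on non-letters and lowercasing, and standard_like only models the one behaviour relevant here, namely that each CJK ideograph becomes its own token. Both function names are hypothetical, and the ideograph check covers only the basic CJK Unified Ideographs block.

```python
def simple_like(text):
    """Approximate the simple analyzer: split on non-letters, lowercase."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def is_ideograph(ch):
    # Assumption: only the basic CJK Unified Ideographs block is checked.
    return "\u4e00" <= ch <= "\u9fff"


def standard_like(text):
    """Rough stand-in for the standard analyzer on CJK text:
    every ideograph is emitted alone, other letters/digits are grouped."""
    tokens, current = [], []
    for ch in text:
        if is_ideograph(ch):
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        elif ch.isalnum():
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(simple_like("地上権"))
print(standard_like("地上権"))
```

The first call keeps the whole word as one token, matching the _analyze output above, while the second splits it into three single-character tokens.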
Then, I validated my query by calling:
POST blue.local:9200/contracts/_validate/query?explain
{
  "query": {
    "query_string": {
      "query": "name:地上権",
      "analyzer": "simple"
    }
  }
}
And the response was:
{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "contracts",
      "valid": true,
      "explanation": "name:地上権"
    }
  ]
}
I guess the explanation here shows that I am on the right path.
BUT, when I run the query:
POST blue.local:9200/contracts/_search/
{
  "query": {
    "query_string": {
      "query": "name:地上権",
      "analyzer": "simple"
    }
  }
}
I get zero hits:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
I am sure that the data exists.
When I remove the analyzer from the query:
POST blue.local:9200/contracts/_search/?explain
{
  "query": {
    "query_string": {
      "query": "name:地上権"
    }
  }
}
I get this:
{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 36,
    "max_score": 12.01759,
    "hits": [
      {
        "_shard": "[contracts][0]",
        "_node": "IPjnafPMRyGqMLrABkvudQ",
        "_index": "contracts",
        "_type": "contract_document",
        "_id": "6192",
        "_score": 12.01759,
        "_source": {
          "client": "My Client",
          "id": 6192,
          "name": "地上権"
        },
        "_explanation": {
          "value": 12.01759,
          "description": "sum of:",
          "details": [
            {
              "value": 3.6425304,
              "description": "weight(name:地 in 114) [PerFieldSimilarity], result of:",
              "details": [
                ...
              ]
            },
            ...
          ]
        }
      }
    ]
  }
}
I see that the weight is calculated for each character, as if the documents were indexed with the standard analyzer.
What should my next step be? I have spent over 6 hours trying to find the issue, but I failed.
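If the name field really was indexed one character per term, that alone would account for both results. The toy Python sketch below shows the mismatch; the term lists are assumptions read off the outputs above (weight(name:地 ...) in the explanation, and the single token from _analyze), and has_hit is a hypothetical helper that models only OR matching of query terms against indexed terms.

```python
# Assumed terms stored in the index for name = "地上権",
# as suggested by "weight(name:地 ...)" in the explain output.
indexed_terms = {"地", "上", "権"}


def has_hit(query_terms, indexed):
    """True if at least one query term exists in the index (OR semantics)."""
    return any(term in indexed for term in query_terms)


# Query analyzed with simple: one term covering the whole word.
with_simple = ["地上権"]
# Query analyzed with the field's own analyzer: one term per character.
without_override = ["地", "上", "権"]

print(has_hit(with_simple, indexed_terms))
print(has_hit(without_override, indexed_terms))
```

Under these assumptions the single term 地上権 is never found, because no such term was written at index time, while the per-character terms all match.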