Cannot do the fuzzy match with Java RestHighLevelClient

#1

We are trying to use Elasticsearch as an inverted index (e.g. a name -> id index), running on Amazon Elasticsearch Service. On the client side, I use the Java High Level REST Client and build the SearchRequest as follows:

SearchRequest buildSearchRequest(ResolutionRequest request) {
    SearchSourceBuilder searchSourceBuilder =
        new SearchSourceBuilder()
            .query(
                QueryBuilders.queryStringQuery(request.getInput())
                    .fuzziness(Fuzziness.TWO)
                    .fuzzyPrefixLength(3)
                    .fuzzyMaxExpansions(10));
    // Apply size/offset only when the caller supplied them.
    request.getSizeOptional().ifPresent(val -> searchSourceBuilder.size(val.intValue()));
    request.getOffsetOptional().ifPresent(val -> searchSourceBuilder.from(val.intValue()));
    return new SearchRequest()
        .source(searchSourceBuilder)
        .searchType(SearchType.DFS_QUERY_THEN_FETCH);
}

But it turns out it cannot do the fuzzy match even though I set the fuzziness to TWO (the highest level that can be set). For example, if the index contains "cabin" and the user inputs "cabins", no results are returned.

Am I forming the fuzzy query in a wrong way, or did I miss anything?

To follow up on David's request, here are the index settings and what I plan to do:

DELETE index
PUT index/_doc/type_name
{
  "id": "id1",
  "name": "cabin",
  "indexed_at": "2019-03-01"
}
GET index/_search
{
  "query": {
    "query_string": {
      "query": "cabins",
      "fuzziness": 2,
      "fuzzy_prefix_length": 3,
      "fuzzy_max_expansions": 10
    }
  }
}

Thank you!

(David Pilato) #2

Please format your code, logs or configuration files using </> icon as explained in this guide and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.

I updated your post but please do it next time. Thanks!

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible, and don't do it in Java but with the Kibana dev console (as shown in the example).

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

#3

Thanks for the modification, David! That helps the readability a lot!

Regarding the recreation script, I think this might be a Java client question, so I pasted the Java code. (I'm not sure how this Java code translates to a recreation script :()

But I did refine the question and added more details. Thanks so much for the tips.

(David Pilato) #4

I don't think so. That's what I'd like to figure out.

So please provide a full example, index settings, mappings, documents, and search that we can use to reproduce and fix your problem or explain why the behavior is what you see.

#5

Gotcha! I have added a recreation script. Let me know if it makes sense or if I need to add more information.

(David Pilato) #6

A match query works well.

DELETE index
PUT index/_doc/type_name
{
  "name": "cabin"
}
GET index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "cabins",
        "fuzziness": 2
      }
    }
  }
}

If you want to use the query string query or the simple query string query, you need to use ~ to indicate that you want to apply fuzziness to a given term:

DELETE index
PUT index/_doc/type_name
{
  "name": "cabin"
}
GET index/_search
{
  "query": {
    "query_string": {
      "query": "cabins~"
    }
  }
}
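For the Java client in the original question, one way to get the same effect is to append ~ to each term before handing the text to the query string query. This is only a minimal sketch: `FuzzyQueryStrings` and `withFuzzyOperator` are hypothetical names, the helper assumes plain whitespace-separated user input, and it does not escape reserved query string characters.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class FuzzyQueryStrings {

    // Hypothetical helper: appends the fuzzy operator "~" to every
    // whitespace-separated term, e.g. "cabins" becomes "cabins~".
    // Assumption: the input is plain text without query_string operators.
    public static String withFuzzyOperator(String input) {
        return Arrays.stream(input.trim().split("\\s+"))
                .filter(term -> !term.isEmpty())
                .map(term -> term + "~")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(withFuzzyOperator("cabins"));
        System.out.println(withFuzzyOperator("red  cabins"));
    }
}
```

In `buildSearchRequest` the result would then be passed as `QueryBuilders.queryStringQuery(withFuzzyOperator(request.getInput()))`, keeping the existing fuzziness settings as the defaults for the bare ~ operator.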

I hope this helps.

#7

Wow, that's really helpful!

Thank you so much!

By the way, what is the difference between the two, given that query_string also has fuzziness settings in the API?

(David Pilato) #8

I don't know. @jimczi?

(Jimferenczi) #9

The fuzziness option in the query_string is about changing the default value when you don't provide a numeric value after the ~ operator. The default is AUTO (see https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness) but you can change the default or provide an explicit value (e.g. cabins~1).
Regarding the match query, since it doesn't accept operators the fuzziness option is applied to each term automatically.
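To make the two forms concrete, here is a small Java sketch. The class and method names are hypothetical. `autoEdits` assumes the default AUTO thresholds from the linked fuzziness documentation (terms shorter than 3 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two), and `fuzzyTerm` only renders the query string syntax; it does not call Elasticsearch.

```java
public class FuzzinessDefaults {

    // Edits allowed by fuzziness AUTO, assuming its default thresholds (3 and 6).
    public static int autoEdits(int termLength) {
        if (termLength < 3) {
            return 0;
        }
        if (termLength < 6) {
            return 1;
        }
        return 2;
    }

    // Hypothetical helper: renders "cabins~" when no explicit edit distance
    // is given (the query's fuzziness option then supplies the default),
    // and "cabins~1" when one is given.
    public static String fuzzyTerm(String term, Integer edits) {
        return edits == null ? term + "~" : term + "~" + edits;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyTerm("cabins", null));
        System.out.println(fuzzyTerm("cabins", 1));
        System.out.println(autoEdits("cabins".length()));
    }
}
```

Since "cabins" differs from "cabin" by a single inserted character, either form is enough to match the indexed document in the example above.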

(David Pilato) #10

I thought that foo~ meant fuzziness: AUTO and foo~1 meant fuzziness: 1. Good to know. Thanks @jimczi.

#11

Good to know! Thanks @jimczi and @dadoonet!

(system) closed #12

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.