How do I make exact matches come first in search results?

I index a few documents like this:

DELETE index
PUT index/_doc/Neighbour
{
   "id": "id1",
   "name": "Downtown San Francisco",
   "indexed_at": "2019-03-01"
}
PUT index/_doc/State
{
   "id": "id2",
   "name": "Nevada",
   "indexed_at": "2019-03-01"
}
PUT index/_doc/State
{
   "id": "id3",
   "name": "California",
   "indexed_at": "2019-03-01"
}
PUT index/_doc/City
{
   "id": "id4",
   "name": "San francisco",
   "indexed_at": "2019-03-01"
}

When I search with a query_string query, e.g.

GET index/_search
{
  "query": {
    "query_string": {
      "query": "San Francisco",
      "fuzziness": "AUTO",
      "fuzzy_prefix_length": 3,
      "fuzzy_max_expansions": 10
    }
  }
}

it does not return "San francisco" as the first result.

Do I need to do something specific to make the exact match come first in the results?

Thanks!

You can combine multiple criteria as should clauses inside a bool query, and make one of them an exact match.

I wrote an example here:

Just curious: why does query_string not behave the same way as the bool query?

I would like both short words and phrases to be accepted as input, so I thought query_string would be a quick way to solve this.

Did you try with one single shard?

I index the content by type, such as "City", "State", and "Neighborhood". Do they all end up in a single shard, or do documents in different indices end up in different shards?

Try with one single shard or add the following option when calling the search API:

?search_type=dfs_query_then_fetch

See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch
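
For reference, a minimal sketch of the search from the question with that option set on the request URL. The index name and query body are copied from the question; nothing else is assumed.

```
GET index/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "query_string": {
      "query": "San Francisco",
      "fuzziness": "AUTO",
      "fuzzy_prefix_length": 3,
      "fuzzy_max_expansions": 10
    }
  }
}
```

With this search type, term statistics are first collected from all shards and then used for scoring, which is where the extra round trip comes from.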

Wouldn't dfs_query_then_fetch add another round trip to Elasticsearch, which might add to the latency?

Do multiple indices end up in different shards? I am still confused about why this would happen when the documents are spread across different shards.

Sad that you don't want to try it and check whether or not it solves the problem you described.

Anyway. In ES prior to 7.0, an index has 5 shards by default. A document might go to any one of them, unless documents share the same routing key, in which case they land on the same shard. The default routing key is the document id.
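
To spell out why the shard matters: with the default search type, each shard scores documents using its own term statistics (how many documents it holds and how many of them contain the term). With only four documents spread over five shards, those per-shard numbers can differ a lot, so relevance scores are not directly comparable. One way to inspect this, as a sketch that assumes the same `index` name and query text as in the question, is to ask for the score explanation:

```
GET index/_search
{
  "explain": true,
  "query": {
    "query_string": {
      "query": "San Francisco"
    }
  }
}
```

Each hit then includes an `_explanation` section with the statistics used for its score, along with the `_shard` it came from, which makes it easier to compare how two documents were scored.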

Oh, I did try dfs_query_then_fetch before, and it did make the ranking more reasonable. Though for some use cases, exact matches are still not in first place.

That's why I am eager to learn more about what happens behind the scenes and how I can get better results :)

Are you suggesting that I store everything in one document? Or that I stop using query_string?

By the way, just curious: is there a query builder that suits both short text matches and phrases/paragraphs?

> Are you suggesting that I store everything in one document? Or that I stop using query_string?

No. I'm suggesting that you run your query on a large dataset rather than on a few documents, and/or that you use only one shard for your index.
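
A minimal sketch of the single-shard option, assuming the index is still named `index` as in the question and that it is acceptable to drop and recreate it:

```
DELETE index

PUT index
{
  "settings": {
    "number_of_shards": 1
  }
}
```

The documents then need to be indexed again. With a single shard, all of them are scored against the same term statistics.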

> By the way, just curious: is there a query builder that suits both short text matches and phrases/paragraphs?

No. You can use a bool query with multiple should clauses. One would be a match_phrase query, the other one a match query.
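
As a rough sketch of that suggestion, assuming the text to search lives in the `name` field from the question and that fuzziness is only wanted on the looser match clause:

```
GET index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase": { "name": "San Francisco" } },
        {
          "match": {
            "name": {
              "query": "San Francisco",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
```

A document that satisfies both clauses gets the sum of both scores, so names containing the whole phrase rank above names that only match a single term. Note that "Downtown San Francisco" also contains the phrase. Given the same term statistics, the shorter "San francisco" would normally score higher because of field-length normalization, which is another reason to compare results on a single shard or with dfs_query_then_fetch.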

Thanks, that is helpful!!
