Elastic does not see the difference between phrases in search

I search for "akg b 6200" and it returns records with "ghj b 7876", "nma b987", and so on.
Does the search engine in this product actually work?

Could you provide a full recreation script as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

I create the index (everything is done in Kibana):

PUT products
{
  "settings" : {
    "index" : {
      "number_of_shards" : 3,
      "number_of_replicas" : 2
    }
  }
}

I create the mapping:

PUT products/_mapping/p
{
  "properties": {
    "product_name": {"type": "text"}
  }
}

I insert several products:

POST /products/p/
{
  "product_name" : "akg b 986"
}
...

Then I search:

GET /products/_search
{
  "query": { "match": { "product_name": "whatever here" } }
}

It returns all documents...
I also tried 'multi_match' and 'match_all'...

If it finds even one single matching letter, it returns the document as a search result. Moreover, the scoring is absolutely wrong; there is no logic to it.

Can you tell us what you put in "whatever here"?
And why do you think the scoring is wrong?

I can type 'dresg b 1234' or 'jkod b9854' and Elasticsearch will return documents. It should not. The ONLY thing they have in common is the letter 'b', which is not enough for a positive search result. The scoring is also completely wrong. For instance, if one document has product_name "akg b 986" and another has "akg b986", and you search for 'akg b 986', the 'akg b986' document can receive... a better score :slight_smile:

It looks like Elasticsearch uses a primitive '%X%' search.

I would recommend using the analyze API to see exactly what Elasticsearch indexes behind the scenes. Also be aware that scoring depends on the contents and distribution of data in the shard where the document is found. If an index has more than one shard, the score for the same match may therefore differ depending on which shard the document is located in.
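A minimal example, assuming the products index and the product_name field from the question: this request returns the tokens that the field's analyzer produces for a given string, and each returned token can match on its own.

```
GET products/_analyze
{
  "field": "product_name",
  "text": "akg b 6200"
}
```

Running the same request with a stored value such as "ghj b 7876" shows which tokens the two strings share.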

Thanks, but I get nothing from that. I do not want to analyze; I just want to search. Should I create some special index where search really works (by default it does not, see the examples above)? Or does the search query need some magic options?
I think my example is really basic, so is everyone happy with such a 'search'?

With the default analyzer, b9854 is probably indexed as b and 9854.

That's why these documents match. Change the analyzer depending on your use case.
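Besides the analyzer, note that a match query combines the analyzed tokens with OR by default, so a single shared token such as b is enough for a document to match. A hedged sketch, assuming the products index from the question, that requires every token to be present instead:

```
GET products/_search
{
  "query": {
    "match": {
      "product_name": {
        "query": "akg b 6200",
        "operator": "and"
      }
    }
  }
}
```

If requiring all tokens is too strict, the match query's minimum_should_match parameter accepts a count or percentage of tokens instead.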


So by default Elasticsearch search does not work? :slight_smile:
It's hard to say that search is doing its job if it returns a match for any query string...

I do not know what you are talking about. Any example?
When I created this mapping:

{
  "mappings": {
    "products": {
      "_all": {
        "analyzer": "edge_ngram_analyzer",
        "search_analyzer": "keyword_analyzer"
      },
      "properties": {
        "product_name": {
          "type": "text",
          "search_analyzer": "keyword_analyzer",
          "analyzer": "edge_ngram_analyzer"
        }
      }
    }
  }
}

The search result was also bad: it returned every document where just one single letter matched.

It really depends on your use case... I mean... Really.

So you need to adapt the search engine to your needs.
If you don't, Elasticsearch tries to guess what you want to do, but that's just a guess.

I mean: are 'jkod b9854' and 'dresg b 1234' real terms you want to index?
Do you want to search for an exact match "à la SQL"?

If so, index your documents with this mapping:

{
  "properties": {
    "product_name": {"type": "keyword"}
  }
}
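Assuming the index has been recreated with that mapping (the type of an existing field cannot be changed in place, so the documents have to be indexed again into a new index), a sketch of an exact-match lookup is a term query on the whole value:

```
GET products/_search
{
  "query": {
    "term": { "product_name": "akg b 986" }
  }
}
```

A keyword field is not analyzed, so this matches only the exact, case-sensitive string: it will not find "akg b986" or "AKG B 986".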

As @Christian_Dahlqvist says (and you should always follow Christian's advice :slight_smile: ), the _analyze API is super useful for understanding what happens at index time and at search time.

Thanks, but we're going around in circles. Can you help me? If you know the subject, please give me an example of a working solution. Christian's advice to analyze what the software is doing is useless for me; maybe one day, when I get deeper into it. Moreover, I do not understand why search results should depend on the shard they come from: it is still the same data set, no matter where it lives logically or physically. The number of shards should not have any influence on scoring. This is still one database.

I do not want an exact match. I expected this product to have an advanced search engine (as they promote it: 'it's just search'). What I see is that it finds... anything, regardless of accuracy.

"Are 'jkod b9854' and 'dresg b 1234' real terms you want to index?"

YES, they are. Imagine 'mkpl c890' and 'hasf c 4367p'. I need software that can tell that a shared 'c' in the phrase is not enough to return a document as a search result: if I search for 'mkpl c 890', it should not return the second doc, because that is a completely different phrase. By default this software returns everything. It just does not search.

If you don't like that the standard analyzer generates mkpl, c and 890 because it doesn't fit your use case, you need to use another analyzer.

That's exactly where the _analyze API plays a great role again: it helps you build a custom analyzer that suits your needs.

You have an example on the page I linked to.

Note that one of the reasons the standard analyzer splits text like c980 into c and 980 is that when you index things like 180EUR, for example, they should be split into 180 and EUR.
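A minimal sketch of such a custom analyzer, assuming a new index (the names products_v2 and product_code_analyzer are hypothetical): the whitespace tokenizer splits only on spaces, so c890 stays a single token, and the lowercase filter makes matching case-insensitive. The _analyze call checks the output before the analyzer is referenced from the product_name mapping.

```
PUT products_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "product_code_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

GET products_v2/_analyze
{
  "analyzer": "product_code_analyzer",
  "text": "mkpl c890"
}
```

With this analyzer, a query for 'mkpl c 890' still produces a standalone c token, so 'hasf c 4367p' can still match a match query that uses the default OR operator; whether that is acceptable is exactly the use-case question raised above.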

Moreover, I do not understand why search results should depend on the shard they come from.
The number of shards should not have any influence on scoring. This is still one database.

It's because you don't understand how Elasticsearch works. Each shard is a Lucene instance. You can think of a Lucene instance as one database if you wish. So when you have 5 shards, you actually have 5 databases.
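For reference, the search API accepts a search_type parameter. A minimal sketch, assuming the products index from the question, using dfs_query_then_fetch, which first collects term statistics from all shards so that scores no longer depend on which shard holds a document, at the cost of an extra round trip:

```
GET products/_search?search_type=dfs_query_then_fetch
{
  "query": { "match": { "product_name": "akg b 986" } }
}
```

With only a handful of documents spread over 3 shards, as in the question, per-shard statistics can differ a lot, which is why the effect is so visible on small test data.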

You then have 2 options:
