Exact Search on multiple wildcard-fields


#5

Wow. Are these the standard analyzers? Do I really need "stop", "keywords" and "stemmer" if I want them to work like the default language analyzers? Because my product can be used in many languages, I would need to save all these analyzers listed there...

And even if I used the ngram analyzer, I still can't find a query that accepts multiple wildcard fields but matches the query text exactly. Do you have one in mind?


(David Pilato) #6

Not exactly but something like this:

Or with the mapping I shared previously here is what you can write in Java:


#7

I am sorry, I still don't get it. Wouldn't that only boost the importance of one field and not the matching of the query string? I have now tried the "fuzziness" parameter in a Query String Query, but that still finds "Müller" when I want to search for exactly "Müll" =/


(David Pilato) #8

It should find both but with a preference for the exact one.


#9

Hm well maybe I should have added: That value is most likely to occur only in one field. I only pass multiple fields because I cannot be sure in which field the value is saved exactly.


#11

I just wanted to post an update: I thought a solution would be to search on the "keyword" field, yet it doesn't work when the value has a space in it. Is it possible to do this somehow?


(David Pilato) #12

Could you provide a full recreation script as described in "About the Elasticsearch category"? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.


#13

Here is the recreation script:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "tokenizer": "standard",
          "filter": [
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "path.test.1": {
          "type": "text",
          "analyzer": "standard"
        },
        "path.example.1": {
          "type": "text",
          "analyzer": "standard"
        }
      }
    }
  }
}
POST index/doc
{
  "path.test.1": "basket"
}
POST index/doc
{
  "path.example.1": "basketball"
}
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": ["path.test.*", "path.example.*"],
      "query": "basket"
    }
  }
}

What I am looking for now is a query that returns EXACTLY "basket" and lets me pass multiple field names with wildcards in them (because there can be multiple fields under "path.test").
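
To make the wildcard part of the question concrete: a pattern in "fields" only selects which mapped fields are searched; it does not change how the query text is matched. Below is a minimal Python sketch of that selection step. It approximates the matching with the standard library's fnmatch, and both the field list and the helper name expand_field_patterns are hypothetical (the script above only maps "path.test.1" and "path.example.1").

```python
from fnmatch import fnmatchcase

# Hypothetical concrete field names; only "path.test.1" and
# "path.example.1" appear in the recreation script above.
mapped_fields = [
    "path.test.1",
    "path.test.2",
    "path.example.1",
    "path.other.1",
]


def expand_field_patterns(patterns, fields):
    """Return the fields matched by at least one pattern, in mapping order.

    fnmatchcase is only an approximation of the wildcard matching
    Elasticsearch applies to field names.
    """
    return [f for f in fields if any(fnmatchcase(f, p) for p in patterns)]


selected = expand_field_patterns(["path.test.*", "path.example.*"], mapped_fields)
print(selected)
```

Whether a hit is "exact" is then decided per selected field by that field's type and analyzer, which is what the rest of the thread is about.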


(David Pilato) #14

The reproduction script you shared returns exactly the document that contains exactly basket. And you have one single field in your example.
So I don't know what you are looking for.


#15

If I run this on Elasticsearch version 5.6.4, I get both "basket" and "basketball" back.

I tried to keep the example as short as possible and only wanted to remind that I need to pass "fields" and not just a single "field" to the query, because there are some queries that only allow one field, like the prefix-query for example. I will update my previous post to better demonstrate this.
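
A likely reason both documents can come back with the ngram analyzer: no separate search analyzer is configured, so the query text is cut into trigrams just like the indexed values, and any document sharing a trigram with the query is a candidate match. The Python sketch below only models the trigram step (min_gram = max_gram = 3); the function name trigrams is made up for illustration and this is not the Lucene implementation.

```python
def trigrams(token, n=3):
    """All contiguous substrings of length n, in order of appearance."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


query_grams = set(trigrams("basket"))

for value in ("basket", "basketball"):
    # Trigrams the indexed value has in common with the query.
    shared = query_grams & set(trigrams(value))
    print(value, sorted(shared))
```

Every trigram of "basket" also occurs in "basketball", so on an ngram-analyzed field the two values cannot be told apart by this query alone.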


#16

OK here is an updated version:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
POST index/doc
{
  "text": "basket"
}
POST index/doc
{
  "text": "basketball"
}
POST index/doc
{
  "text": "basket of apples"
}
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "tex*.keyword"
      ],
      "query": "basket"
    }
  }
}

I thought maybe using the keyword field would do the work, but it doesn't work with "basket of apples". When searching for this exact value, "basket" is returned instead of the expected doc.


(David Pilato) #17

I think I'm lost.

You are asking for an exact match, since you are searching within a keyword type field.
You are searching for basket, and only the document that matches basket exactly is returned.

So I'm not sure what you are searching for.
If I understand correctly, searching for basket should return basket and basket of apples? Is that right?

If so, why not run:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "ngrams": {
              "type": "text",
              "analyzer": "standard"
            },
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
POST index/doc
{
  "text": "basket"
}
POST index/doc
{
  "text": "basketball"
}
POST index/doc
{
  "text": "basket of apples"
}
# Partial match
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "text"
      ],
      "query": "basket"
    }
  }
}
# exact match
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "text.keyword"
      ],
      "query": "basket"
    }
  }
}
# Sub terms match
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "text.ngrams"
      ],
      "query": "basket"
    }
  }
}

#18

If I search for "basket" I want to find exactly "basket" and no other entry. That works fine but it doesn't work for spaces:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "ngrams": {
              "type": "text",
              "analyzer": "standard"
            },
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
POST index/doc
{
  "text": "basket"
}
POST index/doc
{
  "text": "basketball"
}
POST index/doc
{
  "text": "basket of apples"
}
# Should return "basket of apples" but returns "basket" instead =(
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "tex*.keyword"
      ],
      "query": "basket of apples"
    }
  }
}

(David Pilato) #19

I tried your script and it gives me:

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "index",
        "_type": "doc",
        "_id": "8DF3FGUBsarWKLmEOFHS",
        "_score": 0.2876821,
        "_source": {
          "text": "basket of apples"
        }
      }
    ]
  }
}

Which is what you are expecting, right?


#20

Yeah. Which Elasticsearch version are you using?


(David Pilato) #21

6.3.2


#22

OK then it seems to be a problem with 5.6.4 or even any 5.6.x version. I will try it out.

Are there any other ways to do an exact search, then?


(David Pilato) #23

This works on 5.x:

GET /index/_search
{
  "query": {
    "match": {
      "text.keyword": "basket of apples"
    }
  }
}
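
One plausible explanation for the difference between 5.6.4 and 6.3.2 (an assumption, not confirmed in this thread) is that the 5.x query_string parser splits the input on whitespace before the text reaches the field, so "basket of apples" turns into three optional terms, while a match query on a keyword field treats the whole input as a single term. The Python sketch below models only that difference; the two function names are hypothetical and nothing here calls Elasticsearch.

```python
# Values stored in the text.keyword sub-field by the script above.
keyword_values = ["basket", "basketball", "basket of apples"]


def query_string_split(query, values):
    """Model a parser that splits on whitespace first.

    Each piece becomes an optional exact term, so a stored value
    matches if it equals any single piece.
    """
    terms = query.split()
    return [v for v in values if v in terms]


def match_whole(query, values):
    """Model a match query on a keyword field: the input is one term."""
    return [v for v in values if v == query]


print(query_string_split("basket of apples", keyword_values))
print(match_whole("basket of apples", keyword_values))
```

Under this model the split variant can only ever hit the single-word value "basket", which mirrors the result reported on 5.6.4.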

#24

Ah yeah, I hadn't tried the match query, thank you very much! I just added a little twist and use the "multi_match" query instead, because I need to pass multiple fields with wildcards in the field name. A simple example:

GET /_search
{
  "query": {
    "multi_match" : {
      "query": "basket of apples",
      "fields": ["tex*.keyword"]
    }
  }
}

Works like a charm!
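
For anyone building this request from application code, here is a minimal Python sketch that assembles the same multi_match body for several wildcard patterns. The helper name exact_multi_match and the second pattern "path.test.*.keyword" are hypothetical; the sketch only builds and prints the JSON and does not send it anywhere.

```python
import json


def exact_multi_match(text, field_patterns):
    """Build a multi_match request body (hypothetical helper).

    The patterns are expected to point at keyword sub-fields so that
    the whole input is compared as a single term.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(field_patterns),
            }
        }
    }


body = exact_multi_match(
    "basket of apples",
    ["tex*.keyword", "path.test.*.keyword"],  # second pattern is illustrative
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET /_search in the console, or sent as the request body with any HTTP client.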

