Exact search on multiple wildcard fields

Hello guys,

I need to search on several fields selected with wildcard patterns, for example:

{
   "query_string": {
      "fields": ["variations.generalData*", "variations.addresses*"],
      "query": "Müll"
   }
}

But I didn't find an exact-match equivalent of this "contains" search. I know term queries only match the exact given string, but I couldn't find out how to pass multiple fields with a wildcard to them.

Thanks in advance for your help!

It's better IMO to use an ngram-based analyzer.
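To illustrate why an ngram analyzer gives "contains"-style matching, here is a minimal Python model of a lowercase + 3-gram filter chain (min_gram = max_gram = 3, as in the scripts later in this thread). It is a simplified sketch, not Elasticsearch's implementation: splitting on whitespace only approximates the standard tokenizer, and `contains_match` is a hypothetical helper that requires every query gram to be present.

```python
def trigrams(text: str, n: int = 3) -> list[str]:
    """Simplified model of a lowercase + ngram token filter chain.

    Assumption: tokens come from splitting on whitespace, which only
    approximates Elasticsearch's standard tokenizer.
    """
    grams = []
    for token in text.lower().split():
        # Tokens shorter than n yield no grams, as with min_gram = n.
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def contains_match(query: str, value: str) -> bool:
    """True if every gram of the query also appears in the indexed value.

    Assumption: all grams must match (like operator AND). With the default
    OR operator, sharing a single gram would already be enough.
    """
    indexed = set(trigrams(value))
    return all(g in indexed for g in trigrams(query))


print(trigrams("basket"))
print(contains_match("basket", "basketball"))
print(contains_match("Müll", "Müller"))
```

The same property that makes "contains" work is also why "Müll" matches "Müller": every gram of the shorter word is contained in the longer one.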

And with that, can I easily do both "contains" and "exact" queries? Can I combine this analyzer with the default language analyzers?

You will need a mix of analyzers for the same "root" field.

For example:

https://github.com/dadoonet/legacy-search/blob/03-mapping/src/main/resources/elasticsearch/person/_settings.json#L35-L47

Then you can boost some queries (i.e. the exact match) if needed.

Can I combine this analyzer with the default language analyzers?

Yes and no. You will need to find out how the language analyzer you want to use is built (e.g. the French one: Language analyzers | Elasticsearch Guide [8.11] | Elastic) and adapt it to your needs as a custom analyzer.
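Rebuilding analyzers for many languages does not have to be done by hand. This is a minimal Python sketch that generates the index `analysis` settings, assuming each language analyzer can be approximated by lowercase + stop (the predefined `_<language>_` stopword list) + stemmer, with the ngram filter appended. Real language analyzers contain extra steps (the French one, for example, also uses an elision filter and a keyword marker), so check the Language analyzers page for each language. The helper `language_analysis` and the filter/analyzer names (`english_stop`, `french_ngram`, ...) are hypothetical.

```python
import json


def language_analysis(languages: list[str], min_gram: int = 3, max_gram: int = 3) -> dict:
    """Build an index "analysis" block with one ngram analyzer per language.

    Assumptions: the stop filter's predefined lists are named "_<lang>_"
    and the stemmer accepts the plain language name (true for english,
    french and german). The generated names are hypothetical.
    """
    filters = {
        "ngram_filter": {"type": "ngram", "min_gram": min_gram, "max_gram": max_gram}
    }
    analyzers = {}
    for lang in languages:
        filters[f"{lang}_stop"] = {"type": "stop", "stopwords": f"_{lang}_"}
        filters[f"{lang}_stemmer"] = {"type": "stemmer", "language": lang}
        # Stemming runs before the ngram filter so grams come from word stems.
        analyzers[f"{lang}_ngram"] = {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", f"{lang}_stop", f"{lang}_stemmer", "ngram_filter"],
        }
    return {"analysis": {"filter": filters, "analyzer": analyzers}}


settings = {"settings": language_analysis(["english", "french", "german"])}
print(json.dumps(settings, indent=2))
```

The printed JSON can be used as the body of the `PUT index` request; each text field then picks the analyzer matching its language.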

Wow. Are these the standard analyzers? Do I really need "stop", "keywords" and "stemmer" if I want them to behave like the default language analyzers? My product can be used in many languages, so I would need to define every analyzer listed there...

And even if I use the ngram analyzer, I still can't find an exact-match query that accepts multiple wildcard fields. Do you have one in mind?

Not exactly but something like this:

Or, with the mapping I shared previously, here is what you can write in Java:

I am sorry, I still don't get it. Wouldn't that only boost the importance of one field rather than change how the query string is matched? I have since tried the "fuzziness" parameter in a Query String Query, but that still finds "Müller" when I want to search exactly for "Müll" =/

It should find both but with a preference for the exact one.
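A toy Python model of "find both, but prefer the exact one": every document scores by ngram overlap, and an exact match on the keyword value adds a boosted bonus, much like a boosted exact clause next to an ngram clause in a bool/should query. The boost of 10 and the scoring formula are assumptions for illustration only; Elasticsearch's real scoring (BM25) works differently.

```python
def grams(text: str, n: int = 3) -> set[str]:
    """Character n-grams of each lowercased, whitespace-separated token."""
    out = set()
    for token in text.lower().split():
        out.update(token[i:i + n] for i in range(len(token) - n + 1))
    return out


def score(query: str, value: str, exact_boost: float = 10.0) -> float:
    """Toy score: ngram overlap plus a boosted exact-match bonus.

    exact_boost is a hypothetical value. The exact comparison is
    case-sensitive, like a keyword field without a normalizer.
    """
    q = grams(query)
    ngram_part = len(q & grams(value)) / len(q) if q else 0.0
    exact_part = exact_boost if value == query else 0.0
    return ngram_part + exact_part


docs = ["Müller", "Müll"]
ranked = sorted(docs, key=lambda d: score("Müll", d), reverse=True)
print(ranked)
```

Both documents still match, which is the point made above; the boost only decides the order.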

Hm, maybe I should have added: that value most likely occurs in only one field. I only pass multiple fields because I cannot be sure exactly which field the value is stored in.
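Queries such as `query_string` and `multi_match` accept wildcard field patterns and expand them to the concrete mapped fields before querying. Here is a simplified Python model of that expansion followed by an exact, keyword-style comparison. The field names come from the reproduction script later in this thread (the `path.test.2` document is added for illustration), and `fnmatchcase` only approximates Elasticsearch's own pattern matching.

```python
from fnmatch import fnmatchcase


def expand_fields(patterns: list[str], mapped_fields: list[str]) -> list[str]:
    """Resolve wildcard field patterns to concrete field names."""
    return [f for f in mapped_fields if any(fnmatchcase(f, p) for p in patterns)]


def exact_match(doc: dict, patterns: list[str], value: str) -> bool:
    """True if any field selected by the patterns holds exactly `value`.

    Models a term-style match on unanalyzed (keyword) values.
    """
    fields = expand_fields(patterns, list(doc))
    return any(doc[f] == value for f in fields)


docs = [
    {"path.test.1": "basket"},
    {"path.example.1": "basketball"},
    {"path.test.2": "basket of apples"},  # hypothetical extra document
]
patterns = ["path.test.*", "path.example.*"]
hits = [d for d in docs if exact_match(d, patterns, "basket")]
print(hits)
```

Because the value normally sits in only one of the expanded fields, matching "any" field is enough; the wildcard only saves you from knowing which one.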

I just wanted to post an update: I thought a solution would be to search on the "keyword" field, but that doesn't work when the value contains a space. Is it possible to do this somehow?

Could you provide a full reproduction script, as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Here is the reproduction script:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "tokenizer": "standard",
          "filter": [
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "path.test.1": {
          "type": "text",
          "analyzer": "standard"
        },
        "path.example.1": {
          "type": "text",
          "analyzer": "standard"
        }
      }
    }
  }
}
POST index/doc
{
  "path.test.1": "basket"
}
POST index/doc
{
  "path.example.1": "basketball"
}
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": ["path.test.*", "path.example.*"],
      "query": "basket"
    }
  }
}

What I am looking for now is a query that returns EXACTLY "basket" and accepts multiple field names with wildcards in them (because there can be multiple fields under "path.test").

The reproduction script you shared returns only the document that contains exactly basket. And you only have a single field in your example.
So I don't know what you are looking for.

If I run this on Elasticsearch 5.6.4, I get both "basket" and "basketball" back.

I tried to keep the example as short as possible and only wanted to point out that I need to pass "fields", not just a single "field", to the query, because some queries only accept one field, such as the prefix query. I will update my previous post to demonstrate this better.

OK here is an updated version:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
POST index/doc
{
  "text": "basket"
}
POST index/doc
{
  "text": "basketball"
}
POST index/doc
{
  "text": "basket of apples"
}
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "tex*.keyword"
      ],
      "query": "basket"
    }
  }
}

I thought using the keyword field might do the job, but it doesn't work with "basket of apples": when searching for that exact value, "basket" is returned instead of the expected document.

I think I'm lost.

You are searching for an exact match, since you search within a keyword field.
You search for basket, and only the document that matches basket exactly is returned.

So I'm not sure what you are looking for.
If I understand correctly, searching for basket should return both basket and basket of apples? Is that right?

If so, why not run:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "ngrams": {
              "type": "text",
              "analyzer": "standard"
            },
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
POST index/doc
{
  "text": "basket"
}
POST index/doc
{
  "text": "basketball"
}
POST index/doc
{
  "text": "basket of apples"
}
# Partial match
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "text"
      ],
      "query": "basket"
    }
  }
}
# exact match
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "text.keyword"
      ],
      "query": "basket"
    }
  }
}
# Sub terms match
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "text.ngrams"
      ],
      "query": "basket"
    }
  }
}
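A rough Python model of what each of the three subfields above indexes and which documents the three queries hit. Assumptions: the plain `text` field ends up with the built-in standard analyzer (the script's custom analyzer named "standard" is applied explicitly only to `text.ngrams`), the standard tokenizer is approximated by lowercasing and splitting on whitespace, and a document matches when it shares at least one term with the analyzed query (the default OR behaviour).

```python
def words(text: str) -> set[str]:
    """Approximation of the standard analyzer: lowercase, split on whitespace."""
    return set(text.lower().split())


def ngrams(text: str, n: int = 3) -> set[str]:
    """Approximation of standard tokenizer + lowercase + 3-gram filter."""
    return {t[i:i + n] for t in words(text) for i in range(len(t) - n + 1)}


def keyword(text: str) -> set[str]:
    """A keyword field indexes the whole value as one unchanged term."""
    return {text}


SUBFIELDS = {"text": words, "text.keyword": keyword, "text.ngrams": ngrams}


def search(field: str, query: str, docs: list[str]) -> list[str]:
    """Return docs sharing at least one term with the analyzed query."""
    analyze = SUBFIELDS[field]
    q = analyze(query)
    return [d for d in docs if analyze(d) & q]


docs = ["basket", "basketball", "basket of apples"]
for field in SUBFIELDS:
    print(field, search(field, "basket", docs))
```

In this model `text` returns basket and basket of apples, `text.keyword` only basket, and `text.ngrams` all three documents.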

If I search for "basket" I want to find exactly "basket" and no other entry. That works fine, but not when the value contains spaces:

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      },
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "ngrams": {
              "type": "text",
              "analyzer": "standard"
            },
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
POST index/doc
{
  "text": "basket"
}
POST index/doc
{
  "text": "basketball"
}
POST index/doc
{
  "text": "basket of apples"
}
# Should return "basket of apples" but returns "basket" instead =(
GET /index/_search
{
  "query": {
    "query_string": {
      "fields": [
        "tex*.keyword"
      ],
      "query": "basket of apples"
    }
  }
}

I tried your script and this is what I get:

{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "index",
        "_type": "doc",
        "_id": "8DF3FGUBsarWKLmEOFHS",
        "_score": 0.2876821,
        "_source": {
          "text": "basket of apples"
        }
      }
    ]
  }
}

Which is what you are expecting, right?

Yes. Which Elasticsearch version are you using?

6.3.2
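The difference between 5.6.4 and 6.3.2 is plausibly explained by how `query_string` handles whitespace. As far as I know, 5.x split the query text on whitespace before analysis by default (the `split_on_whitespace` option), while 6.0 removed that option and splits only on operators, so a keyword field receives "basket of apples" as a single term. This is a simplified Python model of both behaviours against the keyword subfield. The version mapping is an assumption to verify against the release notes, and `query_string_keyword` is a hypothetical helper.

```python
def keyword_terms(text: str) -> set[str]:
    """Keyword field: the whole value is a single term."""
    return {text}


def query_string_keyword(query: str, docs: list[str], split_on_whitespace: bool) -> list[str]:
    """Model query_string on a keyword field with the default OR operator.

    Assumption: split_on_whitespace=True models the 5.x default (each
    whitespace-separated part is analyzed on its own); False models 6.x
    (the whole text reaches the keyword analyzer as one term).
    """
    parts = query.split() if split_on_whitespace else [query]
    terms = set().union(*(keyword_terms(p) for p in parts))
    return [d for d in docs if keyword_terms(d) & terms]


docs = ["basket", "basketball", "basket of apples"]
print(query_string_keyword("basket of apples", docs, split_on_whitespace=True))
print(query_string_keyword("basket of apples", docs, split_on_whitespace=False))
```

With splitting, the parts "basket", "of" and "apples" are OR-ed and only the "basket" document matches exactly, which is what was observed on 5.6.4; without splitting, only "basket of apples" matches, as in the 6.3.2 output above.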