Multi match query with non fully analyzed fields

Hi,

We need to implement free-text search: a user enters a string, and we return the docs that contain it in any one of several fields.

So I've written a multi_match query of type cross_fields on the required fields.

Let's say the search is for "Alex"

and the query is:

        "multi_match": {
          "query": "Alex",
          "type": "cross_fields",
          "fields": [
            "name",
            "company",
            "email"
          ]
        }

The name field is fully analyzed with the standard analyzer, company is not_analyzed (only exact matches should be returned), and email is analyzed with the keyword tokenizer and a lowercase filter (I need an exact, case-insensitive match on email).
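
For reference, the setup described above could look roughly like the following index definition. This is only a sketch: it assumes Elasticsearch 2.x mapping syntax (suggested by the "not_analyzed" wording), and the type name "contact" and the analyzer name "lowercase_keyword" are made up for illustration.

```python
import json

# Index settings and mapping in Elasticsearch 2.x syntax (an assumption).
# "contact" and "lowercase_keyword" are hypothetical names, not from the thread.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",    # whole value becomes one token
                    "filter": ["lowercase"],   # makes the match case-insensitive
                }
            }
        }
    },
    "mappings": {
        "contact": {
            "properties": {
                # No analyzer given, so the default (standard) analyzer applies.
                "name": {"type": "string"},
                # Stored as a single unmodified term: exact matches only.
                "company": {"type": "string", "index": "not_analyzed"},
                "email": {"type": "string", "analyzer": "lowercase_keyword"},
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```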

The odd behaviour shows up when I search for multiple terms and some of them do not match anything:

If I search for "Alex Facebook", I expect to get all docs where any of the 3 fields above contains either "Alex" or "Facebook", and if there are no matches for "Facebook", I still expect to get the docs which match "Alex".

But if I search for a value which matches one of the not fully analyzed fields together with another value which does not exist, I get no results.

Example:

Query: "Amazon James"
There is a doc with company = "Amazon", but nothing matches "James", and no results are returned.

Can someone explain this behavior and how I can overcome it?

cross_fields doesn't work all that well when the fields have different analysers.

"If you include fields with a different analysis chain, they will be added to the query in the same way as for best_fields"
https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_cross_fields_queries.html

Given you have specified "not_analyzed" for "company", the query against that field uses the full, untokenized query string and only matches companies actually named "Amazon James".

Hi,

Thanks for the reply. Is there a nice way to achieve what I'm aiming for in one query?

I've thought about splitting the 'multi_match' query into separate queries, one per group of fields that share the same analysis, and combining them under 'should'.

Yes, in a similar situation, I've seen a bool/should query work OK. But a problem you need to address first is that you will have trouble getting a string like "Amazon James" to match "Amazon" on a "not_analyzed" field. You might get away with specifying a query-time analyser consisting of a standard tokenizer and no token filters, but then no queries would match multi-word company names like "Mercedes Benz". There might be some clever things you could do using shingle token filters in the query analyser to work around this, but it depends on what your requirements really are.
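
One way to sketch the bool/should idea is to build the query body on the client. Note that this swaps the query-time shingle analyser mentioned above for a client-side approximation: the helper below (hypothetical names `shingles` and `build_query`) splits the input on whitespace, generates word n-grams itself, and sends them to the exact-match fields with `terms` queries. Treat it as a starting point under those assumptions, not a tested solution.

```python
import json

def shingles(words, max_size=3):
    """All contiguous word n-grams up to max_size, joined by single spaces."""
    out = []
    for size in range(1, min(max_size, len(words)) + 1):
        for start in range(len(words) - size + 1):
            out.append(" ".join(words[start:start + size]))
    return out

def build_query(text):
    words = text.split()
    return {
        "query": {
            "bool": {
                "should": [
                    # Fully analyzed field: let Elasticsearch analyze the text.
                    {"match": {"name": text}},
                    # not_analyzed field: exact match on any word or word group.
                    {"terms": {"company": shingles(words)}},
                    # keyword + lowercase field: lowercase each word ourselves.
                    {"terms": {"email": [w.lower() for w in words]}},
                ],
                "minimum_should_match": 1,
            }
        }
    }

print(json.dumps(build_query("Amazon James"), indent=2))
```

With this shape, "Amazon James" sends "Amazon", "James" and "Amazon James" to the company field, so a doc with company = "Amazon" can match even though nothing matches "James", and "Mercedes Benz James" still includes the candidate "Mercedes Benz".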

One last thing: I did some testing, and it seems that using query_string instead of multi_match works for the search "Amazon James" on a not_analyzed field where only "Amazon" matches.

Why is there a difference in behavior between query_string and multi_match?

I haven't used query_string much, but I'd be interested in seeing some examples that you have managed to get working.

From what I understood, query_string by default splits the entire query into multiple terms on spaces and applies the OR operator between them. So "Amazon James" is analyzed separately as "Amazon" and "James". Companies such as Mercedes Benz should be passed to the query in quotes, as "Mercedes Benz", and then this is treated as one whole term.
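
A small sketch of what such a request body might look like, built on the client. The helper name `build_query_string` is hypothetical, and the splitting behaviour it relies on is the one described above for the Elasticsearch version in use at the time; later versions may treat whitespace differently. Multi-word values are wrapped in double quotes so the parser hands them to each field as one unit.

```python
import json

def build_query_string(parts, fields=("name", "company", "email")):
    """Build a query_string body; parts containing spaces are quoted."""
    tokens = []
    for part in parts:
        if " " in part:
            # Escape backslashes and quotes, then quote the whole phrase.
            escaped = part.replace("\\", "\\\\").replace('"', '\\"')
            tokens.append('"%s"' % escaped)
        else:
            # Note: reserved characters in single words are not escaped here.
            tokens.append(part)
    return {
        "query": {
            "query_string": {
                "query": " ".join(tokens),
                "fields": list(fields),
                "default_operator": "OR",
            }
        }
    }

print(json.dumps(build_query_string(["Amazon", "James"]), indent=2))
print(json.dumps(build_query_string(["Mercedes Benz", "James"]), indent=2))
```

The caller has to know which words belong together ("Mercedes Benz"), which is the same limitation raised earlier in the thread for multi-word company names.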

Are you specifying specific fields in the query, or are you leaving it as the default "_all" field? I ask this because I suspect that everything in the "_all" field is analysed using the "standard" analyser, and so "Benz Mercedes" would also match "Mercedes Benz".

I'm specifying a closed list of fields. In this example it would be ["name", "company", "email"].