What is the best way to query first and last name?


(undefinedman) #1

I am trying to set up search analysis in Elasticsearch to get as close as possible to the following scenario:

I have 4 users:

  1. Karolina Abc (karolinaabc@example.com)
  2. Karolina Def (karolinadef@example.com)
  3. Karol Abc (karolabc@example.com)
  4. Karol Adf (karoladf@example.com)

When I type:
"Kar"

I should get all 4 users.


When I type:
"Karol"

I should still get all 4 users.


When I type:
"Karol A"

I should get users no. 3 and 4.


When I type:
"Karol Ad"

I should get only user no. 4.

So as I start typing, the query should match on, say, the first 3 characters of the firstName, lastName and emailAddress fields; for that I use MultiMatch. The results should narrow as I type more characters, and in the end only the users that match my whole query string should be returned.

The problem is that when I now type "Karol Ad" it returns all 4 users, because "Karol" matches "Karolina".

I have the following setup:

        settings:
            index:
                analysis:
                    analyzer:
                        search_analyzer:
                            type: custom
                            tokenizer: standard
                            filter: [lowercase, edge_ngram]
                    filter:
                        edge_ngram:
                            type: edge_ngram
                            min_gram: 1
                            max_gram: 10

and

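    // Assumes the Elastica library, whose query classes live under Elastica\Query.
    use Elastica\Query;

    $boolQuery = new Query\BoolQuery();

    // With the 'and' operator, every term of the query string has to match.
    $boolQuery->addMust((new Query\MultiMatch())
        ->setFields(['firstName', 'lastName', 'emailAddress'])
        ->setType(Query\MultiMatch::TYPE_CROSS_FIELDS)
        ->setQuery('Karol Ad')
        ->setOperator('and')
    );

    $query = new Query();
    $query->setQuery($boolQuery);

    // $this->finder is the finder service available in this class.
    $result = $this->finder->find($query);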

Does anyone know a solution to the problem I am facing?


(klof) #2

Can we see the mapping of the fields as well, please?
Your approach looks fine.
Can you confirm that you have "search_analyzer": "standard" in your field mappings?

https://www.elastic.co/guide/en/elasticsearch/reference/6.2/search-analyzer.html#search-analyzer
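
To see what this suggestion would do for the four example users, here is a minimal Python sketch that mimics the analysis outside Elasticsearch. It assumes the edge_ngram filter (min_gram 1, max_gram 10) runs only at index time while the query is merely split into lowercase words, and it approximates the standard tokenizer with a simple regex. The names `USERS`, `words`, `edge_ngrams`, `index_tokens` and `search` are made up for the illustration, and the matching rule (every query word must equal an indexed token in some field) is a simplified reading of cross_fields with the `and` operator, without any scoring.

```python
import re

# The four example users from the question: id -> (firstName, lastName, emailAddress).
USERS = {
    1: ("Karolina", "Abc", "karolinaabc@example.com"),
    2: ("Karolina", "Def", "karolinadef@example.com"),
    3: ("Karol", "Abc", "karolabc@example.com"),
    4: ("Karol", "Adf", "karoladf@example.com"),
}


def words(text):
    # Rough stand-in for the standard tokenizer plus the lowercase filter.
    return re.findall(r"[a-z0-9]+", text.lower())


def edge_ngrams(word, min_gram=1, max_gram=10):
    # Prefixes of the word, e.g. "karol" -> k, ka, kar, karo, karol.
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


def index_tokens(user):
    # Index-time analysis: every word of every field, expanded to its prefixes.
    return {gram for field in user for word in words(field) for gram in edge_ngrams(word)}


def search(query):
    # Search-time analysis: plain lowercase words, no n-grams.
    terms = words(query)
    return [uid for uid, user in USERS.items()
            if all(term in index_tokens(user) for term in terms)]


for q in ["Kar", "Karol", "Karol A", "Karol Ad"]:
    print(q, "->", search(q))
```

Under these assumptions "Karol Ad" narrows to user 4, but "Karol A" still returns users 1, 3 and 4, because the indexed prefix "karol" of "Karolina" is identical to the complete query word "karol".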


(undefinedman) #3

Hey klof, thank you for your answer.

I think I found the solution. In my example above I use edge_ngram, which causes "Karol" to be found in the word "Karolina". I have changed my approach a bit.

Let's take a look at this:

        settings:
            index:
                analysis:
                    analyzer:
                        search_analyzer:
                            type: custom
                            tokenizer: keyword
                            filter: [lowercase]

As you can see, I got rid of the edge_ngram filter and replaced the standard tokenizer with the keyword tokenizer.

And of course I have mapped search_analyzer to my fields.

Now, PHP part:

    $boolQuery->addMust((new Query\MultiMatch())
        ->setFields(['fullName', 'lastName', 'emailAddress'])
        ->setType(Query\MultiMatch::TYPE_PHRASE_PREFIX)
        ->setQuery($this->query)
    );

I got rid of TYPE_CROSS_FIELDS and added TYPE_PHRASE_PREFIX instead.

And finally it works the way I want. So when I type "Karol A" I don't get other results anymore.

There is only one minor issue. When a subject has two words, e.g. "Hello world", it will not match when I type "world".
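
Both the result for "Karol A" and the "Hello world" limitation fit how a keyword-tokenized field behaves: the whole value is a single token, so a prefix can only match from its first character. Below is a minimal Python sketch of that behaviour, outside Elasticsearch. It assumes the same keyword analyzer is used at index and search time and that fullName holds "firstName lastName" (an assumption, since the thread does not show how that field is built). `keyword_analyze`, `prefix_search` and the `subject` field name are hypothetical names for the illustration.

```python
def keyword_analyze(text):
    # keyword tokenizer + lowercase filter: the whole value stays one token.
    return text.lower()


def prefix_search(query, docs, fields):
    # A document matches if any listed field, taken as a single token,
    # starts with the single token produced from the query.
    needle = keyword_analyze(query)
    return [doc_id for doc_id, doc in docs.items()
            if any(keyword_analyze(doc[f]).startswith(needle) for f in fields)]


USERS = {
    1: {"fullName": "Karolina Abc", "lastName": "Abc", "emailAddress": "karolinaabc@example.com"},
    2: {"fullName": "Karolina Def", "lastName": "Def", "emailAddress": "karolinadef@example.com"},
    3: {"fullName": "Karol Abc", "lastName": "Abc", "emailAddress": "karolabc@example.com"},
    4: {"fullName": "Karol Adf", "lastName": "Adf", "emailAddress": "karoladf@example.com"},
}
FIELDS = ["fullName", "lastName", "emailAddress"]

for q in ["Kar", "Karol", "Karol A", "Karol Ad"]:
    print(q, "->", prefix_search(q, USERS, FIELDS))

# The two-word case: only a prefix of the whole value matches.
SUBJECTS = {1: {"subject": "Hello world"}}
print(prefix_search("Hello w", SUBJECTS, ["subject"]))
print(prefix_search("world", SUBJECTS, ["subject"]))
```

In this sketch "Karol A" returns only users 3 and 4, while "world" finds nothing in "Hello world". Matching a later word would need the value to be split into words as well, for example through an additional tokenized copy of the field, which is not covered here.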

