Elasticsearch phrase suggester


(mstrasser) #1
I would like to use the "Phrase Suggester" (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html), but I've got a problem. When typing "johni depp", it returns several results in this order:

  1. john depp
  2. johnny depp
  3. joann depp
  4. johnn depp

How can I order the suggestions through the JSON request so that the first result is "johnny depp"? I've tried doing this with a phonetic indexer, but without success.

This is my configuration:

Query:

{
  "query": {
    "multi_match": {
      "query": "johni depp",
      "fields": [
        "fullName.word"
      ],
      "operator": "and"
    }
  },
  "suggest": {
    "text": "johni depp",
    "film": {
      "phrase": {
        "analyzer": "whitespace-fullName",
        "field": "fullName.whitespace",
        "size": 5,
        "real_word_error_likelihood": 0.95,
        "max_errors": 0.5,
        "gram_size": 2
      }
    }
  },
  "from": 0,
  "size": 1,
  "sort": [],
  "facets": []
}

Index settings (I use Elastica, but it's the same thing):

$elasticaIndex->create(
              array(
                  'number_of_shards'   => 4,
                  'number_of_replicas' => 1,
                  'analysis'           => array(
                      'analyzer' => array(
                          'autocomplete-index-fullName'  => array(
                              'tokenizer' => 'standard',
                              'filter'    => 'asciifolding, lowercase, edgeNGram'
                          ),
                          'autocomplete-search-fullName' => array(
                              'tokenizer' => 'standard',
                              'filter'    => 'asciifolding, lowercase'
                          ),
                          'word-fullName'                => array(
                              'tokenizer' => 'keyword',
                              'filter'    => 'lowercase'
                          ),
                          'whitespace-fullName'          => array(
                              'tokenizer' => 'whitespace',
                              'filter'    => 'lowercase'
                          ),
                      ),
                      'filter'   => array(
                          'edgeNGram' => array(
                              'type'     => 'edgeNGram',
                              'min_gram' => 1,
                              'max_gram' => 15
                          )
                      )
                  )
              ),
              false
);

Mapping:

$mapping->setProperties(
        array(
            'fullName' => array('type'   => 'string',
                                'fields' => array(
                                    'autocomplete' => array(
                                        'type'            => 'string',
                                        'index_analyzer'  => 'autocomplete-index-fullName',
                                        'search_analyzer' => 'autocomplete-search-fullName'
                                    ),
                                    'word'         => array(
                                        'type'            => 'string',
                                        'analyzer'  => 'word-fullName'
                                    ),
                                    'whitespace'   => array(
                                        'type'            => 'string',
                                        'analyzer'  => 'whitespace-fullName'
                                    ),
                                )),
        )
);

Examples of referenced values:

  • John Cleese
  • John Gemberling
  • Johnny Hallyday
  • Johnny Depp
  • Joann Sfar
  • Joanna Rytel
  • Samuel Johnson
  • Johnson Traoré

Thanks in advance.


(Adrien Grand) #2

Hi,

You would get better results with the phrase suggester by using an analyzer
that has a shingle filter[1]. This filter will help Elasticsearch compute
frequencies of pairs of terms. Without it, it falls back to single-term
frequencies, so in your case it would only consider the frequencies of
'Johnny' and 'Depp' instead of the frequency of 'Johnny Depp' as a whole.

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html
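As a rough sketch of this suggestion (not taken from the thread itself), the index settings and mapping below add a shingled subfield next to the existing ones. It assumes the Elasticsearch 1.x syntax used elsewhere in this thread (`string` fields, mappings keyed by type name); the type name `person`, the filter name `fullName-shingle-filter`, the analyzer name `shingle-fullName` and the subfield `fullName.shingle` are all hypothetical. The shingle sizes shown are the filter's defaults, written out explicitly. Only the new subfield is shown; the existing `autocomplete`, `word` and `whitespace` subfields would stay alongside it, and existing documents would need to be reindexed to populate it.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "fullName-shingle-filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": true
        }
      },
      "analyzer": {
        "shingle-fullName": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "fullName-shingle-filter"]
        }
      }
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "fullName": {
          "type": "string",
          "fields": {
            "shingle": {
              "type": "string",
              "analyzer": "shingle-fullName"
            }
          }
        }
      }
    }
  }
}
```

With this analyzer, "Johnny Depp" is indexed as the tokens `johnny`, `johnny depp` and `depp`, so the suggester can count how often the pair occurs together instead of relying only on the frequencies of the individual words.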



(Nik Everett) #3

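Set the shingle filter's min_shingle_size to 2 and max_shingle_size to either 2 or 3, and emit unigrams (output_unigrams). That'll get it working. Also, you probably want to set max_errors to a number > 1 (values of 1 or more are read as an absolute number of terms), because 0.5 means 50% of the terms, which can get quite large if someone types 20 terms.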

That's been my experience with it.
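Applied to the suggest part of the original query, this advice might look like the sketch below. It assumes a hypothetical `fullName.shingle` subfield analyzed with a shingle filter (min_shingle_size 2, max_shingle_size 2, output_unigrams enabled) alongside the existing `fullName.whitespace` subfield. `gram_size` is set to match max_shingle_size, and `max_errors` is a whole number, so it counts terms rather than a fraction of them. The `direct_generator` block, which scores candidates against the shingled field but generates them from the plain whitespace field, is an assumption modeled on the layout of the reference documentation's example; treat it as a starting point to tune, not a verified fix.

```json
{
  "suggest": {
    "text": "johni depp",
    "film": {
      "phrase": {
        "analyzer": "whitespace-fullName",
        "field": "fullName.shingle",
        "size": 5,
        "real_word_error_likelihood": 0.95,
        "max_errors": 2,
        "gram_size": 2,
        "direct_generator": [
          {
            "field": "fullName.whitespace",
            "suggest_mode": "always"
          }
        ]
      }
    }
  }
}
```

If max_shingle_size is raised to 3, `gram_size` would be raised to 3 as well so the two stay in step.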

Nik


