Edge_ngram results

Lee_Gee · October 1, 2014, 10:24am

I have an Elasticsearch string field configured for autocomplete like this:

    autocomplete_analyzer:
      type: custom
      tokenizer: whitespace
      filter: [ lowercase, asciifolding, ending_synonym, name_synonyms, autocomplete_filter ]

    autocomplete_filter:
      type: edge_ngram
      min_gram: 1
      max_gram: 20
      token_chars: [ letter, digit, whitespace, punctuation, symbol ]

    search_analyzer:
      type: custom
      tokenizer: whitespace
      filter: [ lowercase, asciifolding, standard, name_synonyms, ending_synonym ]

I have a record where the field contains 'S XYZ', and lots of other records
where the field contains other words beginning S.
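
To make the setup concrete, here is a rough Python imitation of what the index-time analyzer probably emits. It is only a sketch: it covers the whitespace tokenizer, lowercase and edge_ngram steps, and leaves out asciifolding and the two synonym filters because their rules are not shown. The function names and the sample value 'Smith' are made up for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    # Prefixes of the token, from min_gram up to max_gram characters long.
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text):
    # whitespace tokenizer -> lowercase -> edge_ngram (other filters omitted).
    terms = []
    for token in text.split():
        terms.extend(edge_ngrams(token.lower()))
    return terms


print(index_terms("S XYZ"))  # ['s', 'x', 'xy', 'xyz']
print(index_terms("Smith"))  # ['s', 'sm', 'smi', 'smit', 'smith']
```

Under these assumptions every record containing a word that begins with S is indexed with the term 's', so the 's' part of the query 'S XYZ' matches all of them and only the 'xyz' part sets the intended record apart.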

I do not understand why, when I search for 'S XYZ', it is not the first
result.

Could someone please explain?

Many thanks in anticipation
lee

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/218280b1-2c9c-42db-854d-62d1c8de8862%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · October 1, 2014, 10:55pm

Maybe you can enable explanations to see how scores are computed and what
the difference is between these records?
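
For anyone following along, explanations can be requested by setting "explain" to true in the search request body; each hit then carries an _explanation section. A minimal Python sketch of building such a body follows. The helper name and the field name "name" are assumptions, since the thread does not show the actual query.

```python
import json


def with_explain(search_body):
    # Copy the search body and ask for a score explanation on every hit.
    body = dict(search_body)
    body["explain"] = True
    return body


# Hypothetical match query against a field called "name".
body = with_explain({"query": {"match": {"name": "S XYZ"}}})
print(json.dumps(body, indent=2))
```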

--
Adrien Grand

Lee_Gee · October 2, 2014, 8:41am

'explain' shows only two differences between the two results:

Hit on 'S' vs. hit on 'DqWjDCcsh S'

idf(docFreq=1, maxDocs=1) vs. idf(docFreq=10, maxDocs=10)
fieldNorm(doc=0) vs. fieldNorm(doc=9)

My possibly flawed understanding is that IDF is the inverse document
frequency of the search term across the whole index. What confuses me is
that these are results for the same term in the same index, so shouldn't
the IDF be the same?
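
As a worked check on the two idf lines above: Lucene's classic TF/IDF similarity, the default scoring model in Elasticsearch at the time, computes idf as 1 + ln(maxDocs / (docFreq + 1)). By default docFreq and maxDocs are taken from whatever the shard that scores the hit can currently see, not from the index as a whole, so two hits for the same term can carry different values. The sketch below simply plugs in the numbers from the explain output; the function name is made up.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Lucene classic similarity: idf = 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


print(round(classic_idf(1, 1), 4))    # 0.3069
print(round(classic_idf(10, 10), 4))  # 0.9047
```

So the hit scored with docFreq=10, maxDocs=10 gets an idf roughly three times that of the hit scored with docFreq=1, maxDocs=1, which on its own may be enough to reorder the results.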

tia
lee

Lee_Gee · October 2, 2014, 9:13am

The problem was that my test script did not pause between
creating/populating the index and searching it. Even though there are
very few documents (10), Elasticsearch still needs a second or two to catch
its breath and mop its brow before it is ready to search.
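
Rather than sleeping in a test script, one option is to call the index's _refresh endpoint before searching, which makes recently indexed documents visible to search. The sketch below is a minimal version using only the Python standard library. The base URL, the index name and the injectable send parameter are assumptions for illustration, not part of the original script.

```python
import json
import urllib.request


def build_refresh_request(base_url, index):
    # POST /<index>/_refresh makes recently indexed documents searchable.
    return urllib.request.Request(
        f"{base_url}/{index}/_refresh", data=b"", method="POST"
    )


def build_search_request(base_url, index, query):
    # The query body is sent as JSON to /<index>/_search.
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def refresh_then_search(base_url, index, query, send=urllib.request.urlopen):
    # Refresh first so the search sees every document indexed so far.
    send(build_refresh_request(base_url, index)).close()
    with send(build_search_request(base_url, index, query)) as response:
        return json.load(response)
```

Against a local node this might be called as refresh_then_search("http://localhost:9200", "names", {"query": {"match_all": {}}}), where "names" is a placeholder index name. An explicit refresh suits a test script; doing it after every write in production would be expensive.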

Now to find a way to rank shorter strings higher than longer ones... but
that's another question.

thanks
Lee
