Edge_ngram results

I have an ElasticSearch string field configured for autocomplete like this:

    autocomplete_analyzer:
      type: custom
      tokenizer: whitespace
      filter: [ lowercase, asciifolding, ending_synonym, name_synonyms, autocomplete_filter ]

    autocomplete_filter:
      type: edge_ngram
      min_gram: 1
      max_gram: 20
      token_chars: [ letter, digit, whitespace, punctuation, symbol ]

    search_analyzer:
      type: custom
      tokenizer: whitespace
      filter: [ lowercase, asciifolding, standard, name_synonyms, ending_synonym ]

I have a record where the field contains 'S XYZ', and lots of other records
where the field contains other words beginning S.
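
To make the matching concrete, here is a rough Python sketch of what the index-time chain above would emit for this field. It is only an approximation: it models the whitespace tokenizer, `lowercase` and the edge n-gram step, and leaves out `asciifolding` and the two synonym filters because their rules are not shown. The function names are made up for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading substrings of token, from min_gram to max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_time_tokens(text):
    """Approximate the index-time chain: whitespace split, lowercase, edge n-grams.

    asciifolding and the synonym filters are omitted (their rules are unknown).
    """
    tokens = []
    for word in text.split():
        tokens.extend(edge_ngrams(word.lower()))
    return tokens


print(index_time_tokens("S XYZ"))   # ['s', 'x', 'xy', 'xyz']
print(index_time_tokens("Smith"))   # ['s', 'sm', 'smi', 'smit', 'smith']
```

Under these assumptions every record with a word beginning with S is indexed with the token `s`, so the `s` part of the query matches all of them, and only the `xyz` part singles out the 'S XYZ' record.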

I do not understand why, when I search for 'S XYZ', it is not the first
result.

Could someone please explain?

Many thanks in anticipation
lee

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/218280b1-2c9c-42db-854d-62d1c8de8862%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Maybe you can enable explanations to see how scores are computed and what
the difference is between these records?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-explain.html
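
As a minimal sketch of what that looks like, the snippet below builds a search body with `explain` switched on. The field name `name` and the helper `build_explain_search` are assumptions for illustration; substitute the real field from your mapping.

```python
import json


def build_explain_search(field, text):
    """Build a search body that asks Elasticsearch to explain each hit's score."""
    return {
        "explain": True,
        "query": {"match": {field: text}},
    }


# "name" is a hypothetical field name, not taken from the original post.
body = build_explain_search("name", "S XYZ")
print(json.dumps(body, indent=2))
```

Sent as the body of a `_search` request, this should return an `_explanation` section with each hit, which can then be compared between the two records.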

--
Adrien Grand

'explain' shows only two differences between the two results:

Hit on 'S' vs. hit on 'DqWjDCcsh S'

  • idf(docFreq=1, maxDocs=1) vs. idf(docFreq=10, maxDocs=10)

  • fieldNorm(doc=0) vs. fieldNorm(doc=9)

My possibly flawed understanding is that IDF is the inverse document
frequency of the search term across the whole index. What confuses me is
that these are results for the same term in the same index, so shouldn't
the IDF be the same?
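
For a sense of how much those two IDF lines differ, here is a small worked example using the formula from Lucene's classic TF-IDF similarity, `1 + ln(maxDocs / (docFreq + 1))`. It assumes that similarity is the one in use. The differing `maxDocs` values (1 vs. 10) suggest the two hits were scored against different sets of statistics, for example different shards (Elasticsearch computes IDF per shard by default) or an index that was still being populated; that reading is a guess, since the full explain output is not shown here.

```python
import math


def classic_idf(doc_freq, max_docs):
    """IDF per Lucene's classic TF-IDF similarity: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


idf_small = classic_idf(doc_freq=1, max_docs=1)    # about 0.307
idf_large = classic_idf(doc_freq=10, max_docs=10)  # about 0.905

print(round(idf_small, 3), round(idf_large, 3))
```

So the same term can get a noticeably different IDF depending on which statistics it is scored against, even though it is "the same term in the same index".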

tia
lee

The problem was that my test script did not pause between
creating/populating the index, and searching on it. Even though there are
very few documents (10), ElasticSearch still needs a second or two to catch
its breath and mop its brow before it is ready to search.
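
Rather than sleeping, a test script can ask for an explicit refresh after indexing, since newly indexed documents only become searchable after the next refresh. The sketch below only builds the request for the index's `_refresh` endpoint; the host `http://localhost:9200`, the index name `autocomplete_test` and the helper name are assumptions for illustration.

```python
import urllib.request


def build_refresh_request(base_url, index):
    """Build (but do not send) a POST request to the index's _refresh endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_refresh"
    return urllib.request.Request(url, data=b"", method="POST")


# Host and index name are hypothetical; adjust to the test setup.
request = build_refresh_request("http://localhost:9200", "autocomplete_test")
print(request.get_method(), request.full_url)

# In the test script, send it after indexing and before the first search:
# urllib.request.urlopen(request)
```

This keeps the test deterministic instead of depending on how long the pause happens to be.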

Now to find a way to rank shorter strings higher than longer ones... but
that's another question.

thanks
Lee
