Edge n-gram analyzer does not behave as expected

narinder_izap · March 25, 2015, 8:49am

Hi All,

I have a custom analyzer based on the edge n-gram tokenizer. I expected it
to produce edge n-grams for each word, so, as I understand it, analyzing a
multi-word term such as "Narinder Kaur" would give:
N
Na
Nar
Nari
Narin
Narind
Narinde
Narinder
K
Ka
Kau
Kaur

So if I search for "narinder" or "kaur" using the following queries:

{
  "query": {
    "constant_score": {
      "query": {
        "match_phrase_prefix": {
          "primary_search_new": {
            "query": "narinder",
            "analyzer": "ys_search_analyzer_long"
          }
        }
      }
    }
  }
}

OR

{
  "query": {
    "constant_score": {
      "query": {
        "match_phrase_prefix": {
          "primary_search_new": {
            "query": "kaur",
            "analyzer": "ys_search_analyzer_long"
          }
        }
      }
    }
  }
}

Both should find the documents containing "Narinder Kaur", but currently I
cannot search for "kaur": only the first term matches.
The analyzers used are as follows:

analysis: {
  analyzer: {
    ys_search_analyzer: {
      type: custom
      filter: [
        ys_word_delimiter
        trim
        lowercase
      ]
      tokenizer: ys_edge_ngram_tokenizer
    }
    ys_search_analyzer_long: {
      type: custom
      filter: [
        ys_word_delimiter
        trim
        lowercase
      ]
      tokenizer: ys_edge_ngram_tokenizer_long
    }
  }
  filter: {
    ys_word_delimiter: {
      type: word_delimiter
      stem_english_possessive: False
    }
  }
  tokenizer: {
    ys_edge_ngram_tokenizer_long: {
      type: edgeNGram
      min_gram: 1
      max_gram: 60
    }
    ys_edge_ngram_tokenizer: {
      min_gram: 1
      type: edgeNGram
      max_gram: 20
    }
  }
}

Please explain why it is not working as expected, and what I should do to
meet my requirement without re-indexing the data.

All help is appreciated.
thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/76753cd7-7a47-4ca3-ba7b-90be025386b4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

masaru · March 26, 2015, 11:36pm

Hi,

You need to specify token_chars when you configure the edge ngram tokenizer (http://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html). Otherwise, all characters are kept, which means the text is not split on whitespace.
You can see how the analyzer behaves by using the _analyze API (http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html).
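
To make the token_chars point concrete, here is a small Python sketch that imitates only the tokenizer step. It is not Elasticsearch code: edge_ngrams and CHAR_CLASSES are hypothetical helpers written for illustration, and the token filters from the thread (word_delimiter, trim, lowercase) are deliberately left out. It assumes that an empty token_chars keeps every character, as described above.

```python
from itertools import groupby

# Hypothetical mapping from token_chars class names to Python predicates.
CHAR_CLASSES = {
    "letter": str.isalpha,
    "digit": str.isdigit,
    "whitespace": str.isspace,
}


def edge_ngrams(text, min_gram=1, max_gram=60, token_chars=()):
    """Rough imitation of an edge n-gram tokenizer (tokenizer step only)."""
    if token_chars:
        # Split the text wherever a character belongs to none of the classes.
        def keep(ch):
            return any(CHAR_CLASSES[name](ch) for name in token_chars)

        words = ["".join(group) for kept, group in groupby(text, key=keep) if kept]
    else:
        # No token_chars: the whole input is treated as one token.
        words = [text]

    grams = []
    for word in words:
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n])
    return grams


without = edge_ngrams("Narinder Kaur")
with_classes = edge_ngrams("Narinder Kaur", token_chars=("letter", "digit"))

print(without[-1])              # Narinder Kaur
print("Kaur" in without)        # False
print("Kaur" in with_classes)   # True
```

In this simplified model, every gram produced without token_chars starts at the first character of the whole string, so nothing begins with "K". With letter and digit classes the text is split at the space and "Kaur" gets its own grams.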

You need to fix the analyzer and re-index all documents.
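
As a rough sketch of what the fixed analyzer settings might look like, the Python snippet below rebuilds the analysis block from the original post and adds token_chars to both tokenizers. The choice of ["letter", "digit"] is an assumption, not something stated in the thread, and the helper edge_ngram_tokenizer is hypothetical. The type name edgeNGram is kept as written in the thread; newer Elasticsearch versions spell it edge_ngram.

```python
import json


def edge_ngram_tokenizer(max_gram, token_chars=("letter", "digit")):
    """Build a tokenizer definition; token_chars values are an assumption."""
    return {
        "type": "edgeNGram",  # spelling used in the thread
        "min_gram": 1,
        "max_gram": max_gram,
        "token_chars": list(token_chars),
    }


def custom_analyzer(tokenizer_name):
    """Same filter chain as in the original post."""
    return {
        "type": "custom",
        "tokenizer": tokenizer_name,
        "filter": ["ys_word_delimiter", "trim", "lowercase"],
    }


settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ys_search_analyzer": custom_analyzer("ys_edge_ngram_tokenizer"),
                "ys_search_analyzer_long": custom_analyzer("ys_edge_ngram_tokenizer_long"),
            },
            "filter": {
                "ys_word_delimiter": {
                    "type": "word_delimiter",
                    "stem_english_possessive": False,
                }
            },
            "tokenizer": {
                "ys_edge_ngram_tokenizer": edge_ngram_tokenizer(20),
                "ys_edge_ngram_tokenizer_long": edge_ngram_tokenizer(60),
            },
        }
    }
}

# Print the JSON body that could be used when creating the new index.
print(json.dumps(settings, indent=2))
```

The printed JSON would be used when creating a new index, after which the documents are indexed again so that the stored terms reflect the new tokenizer.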

Masaru

narinder_izap · March 27, 2015, 4:38am

Thanks for your reply. It is much clearer now how to

narinder_izap · March 27, 2015, 4:39am

Thanks for the reply. I will try it.
