Highlighter problem


(paul) #1

I am trying out the highlighter feature of Elasticsearch. The text marked in
yellow is expected, but why did it match the text marked in green?

Elasticsearch = 0.90.0
Java = 1.7

The analyzer on that field is "autocomplete"; below is its configuration:

"autocomplete": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": [
    "standard",
    "lowercase",
    "syns_filter",
    "my_edgeNgram"
  ]
},

My query:
{
  "fields": [
    "name"
  ],
  "query": {
    "match": {
      "name": "univ"
    }
  },
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "fields": {
      "name": {}
    }
  }
}

Results:

{
  fields: {
    name: SUNY Binghamton University
  },
  highlight: {
    name: [
      SUNY Binghamton University
    ]
  }
}

{
  fields: {
    name: Arizona State University
  },
  highlight: {
    name: [
      Arizona State University
    ]
  }
}

{
  fields: {
    name: Ohio State University
  },
  highlight: {
    name: [
      Ohio State University
    ]
  }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5d4a683f-a208-4a17-bce1-0248f43dbcd6%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi Paul,

Can you also paste your definitions of "syns_filter" and "my_edgeNgram"?

On Mon, Dec 16, 2013 at 7:28 AM, paul avinashpaul85@gmail.com wrote:

--
Adrien Grand



(paul) #3

Sure Adrien, below are my definitions:

"filter": {
  "syns_filter": {
    "synonyms_path": "synonyms/synonym_collegename.txt",
    "type": "synonym",
    "ignore_case": true
  },
  "my_edgeNgram": {
    "type": "edgeNGram",
    "min_gram": 3,
    "max_gram": 10
  }
}

Regards
Paul

On Monday, 16 December 2013 15:30:47 UTC+5:30, Adrien Grand wrote:



(Adrien Grand) #4

I think the answer lies in the content of the synonyms file. For example, if
there is an entry in this file that looks like "Binghamton, Binghamton
University", the analyzer will end up producing something like "bin",
"bing", ..., "uni", "univ", ... (your my_edgeNgram filter starts at
min_gram 3) for the token whose term is "Binghamton". So if you search for
"univ", it is actually going to highlight the "Bing" of "Binghamton", since
the grams of the synonym "university" are mapped back onto that token's
position in the text.
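The mechanism described above can be reproduced outside Elasticsearch with a small Python simulation. This is a sketch under stated assumptions, not Lucene's implementation: the standard tokenizer is approximated by a whitespace split, the synonyms entry is a hypothetical stand-in for the real contents of synonym_collegename.txt, injected synonym words are assumed to keep the offsets of the token they expand, and each edge n-gram is assumed to map back onto the first n characters of that token's span. The helper names (tokenize_lowercase, expand_synonyms, edge_ngrams, highlighted_fragments) are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Token:
    term: str   # the indexed term
    start: int  # start offset into the original text
    end: int    # end offset into the original text


def tokenize_lowercase(text):
    """Rough stand-in for the standard tokenizer + lowercase filter: split on whitespace."""
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        tokens.append(Token(word.lower(), start, start + len(word)))
        pos = start + len(word)
    return tokens


def expand_synonyms(tokens, synonyms):
    """Keep each token and inject the words of its synonyms with the SAME offsets (assumption)."""
    out = []
    for tok in tokens:
        out.append(tok)
        for phrase in synonyms.get(tok.term, []):
            for word in phrase.split():
                out.append(Token(word, tok.start, tok.end))
    return out


def edge_ngrams(tokens, min_gram=3, max_gram=10):
    """Emit prefixes of each term; assumed: a gram's offsets cover the first n chars of the token's span."""
    out = []
    for tok in tokens:
        for n in range(min_gram, min(max_gram, len(tok.term)) + 1):
            out.append(Token(tok.term[:n], tok.start, min(tok.start + n, tok.end)))
    return out


def highlighted_fragments(text, query_term, synonyms):
    """Return the slices of `text` that a highlighter would mark for one query term."""
    tokens = edge_ngrams(expand_synonyms(tokenize_lowercase(text), synonyms))
    return sorted({text[t.start:t.end] for t in tokens if t.term == query_term})


# Hypothetical synonyms entry, in the spirit of "Binghamton, Binghamton University".
SYNONYMS = {"binghamton": ["binghamton university"]}

print(highlighted_fragments("SUNY Binghamton University", "univ", SYNONYMS))  # ['Bing', 'Univ']
print(highlighted_fragments("SUNY Binghamton University", "univ", {}))        # ['Univ']
```

Without the synonyms entry only "Univ" is marked; with it, the extra "Bing" fragment appears, because the "univ" gram of the injected word "university" sits on the span of "Binghamton".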

I don't think there is a simple solution to your problem. Since you seem to
be using this index for auto-completion purposes, maybe a better option
would be to not use synonyms in the analyzer but to add a separate document
for every synonym.
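The "separate document for every synonym" alternative could be prepared roughly as follows. This is a minimal sketch under hypothetical assumptions: the synonyms file uses only plain comma-separated equivalence lines (no "=>" mappings), "#" starts a comment, the first name on a line is the one to display, and the "group" and "display_name" fields are invented here for de-duplicating suggestions. The "name" field would then be analyzed with the edge n-gram chain but without syns_filter.

```python
def synonym_documents(synonym_lines):
    """Turn each comma-separated synonyms line into one document per name (hypothetical schema)."""
    docs = []
    for group_id, line in enumerate(synonym_lines):
        # Drop anything after '#' and split the equivalence list on commas.
        names = [n.strip() for n in line.split("#", 1)[0].split(",") if n.strip()]
        for name in names:
            # "name" is what gets indexed; "group" and "display_name" are hypothetical fields.
            docs.append({"name": name, "group": group_id, "display_name": names[0]})
    return docs


lines = [
    "# hypothetical synonym_collegename.txt content",
    "Binghamton University, SUNY Binghamton",
]
for doc in synonym_documents(lines):
    print(doc)
```

Since each variant is indexed as its own text, a highlight on any of these documents can only point at characters that actually occur in that document's "name".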

On a side note, since you are doing auto-completion, you may want to have a
look at the completion suggester [1]. Although it doesn't support
highlighting, I would expect it to be an order of magnitude faster than
index-based auto-completion, so it might be worth checking out.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

On Tue, Dec 17, 2013 at 6:03 AM, paul avinashpaul85@gmail.com wrote:

--
Adrien Grand



(paul) #5

Thank you, Adrien. I will read up on the completion suggester and see whether
it suits my requirements.

Regards
Paul

On Tue, Dec 17, 2013 at 12:54 PM, Adrien Grand <adrien.grand@elasticsearch.com> wrote:


