Can't Get Highlighting Working


(timscott) #1

I am trying to get highlighting working without luck. I'm still pretty much an ES noob.

curl -XGET http://10.0.1.2:9200/tscott/_search -d'
{
  "query":{
    "field":{
      "Summary":{
        "query":"\"The quick brown fox\""
      }
    }
  },
  "highlight":{
    "fields":{
      "_all":{ }
    }
  }
}'

Which returns the one hit but with no highlighting:

"highlight" : { "_all" : null }

When I check the mapping for the type, I see that the Summary field is mapped thus:

"Summary" : {
"term_vector" : "with_positions_offsets",
"type" : "string"
}

I'm sure I'm missing something simple.


(Lukáš Vlček) #2

Hi,

You are highlighting on the _all field (not on the Summary field), so you need
to check the mapping of the _all field (is it stored?).
It would also help if you posted the version of ES you are using, because
some enhancements and new features for highlighting were implemented in the
recently released version 0.14 (and in master as well).

Regards,
Lukas


(timscott) #3

Thanks for the reply. I am using version 0.14.1 (the latest binary
download). I guess I don't understand what the "_all" field means. I
thought it was shorthand for listing every field individually.

In any case, when I put "Summary" instead of "_all" I get highlight
results as expected. That should do it for me.
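
For reference, the working request can be sketched as follows. This is only an illustration that builds the JSON body from the first post with the highlight field switched from "_all" to "Summary"; the helper name build_highlight_search is made up for this example, and the "field" query is simply the one used in this thread.

```python
import json


def build_highlight_search(field, phrase):
    """Build a search body that queries `field` for a phrase and highlights the same field.

    Hypothetical helper: it mirrors the request from the first post,
    with the highlight target changed from "_all" to the queried field.
    """
    return {
        "query": {"field": {field: {"query": '"%s"' % phrase}}},
        "highlight": {"fields": {field: {}}},
    }


body = build_highlight_search("Summary", "The quick brown fox")
# The printed JSON can be passed to curl with -d, as in the original request.
print(json.dumps(body, indent=2))
```

json.dumps takes care of escaping the inner quotes of the phrase, which is easy to get wrong when typing the body by hand.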

Thanks again!


(Lukáš Vlček) #4

Hi,

On Mon, Jan 3, 2011 at 10:57 PM, Tim Scott tscott@lunaversesoftware.com wrote:

Thanks for the reply. I am using version 0.14.1 (the latest binary
download). I guess I don't understand what "_all" fields means. I
thought it was shorthand for listing every field individually.

Most of the time it is helpful for search, but IMHO it is not a good
candidate for highlighting. When you search in Lucene (which ElasticSearch is
built on top of), every query has to name the document field(s) it runs
against. The _all field helps when the field name is not known in advance at
query time: it lets you search for documents that contain a given term in any
of their fields. You can read more about the _all field here:
http://www.elasticsearch.com/docs/elasticsearch/mapping/all_field/

In any case when I put "Summary" instead of "_all" I get highlight
results as expected. That should do it for me.

Just note that a field used to have to be stored to allow highlighting, but
as of 0.14, if the field is not stored, its content is extracted directly
from the _source field for highlighting. The _all field, however, is not part
of _source (and, if I am not mistaken, it is not stored by default either),
so you cannot use it for highlighting without explicitly setting it to be
stored.
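
If highlighting on _all is really wanted, the mapping would have to store it. The following is a rough sketch only. It assumes the _all mapping of this ES version accepts "store" and "term_vector" settings (see the page linked above to confirm), and the type name "doc" and the helper name are placeholders that do not come from this thread.

```python
import json


def mapping_with_stored_all(type_name):
    """Return a type mapping in which _all is stored so it could be highlighted.

    Assumption: the _all field accepts "store" and "term_vector" settings
    in this ES version. `type_name` is a placeholder for the real type.
    """
    return {
        type_name: {
            "_all": {
                "store": "yes",
                "term_vector": "with_positions_offsets",
            },
            "properties": {
                # Same Summary mapping as shown in the first post.
                "Summary": {
                    "type": "string",
                    "term_vector": "with_positions_offsets",
                },
            },
        }
    }


mapping = mapping_with_stored_all("doc")
print(json.dumps(mapping, indent=2))
```

Storing _all means keeping a second copy of every document's text in the index, so highlighting on the individual fields is usually the cheaper choice.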


(timscott) #5

I got highlighting to work...well sorta. The highlighting is frequently shifted. For example, when I search for "balance" many of the highlights are like this:

... Average ledger balance this period ...

As you can see the highlighting is shifted to the left by 4 characters. The shift seems to vary from 0 to the length of the term.

Did something go bad at indexing time? Ideas?
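
A quick way to measure how often the shift happens is to look at what actually sits between the highlight tags. This is a minimal sketch, assuming the default <em> and </em> tags. The helper name and the sample fragments are invented for illustration, since the markup of the real fragment above was lost.

```python
import re

# Assumes the default highlight tags; adjust if pre_tags/post_tags are customised.
EM = re.compile(r"<em>(.*?)</em>", re.DOTALL)


def misaligned_highlights(fragment, term):
    """Return the highlighted spans in `fragment` that do not equal `term`.

    The comparison is case-insensitive. With stemming analyzers a span can
    legitimately differ from the query term, so treat results as hints.
    """
    wanted = term.lower()
    return [span for span in EM.findall(fragment) if span.lower() != wanted]


# Made-up fragments: one aligned, one shifted left by 4 characters.
aligned = "... Average ledger <em>balance</em> this period ..."
shifted = "... Average led<em>ger bal</em>ance this period ..."
print(misaligned_highlights(aligned, "balance"))
print(misaligned_highlights(shifted, "balance"))
```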


(timscott) #6

I re-indexed everything, and the shifting problem went away. Now I
have a much worse problem. In my query:

"highlight":{"fields":{"Summary":{},"Content":{"order":"score"}}}

Because I am requesting highlighting for every field that is queried,
I expect some highlighting for every hit. However, only a minority of
hits have any. Most hits come back like this:

"highlight" : {"Content" : [ ], "Summary" : null}

But it gets much weirder. For those hits that do have highlighting,
some are okay, but others contain fragments from the wrong document.
That is, the highlighting for the hit on document A contains fragments
found nowhere in document A but instead in document B. Document B is
also a hit, which may or may not have any highlighting, and if it does
it may or may not be the correct highlighting.

I re-mapped and re-indexed everything several times with the same
results each time. The relevant part of the mapping is:

"Content":{"term_vector":"with_positions_offsets", "type":"string"},
"Summary":{"term_vector":"with_positions_offsets","type":"string"}

I can't see what could be wrong, but obviously something is bad
wrong. Ideas?
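
The cross-document symptom can be checked mechanically by comparing each fragment with the hit's own _source. This is a sketch under assumptions: the default <em> tags are in use, fragments are verbatim substrings of the field text once the tags are removed, and _source is returned with each hit. foreign_fragments is a hypothetical helper, not part of any ES client.

```python
import re

# Assumes the default highlight tags.
TAG = re.compile(r"</?em>")


def foreign_fragments(hit, fields):
    """Return (field, fragment) pairs whose text is not found in the hit's own _source."""
    problems = []
    source = hit.get("_source") or {}
    highlight = hit.get("highlight") or {}
    for field in fields:
        text = source.get(field) or ""
        # A field's highlight may be null or an empty list, as in the output above.
        for fragment in highlight.get(field) or []:
            plain = TAG.sub("", fragment).strip()
            if plain not in text:
                problems.append((field, fragment))
    return problems


# Made-up hits: the second one carries a fragment that belongs to another document.
hit_ok = {
    "_source": {"Summary": "Average ledger balance this period"},
    "highlight": {"Summary": ["ledger <em>balance</em> this"], "Content": []},
}
hit_bad = {
    "_source": {"Summary": "Quarterly report"},
    "highlight": {"Summary": ["ledger <em>balance</em> this"], "Content": None},
}
print(foreign_fragments(hit_ok, ["Summary", "Content"]))
print(foreign_fragments(hit_bad, ["Summary", "Content"]))
```

Running this over every hit in a response gives a count of affected documents, which makes it easier to compare behaviour before and after re-indexing.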


(Lukáš Vlček) #7

Hi,

Do you think you could recreate this with curl examples?
Also, can you try setting store to yes in the mappings for the highlighted
fields, so that the content is not read from _source, and see if that makes
any difference?

Regards,
Lukas


(timscott) #8

I'm just ahead of you. I tried setting Content and Summary fields to store="yes". It made no difference.

It will take a good bit of effort to replicate with a series of curl statements, but I guess that's the only next step. I'll try to do that in the next day or two. Thanks for your help Lukáš.


(Lukáš Vlček) #9

If replicating it with curl sounds like a big undertaking, then you can try
some earlier ES releases (http://www.elasticsearch.com/download/) and see if
the issue starts with a particular version.

Regards,
Lukas


(mbro) #10

Hi,

I recently discovered the same behavior that Tim is reporting, but only when
I was using a query_string search that contains a wildcard (e.g.,
"*someword"). In those cases, the same query issued multiple times would
always return the same list of hits, but the highlighting results wouldn't
be consistent. Some hits wouldn't have any highlights returned. And it was
always a different set of hits missing the highlighting.

I'm a new ES user just getting started with 0.14 so I can't say when this
started happening.

I'll see if I can recreate it with some curls.
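
The run-to-run inconsistency can be detected by issuing the same search several times and comparing the highlights per hit. This is a sketch only: run_search stands for any function that executes the query and returns the parsed JSON response (through curl, the Java client, or anything else), and both helper names are invented for this example.

```python
import json


def highlight_signature(response):
    """Map each hit id to its highlight section, serialized deterministically."""
    return {
        hit["_id"]: json.dumps(hit.get("highlight"), sort_keys=True)
        for hit in response["hits"]["hits"]
    }


def unstable_hits(run_search, runs=5):
    """Run the same search `runs` times; return ids whose highlights ever differ."""
    baseline = highlight_signature(run_search())
    unstable = set()
    for _ in range(runs - 1):
        current = highlight_signature(run_search())
        for doc_id in set(baseline) | set(current):
            if baseline.get(doc_id) != current.get(doc_id):
                unstable.add(doc_id)
    return sorted(unstable)


# Stubbed responses standing in for two real runs of the same query.
fake_runs = iter([
    {"hits": {"hits": [
        {"_id": "1", "highlight": {"Content": ["<em>a</em>"]}},
        {"_id": "2", "highlight": {"Content": ["<em>b</em>"]}},
    ]}},
    {"hits": {"hits": [
        {"_id": "1", "highlight": {"Content": ["<em>a</em>"]}},
        {"_id": "2", "highlight": {"Content": []}},
    ]}},
])
demo_unstable = unstable_hits(lambda: next(fake_runs), runs=2)
print(demo_unstable)
```

If the problem only shows up rarely, the number of runs may need to be much higher than the default used here.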

Mike


(timscott) #11

I tried to replicate via curl with naive data. No luck. My real data is sensitive so I spent some time to redact a few samples. However, the contents are quite large and have many different non-alphanumeric characters throughout. These characters give bash fits. I spent an hour or so trying to clean it (adding escapes), but I felt like I was in quicksand. I'm giving up for now. Highlighting is an important feature for my app, but I have far exceeded a reasonable time-box to make it work, so I'm moving on for now. I'll keep checking back as new versions come out.

There's one thing that I wonder. Could the problem be related to all the non-alphanumeric characters in my docs? Somehow these muck up the indexing?


(mbro) #12

I was able to put together a small script and Java program (see attached
zip) that demonstrates what I'm seeing.

Run the load-highlight-data bash script to put documents into the index.
Then run test-highlight-rest to do REST API queries via curl. There are
some comments in the file regarding query_string patterns that I used.
There is also a small Java project included that you can run. Just drop the
elasticsearch-0.14.0.jar into the lib directory and it should compile/run.

The REST API consistently returns the same results, although they are wrong
for some queries.
The Java API returns inconsistent results.

Both APIs always return the correct list of hits...but sometimes the
highlight information returned with the hits is incorrect.

Hope this helps...

Mike


(Shay Banon) #13

I've found the problem with highlighting the wrong content (from another
doc), opened an issue for it, and pushed a fix:
https://github.com/elasticsearch/elasticsearch/issues/closed#issue/600.

Will release 0.14.2 soonish to include this fix and other bug fixes.


(mbro) #14

Shay,

I just downloaded and tried 0.14.2, but it does not fix the inconsistent highlighting results that I'm seeing. The attachment in my earlier post in this thread can be used to demonstrate the problem. Just run the load-highlight-data script to post data to ES, then run the Java program a couple of times and you should see different highlight results.

I'll be happy to open an issue for this if you'd like.

Thanks for your efforts on ES...it's awesome.

Regards,
Mike


(Shay Banon) #16

Well, that one was a nasty one to track down... It's not the Java transport
client versus the REST one; it's just a game of statistics (I was getting it
once out of about 10000 runs). Pushed the fix to master and the 0.14 branch
if you want to test.


(mbro) #17

Yup, that fixed it! Thanks.

Mike

BTW, sorry for the double post of my last message....the first one was
'pending' for about a day and I hadn't seen that happen before so I posted
again.

