Fast vector highlighter does not work with explicit span_near queries


(Harry Waye-2) #1

I'm trying to use the fast vector highlighter (fvh) with span_near queries,
but it returns no highlights at all. Other query types work, even the
query_string equivalent. Is there anything I am doing incorrectly here? Or
is there a workaround that I can employ in the meantime? Below is a recreation:

Set up index with mappings

curl -XPOST localhost:9200/a -d '{
  "mappings": {
    "document": {
      "properties": {
        "text": {
          "type": "string",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}'

Index a document into the field with positions and offsets enabled

curl -XPOST localhost:9200/a/document/1 -d '{"text": "a b"}'

Query with fvh highlighter gives no highlight

curl -XPOST localhost:9200/a/document/_search -d '{
  "query": {
    "span_near": {
      "slop": 0,
      "clauses": [{"span_term": {"text": "a"}}, {"span_term": {"text": "b"}}]
    }
  },
  "highlight": {"fields": {"text": {"type":"fvh"}}}
}'

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.22145195,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.22145195, "_source" : {"text": "a b"}}]}}

Query with plain highlighter gives a highlight

curl -XPOST localhost:9200/a/document/_search -d '{
  "query": {
    "span_near": {
      "slop": 0,
      "clauses": [{"span_term": {"text": "a"}}, {"span_term": {"text": "b"}}]
    }
  },
  "highlight": {"fields": {"text": {"type":"plain"}}}
}'

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.22145195,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.22145195, "_source" : {"text": "a b"},"highlight":{"text":["a b"]}}]}}

The query_string equivalent with fvh gives a highlight

curl -XPOST localhost:9200/a/document/_search -d '{
  "query": {
    "query_string": {
      "query": "\"a b\"~0",
      "default_field": "text"
    }
  },
  "highlight": {"fields": {"text": {"type":"fvh"}}}
}'

{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.38356602,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.38356602, "_source" : {"text": "a b"},"highlight":{"text":["a b"]}}]}}

Try a match query

curl -XPOST localhost:9200/a/document/_search -d '{
  "query": {
    "match": {
      "text": "a b"
    }
  },
  "highlight": {"fields": {"text": {"type":"fvh"}}}
}'

{"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.2712221,"hits":[{"_index":"a","_type":"document","_id":"1","_score":0.2712221, "_source" : {"text": "a b"},"highlight":{"text":["a b"]}}]}}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/bc11d0c7-119b-410d-9fb4-ee4c72c6ee5a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Harry Waye-2) #2

FYI this is ES 1.0.1



(Robert Muir-2) #3

FVH definitely doesn't recognize span-near queries. In general, when
it comes to the spanquery family the plain highlighter will work
better because it has explicit support for those queries.

Maybe you want to open an issue? It's not obvious how to fix, though,
because span-near queries can nest arbitrarily, yet fvh's design needs
to "flatten" the query into a simple list of queries and phrases.

In the meantime, I'd recommend the plain highlighter.
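
One way to apply this advice in a script is to pick the highlighter per request: fall back to plain whenever the query contains a span query, and keep fvh otherwise. The following is a minimal sketch, not anything Elasticsearch provides. The function names `choose_highlighter` and `search_with_highlight` are made up for illustration, the index, type, and field names are taken from the recreation at the top of the thread, and the substring check for `"span_` is a crude assumption that would misfire if that text appeared inside a query value.

```shell
#!/bin/sh
# Hypothetical helper: choose a highlighter type for a query body ($1).
# Assumption: any occurrence of "span_ means a span query is present.
choose_highlighter() {
  case "$1" in
    *'"span_'*) echo plain ;;  # fvh does not recognize span queries
    *)          echo fvh ;;
  esac
}

# Hypothetical wrapper: run a search against the index from the recreation.
# $1 is the JSON for the "query" element only.
search_with_highlight() {
  hl=$(choose_highlighter "$1")
  curl -XPOST localhost:9200/a/document/_search -d "{
    \"query\": $1,
    \"highlight\": {\"fields\": {\"text\": {\"type\": \"$hl\"}}}
  }"
}
```

For example, `search_with_highlight '{"match": {"text": "a b"}}'` would send the request with fvh, while a span_near query body would be sent with the plain highlighter.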



(Harry Waye-2) #4

Thanks, Robert. Would this be a useful caveat to add to the highlighter docs?

The reason I looked into using the fvh was that the plain highlighter
does not use the correct analyzer when the index analyzer is specified
via a path in the indexed document (_analyzer: {path: ...}).
That looks like an easy fix, though.

I'll add issues for both.



(Robert Muir-2) #5

On Sat, Mar 22, 2014 at 10:26 AM, Harry Waye harry@arachnys.com wrote:

> Thanks Robert, a useful caveat to add to the highlighter docs?

Yes, I think so.

> The reason I looked into using the fvh was due to the plain highlighter not
> using the correct analyzer if the index analyzer was specified via
> a path in the indexed document (_analyzer: {path: ...}). Looks like an easy
> fix though.
>
> I'll add issues for both.

Thanks!



(Harry Waye-2) #6

On a related note, given that I've now enabled positions and offsets on a
field where they will no longer be useful, is there some way of removing
them without reindexing everything? I'm happy to perform some low-level
Lucene surgery, but I'm unsure what would be required, i.e. how do I update
the Lucene indices and how do I update the mappings? Reindexing can take
some time, so I'd like to avoid it if possible.



(system) #7