Phrase suggester


(Nik Everett) #1

Is it possible to convince the phrase suggester to return the suggested
phrase in such a way that I can highlight the changed words easily? I can
certainly split both phrases on spaces and compare them but that really
isn't going to work for non-space-delimited languages.

Nik
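A minimal Python sketch of the naive approach described above: split both phrases on spaces and compare them word by word. `changed_words` is a hypothetical helper, not part of any Elasticsearch client. It also shows where the approach breaks down. A hyphenated word is treated as one unit, and text without spaces is compared as a single "word".

```python
def changed_words(original: str, suggestion: str) -> list[tuple[str, str]]:
    """Naive comparison: split both phrases on whitespace and pair words by position.

    Only meaningful when both phrases have the same number of words. For text
    without spaces the whole phrase is a single "word".
    """
    before = original.split()
    after = suggestion.split()
    # zip stops at the shorter list, so inserted or deleted words are silently lost
    return [(b, a) for b, a in zip(before, after) if b != a]


print(changed_words("xorr the gott jewl", "xorr the god jewel"))
# [('gott', 'god'), ('jewl', 'jewel')]
print(changed_words("xorr the gott-jewl", "xorr the god-jewel"))
# [('gott-jewl', 'god-jewel')]  (the hyphenated pair is reported as one change)
print(changed_words("東京タワ", "東京タワー"))
# [('東京タワ', '東京タワー')]  (no spaces, so the whole phrase is one change)
```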

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(simonw-2) #2

Nik, there could be multiple changes in the phrase that is returned, so
it's kind of unclear what you are asking for in particular. Is it a list of
offsets that have been changed? I'd be happy to improve the feature, but I
wonder how we can represent what you need. Maybe you have an example?

simon



(Nik Everett) #3

I'm happy to provide an example! This is pretty much what I'm trying to
implement:
http://en.wikipedia.org/w/index.php?search=xorr+the+gott-jewl
See how "god" and "jewel" are highlighted? While it'd be sweet if I could
ask elasticsearch to do the actual highlighting, that really isn't
required. All I really need is a way to get the changed tokens.
Something like this:
{
  "text": "xorr the gott-jewl",
  "offset": 0,
  "length": 18,
  "options": [
    {
      "text": "xorr the god-jewel",
      "score": XXXX,
      "changes": {
        "gott": "god",
        "jewl": "jewel"
      }
    }
  ]
}

or this
{
  "text": "xorr the gott-jewl",
  "offset": 0,
  "length": 18,
  "options": [
    {
      "text": "xorr the god-jewel",
      "score": XXXX,
      "changes": [
        {"was": "gott", "is": "god", "from": 9, "to": 13},
        {"was": "jewl", "is": "jewel", "from": 14, "to": 18}
      ]
    }
  ]
}

One thing, though: it wouldn't be useful for me to see any shingle tokens
because users don't really think of them as words. It is almost like you'd
need to retokenize the text and the suggestion without the shingle filter
and compare them.
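A rough sketch of the "retokenize and compare" idea, producing entries with the `was`/`is`/`from`/`to` keys used in the second format above. Here `from`/`to` are treated as end-exclusive character offsets into the original text. Assumptions: Python's `difflib` aligns the two token lists, and a plain `\w+` regex stands in for an analyzer chain without the shingle filter. `tokenize` and `diff_changes` are hypothetical names. The regex does not solve the non-space-delimited case, because a run of CJK characters still matches as one word. For those languages the token boundaries would need to come from a real analyzer, for example the output of Elasticsearch's `_analyze` API.

```python
import difflib
import re

# Stand-in for "retokenize without the shingle filter": \w+ runs only.
WORD = re.compile(r"\w+")


def tokenize(text):
    """Return (word, start, end) triples for each word in text."""
    return [(m.group(), m.start(), m.end()) for m in WORD.finditer(text)]


def diff_changes(original, suggestion):
    """List changed words as {"was", "is", "from", "to"} dicts; offsets index into original."""
    before, after = tokenize(original), tokenize(suggestion)
    matcher = difflib.SequenceMatcher(
        a=[w for w, _, _ in before], b=[w for w, _, _ in after], autojunk=False
    )
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        if op == "replace" and i2 - i1 == j2 - j1:
            # Same number of words on both sides: report each word separately.
            pairs = [([before[i]], [after[j]]) for i, j in zip(range(i1, i2), range(j1, j2))]
        else:
            # Unequal word counts (insert, delete, or uneven replace): one change for the span.
            pairs = [(before[i1:i2], after[j1:j2])]
        for was, now in pairs:
            if was:
                start, end = was[0][1], was[-1][2]
            else:
                # Pure insertion: anchor it where it would go in the original text.
                start = end = before[i1][1] if i1 < len(before) else len(original)
            changes.append({
                "was": original[start:end],
                "is": suggestion[now[0][1]:now[-1][2]] if now else "",
                "from": start,
                "to": end,
            })
    return changes


print(diff_changes("xorr the gott-jewl", "xorr the god-jewel"))
```

On the thread's example this yields two entries, `gott` to `god` at offsets 9 to 13 and `jewl` to `jewel` at 14 to 18. Splitting equal-length replaced runs word by word keeps adjacent corrections separate, which is what a highlighter needs. Uneven runs are reported as one span because there is no reliable word-to-word pairing.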

I was looking around the code but all the spares and byteSpares are making
my head spin at the moment.

Nik



(Nik Everett) #4

simon,

I put together a (probably wrong) implementation of this in
https://github.com/elasticsearch/elasticsearch/pull/3443 . I'd love it if
you could have a look at it at some point. It still needs a lot of work
but would certainly work for me.

Nik



(simonw-2) #5

For the record, this feature is now in master.

simon



(Amit Soni) #6

Simon, can you please share which Elasticsearch API has to be used to
try this out (as in the example above)?

-Amit.



(Nik Everett) #7

Look on
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html
for the term "highlight". The big example uses it here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html#_api_example
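A minimal Python sketch of using that option, assuming the request shape shown in the phrase suggester reference. In that shape, `highlight` takes a `pre_tag` and a `post_tag`, and each suggestion option then carries a `highlighted` string with those tags wrapped around the changed tokens. The field name `title.shingles`, the suggestion name `did_you_mean`, and the helper `highlighted_words` are hypothetical. The highlighted string below is illustrative input, not a captured response.

```python
import json
import re

# Hypothetical field name: use whichever (shingled) field the phrase suggester runs on.
FIELD = "title.shingles"
PRE, POST = "<em>", "</em>"

# Search body with a phrase suggestion that has highlighting turned on.
request_body = {
    "suggest": {
        "text": "xorr the gott-jewl",
        "did_you_mean": {  # arbitrary name for this suggestion
            "phrase": {
                "field": FIELD,
                "highlight": {"pre_tag": PRE, "post_tag": POST},
            }
        },
    }
}


def highlighted_words(highlighted, pre=PRE, post=POST):
    """Return the text found between each pre/post tag pair, in order."""
    pattern = re.escape(pre) + "(.*?)" + re.escape(post)
    return re.findall(pattern, highlighted)


print(json.dumps(request_body, indent=2))
# Illustrative input only, not a real response:
print(highlighted_words("xorr the <em>god</em>-<em>jewel</em>"))
# ['god', 'jewel']
```

Reading the tags back out of `highlighted` avoids re-splitting the phrase on the client, so word boundaries come from the suggester's own tokens rather than from spaces.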


