Most common adjacent words

Hello,

Does elasticsearch have the ability to return the most common adjacent
words for a given search query?

That is, given some documents:

{"text": "To be or not to be, that is the question"}
{"text": "We know what we are, but know not what we may be"}
{"text": "If music be the food of love, play on."}

If I search for "be", I'd like to get back something like this:

{ "to be": 2 }
{ "may be": 1 }
{ "be or": 1 }
{ "music be": 1 }
{ "be the": 1 }

I was looking at the phrase suggester, but couldn't make it do this (it
seems adamant about correcting the input text).
If there's no way to do this currently, would it be feasible to write a
plugin to do so?

Any advice is much appreciated.

Jari

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1f4a0ffa-b78c-426c-bc9a-76b068833544%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I don't see a way to do exactly what you are looking for.
But with a little effort on the client side, you could try the highlighting feature, which could give you something similar.
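
As a rough illustration of the client-side effort involved: once the matching text (or the highlighted fragments) is back on the client, the neighbouring words can be counted there. The sketch below is plain Python with no Elasticsearch calls. The function name `adjacent_pairs` and the crude regex tokenizer are assumptions made for illustration, not something Elasticsearch provides.

```python
import re
from collections import Counter


def adjacent_pairs(texts, term):
    """Count two-word sequences that contain `term`, across all texts."""
    term = term.lower()
    counts = Counter()
    for text in texts:
        # Crude tokenizer (assumption): lowercase, keep runs of word
        # characters and apostrophes, drop punctuation.
        words = re.findall(r"[\w']+", text.lower())
        # Walk over every pair of neighbouring words.
        for left, right in zip(words, words[1:]):
            if term in (left, right):
                counts[f"{left} {right}"] += 1
    return counts


# The three example documents from the question.
docs = [
    "To be or not to be, that is the question",
    "We know what we are, but know not what we may be",
    "If music be the food of love, play on.",
]

for pair, count in adjacent_pairs(docs, "be").most_common():
    print(pair, count)
```

Because this tokenizer ignores punctuation, "be, that" is also counted as the pair "be that", which the list in the question leaves out.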

Or maybe an aggregation: a filter aggregation for the term at the first level, then a terms aggregation on the same field analyzed with a shingle analyzer, might give some results.
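
For the shingle part, the field would need a sub-field analyzed into two-word shingles. A minimal sketch of such index settings, written as a Python dict so it can be sent with any client, might look like the following. The names `bigram_shingles`, `shingle_analyzer`, the mapping type `doc` and the sub-field `shingles` are hypothetical; the `string` field type matches the Elasticsearch 1.x releases current at the time of this thread (newer versions use `text` and drop mapping types).

```python
import json

# Hypothetical index definition: a "text" field with a "shingles" sub-field
# that is analyzed into lowercase two-word shingles (no single words).
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "bigram_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,
                }
            },
            "analyzer": {
                "shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bigram_shingles"],
                }
            },
        }
    },
    "mappings": {
        "doc": {  # hypothetical mapping type name
            "properties": {
                "text": {
                    "type": "string",
                    "fields": {
                        "shingles": {
                            "type": "string",
                            "analyzer": "shingle_analyzer",
                        }
                    },
                }
            }
        }
    },
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```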

HTH.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to the message from jari@holderdeord.no of 23 Feb 2015, 00:42.)

Thanks for the suggestion.

I tried your second idea, but running a terms aggregation on my shingles
text field seems to be too much for ES.
Even if it did work, it wouldn't have given me any data on adjacency /
proximity.

The query below fails with:

[FIELDDATA] Data too large, data for [text.shingles] would be larger than limit of [623326003/594.4mb]]

{
  "size": 0,
  "aggregations": {
    "myAggregation": {
      "filter": {
        "query": {
          "query_string": {
            "default_field": "text",
            "query": "foo"
          }
        }
      },
      "aggregations": {
        "combos": {
          "terms": { "field": "text.shingles" }
        }
      }
    }
  }
}
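
If the aggregation did fit in memory, the response would still contain every shingle from the matching documents, not only those next to the search term, so the buckets would need narrowing down on the client. A minimal sketch of that step, assuming the usual response layout for a filter aggregation with a nested terms aggregation; the helper name `shingles_with_term` and the sample response are made up for illustration and are not real output:

```python
def shingles_with_term(response, term, agg="myAggregation", sub_agg="combos"):
    """Keep only the shingle buckets that contain `term` as a whole word."""
    buckets = response["aggregations"][agg][sub_agg]["buckets"]
    term = term.lower()
    return {
        bucket["key"]: bucket["doc_count"]
        for bucket in buckets
        if term in bucket["key"].lower().split()
    }


# Hand-written sample in the shape of an aggregation response (not real data).
sample_response = {
    "aggregations": {
        "myAggregation": {
            "doc_count": 3,
            "combos": {
                "buckets": [
                    {"key": "to be", "doc_count": 1},
                    {"key": "or not", "doc_count": 1},
                    {"key": "may be", "doc_count": 1},
                ]
            },
        }
    }
}

print(shingles_with_term(sample_response, "be"))
```

Two caveats: `doc_count` is the number of documents containing the shingle, not the number of occurrences, and a terms aggregation returns only the top buckets unless its `size` is raised.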

(In reply to David Pilato's message of Monday, February 23, 2015 at 5:01:59 AM UTC+1.)