Automatically build `input` for `completion` fields?


(Sviatoslav Abakumov) #1

Hello,

I have an index facebook with type post. I need to provide users with
autocompletions using terms that appear in post.message. Also the list of
completions should be sorted by score.

The mapping is as follows:
{
  "post": {
    "properties": {
      "created_time": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "link": {
        "type": "string"
      },
      "message": {
        "type": "string"
      },
      "object_id": {
        "type": "string"
      },
      "picture": {
        "type": "string"
      },
      "shares_count": {
        "type": "long"
      },
      "type": {
        "type": "string"
      },
      "update_time": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "user": {
        "type": "long"
      }
    }
  }
}

An example document:
{
  "picture": "...",
  "update_time": "2014-03-19T23:16:59",
  "message": "The day has finally arrived - the first piece of the 1,000,000 Swag Bucks pie has been served!! Check to see if you're our first winner!",
  "object_id": "",
  "shares_count": 0,
  "link": "...",
  "user": ...,
  "created_time": "2014-03-17T21:02:32",
  "type": "link"
}

To achieve the goal I've added one more field to the mapping:
"message_suggest": {
  "type": "completion"
}

Every time I write a document, I query ES to tokenize the string message:
POST _analyze?tokenizer=standard
The day has finally arrived - the first piece of the 1,000,000 Swag Bucks pie has been served!! Check to see if you're our first winner!

Then I get the list of tokens from the response and add it to
post.message_suggest.input.
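
As a rough sketch of the step described above, the token list can be attached to the document before it is indexed. The regular expression below is only a stand-in for the standard tokenizer (it is not what the _analyze call returns byte for byte), and the helper names build_suggest_input and with_suggest_field are hypothetical:

```python
import re

# Rough approximation of the standard tokenizer (an assumption): runs of
# word characters, optionally joined by an inner comma, period, or apostrophe.
TOKEN_RE = re.compile(r"\w+(?:[.,']\w+)*")


def build_suggest_input(message):
    """Return the unique tokens of `message`, in order of first appearance."""
    seen = set()
    tokens = []
    for token in TOKEN_RE.findall(message):
        if token not in seen:
            seen.add(token)
            tokens.append(token)
    return tokens


def with_suggest_field(doc):
    """Return a copy of `doc` with a message_suggest.input list added."""
    enriched = dict(doc)
    enriched["message_suggest"] = {"input": build_suggest_input(doc["message"])}
    return enriched


post = {"message": "The day has finally arrived - the first piece of the pie", "type": "link"}
print(with_suggest_field(post)["message_suggest"])
```

Tokens are deduplicated case-sensitively here, so "The" and "the" both survive; whether to lowercase first is a choice left to the caller.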

When I do the following request, I get what I wanted:
POST facebook/_suggest
{
  "messages": {
    "text": "pi",
    "completion": {
      "field": "message_suggest"
    }
  }
}
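
For reference, the same request body can be assembled programmatically. This is a minimal sketch: the function name completion_request and its default arguments are hypothetical, and it only builds the JSON body without sending it anywhere:

```python
import json


def completion_request(prefix, field="message_suggest", name="messages"):
    """Build a suggest request body for a completion field."""
    return {name: {"text": prefix, "completion": {"field": field}}}


# Serialize the body as it would be sent to facebook/_suggest.
print(json.dumps(completion_request("pi"), indent=2))
```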

I sense that this approach is not right or at least not optimal. I am new
to Elasticsearch and I would appreciate any input.

Best,
Sviatoslav.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7ea7946c-59e3-4b6b-89f8-0ff327d19017%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Alexander Reelsen) #2

Hey,

there is no automation for this. The main reason your solution might
work in your specific use case is that you do not have billions of
documents. Otherwise there would be a lot of documents that contain, for
example, piece (pi suggestions) or winner (wi), and you would get a lot of
results. However, the goal of good suggestions is not to get a lot of
results but a few very good ones that are likely to be chosen by the user
for further queries. Just adding arbitrary suggestions without
scoring/weighting them does not help the user much, in my experience.
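
One possible way to act on the scoring/weighting point: a completion field accepts a weight alongside input. The sketch below assumes that shares_count is a usable popularity signal and that very short tokens are not worth suggesting. Both are illustrative assumptions, as are the function name weighted_suggest_field and the min_length parameter:

```python
def weighted_suggest_field(tokens, shares_count, min_length=3):
    """Build a completion field value with a weight and fewer, longer inputs."""
    # Drop very short tokens, which rarely make useful suggestions.
    inputs = [t for t in tokens if len(t) >= min_length]
    # The weight is clamped to a non-negative integer.
    return {"input": inputs, "weight": max(int(shares_count), 0)}


print(weighted_suggest_field(["the", "of", "piece", "pie", "winner"], 7))
```

Suggestions coming from posts with a higher weight would then rank ahead of those from rarely shared posts.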

Hope that makes sense.

--Alex

In reply to Sviatoslav Abakumov's message of Wed, Apr 2, 2014 at 10:26 AM.


(Sviatoslav Abakumov) #3

What about the index_analyzer and search_analyzer fields? I had hoped
that index_analyzer would split the string message_suggest.input into an
array, but it doesn't seem to do that. What are they for, then?

In reply to Alexander Reelsen's message of Mon, Apr 7, 2014 at 11:20 AM.


(Alexander Reelsen) #4

Hey,

it is exactly the same functionality that index/search analyzers serve when
indexing/querying any other field. They define the analysis chain
(consisting of a tokenizer, token filters, and optionally char filters)
applied when a field is either indexed or queried, to make sure the terms
are in the same format. There is no special handling for the completion
suggester here.
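
To illustrate the idea, here is a toy model of an analysis chain. It is not Elasticsearch's implementation, and the specific char filter, tokenizer, and token filter chosen are assumptions. The point is only that running the same chain at index time and at query time puts terms into the same format:

```python
import re


def analyze(text):
    """Toy analysis chain: char filter, then tokenizer, then token filter."""
    text = text.replace("&", " and ")      # char filter: rewrite raw characters
    tokens = re.findall(r"\w+", text)      # tokenizer: split into word tokens
    return [t.lower() for t in tokens]     # token filter: lowercase each token


indexed_terms = analyze("Swag Bucks & Pie")   # what would be stored
query_terms = analyze("swag BUCKS")           # what would be looked up
print(indexed_terms)
print(all(t in indexed_terms for t in query_terms))
```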

--Alex

In reply to Sviatoslav Abakumov's message of Wed, Apr 23, 2014 at 2:52 AM.

