Search for article with 'most recent' comment

Hi - I'm pretty new to ES and have a specific use case I'd appreciate some
help with.

I have indexed about 100m 'articles' and each article has between 1 and 50
'comments' which I'm yet to index.

I'd like to be able to execute a query to return the 50 latest articles
(easy range query) along with just the most recent comment for each article.

Each article and comment has a creationDateTime; I'm just not sure of the
best way to index and query the data.

I'd like to stay away from parent/child relationships if possible due to
the overhead and performance issues.

So is the best method to update a 'latestComment' field in the article each
time a comment is added? Or could I store all comments in an array in the
article and somehow retrieve just the latest using a scripted field?

I would also like to be able to run the same query for the 50 latest articles
where the most recent comment was made by a specific author (the comment
has an author field). So essentially a 'Show me all articles where the
most recent comment was made by me' query.

I would really appreciate any help/pointers.

Regards
Matthew

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

One important thing to know here: if you index a document like:

{
  "comment" : [
    { "text": "mytext1", "date": "2013-11-21" },
    { "text": "mytext2", "date": "2013-12-01" }
  ]
}

Elasticsearch will index it as if it was:

{
  "comment" : {
    "text": ["mytext1", "mytext2"],
    "date": ["2013-11-21", "2013-12-01"]
  }
}

What does this mean? The link between the inner properties is lost.

If you query for docs having a comment with text=mytext1 AND date=2013-12-01, the sample document I showed you will match, even though no single comment has both values. That is probably not what you want.
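
A minimal Python sketch of why that cross-match happens. It does not talk to Elasticsearch; `flatten` and `matches_flat` are hypothetical helpers that only mimic the flattening described above.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Mimic how an array of inner objects becomes multi-valued fields."""
    fields = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Inner object: merge its fields under a dotted path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path].extend(sub_values)
            else:
                fields[path].append(item)
    return dict(fields)


def matches_flat(fields, conditions):
    """AND of exact-value conditions evaluated against the flattened fields."""
    return all(value in fields.get(name, []) for name, value in conditions.items())


doc = {
    "comment": [
        {"text": "mytext1", "date": "2013-11-21"},
        {"text": "mytext2", "date": "2013-12-01"},
    ]
}

flat = flatten(doc)
crossed = {"comment.text": "mytext1", "comment.date": "2013-12-01"}

# Flattened view: both values exist somewhere in the doc, so it "matches".
flat_hit = matches_flat(flat, crossed)

# Per-comment view: no single comment carries both values.
per_comment_hit = any(
    c["text"] == "mytext1" and c["date"] == "2013-12-01" for c in doc["comment"]
)

print(flat_hit, per_comment_hit)  # prints: True False
```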

To solve this, you have to index each comment as its own Lucene document.
You can do this using either parent/child or nested documents.
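
As a rough illustration of the nested option, the mapping and query could be built as the Python dicts below. The field names `comment`, `text` and `date` come from the sample document; the field types and the helper name `nested_comment_query` are assumptions, and nothing here is sent to a cluster.

```python
import json

# Assumed mapping fragment: declaring "comment" as nested keeps each comment's
# text and date together instead of flattening them into one object.
mapping = {
    "properties": {
        "comment": {
            "type": "nested",
            "properties": {
                # "string" in 2013-era releases; later versions use "text"/"keyword".
                "text": {"type": "string"},
                "date": {"type": "date"},
            },
        }
    }
}


def nested_comment_query(text, date):
    """Both conditions must hold within the same nested comment."""
    return {
        "nested": {
            "path": "comment",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"comment.text": text}},
                        {"term": {"comment.date": date}},
                    ]
                }
            },
        }
    }


query = nested_comment_query("mytext1", "2013-12-01")
print(json.dumps({"query": query}, indent=2))
```

With a nested mapping like this, the sample document should no longer match, because mytext1 and 2013-12-01 belong to different comments.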

About your question: I'm not sure I understand it fully, but I would say that you can easily do this at the client level.
Maybe you could illustrate your use case a bit more with a curl recreation on gist.github.com?
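
A sketch of what "at the client level" might look like, assuming each article hit carries all of its comments in its source. The field names `author` and `creationDateTime` follow the question; `id`, `comments`, the helper names and the sample data are hypothetical.

```python
def latest_comment(article):
    """Return the newest comment of an article, or None if it has none."""
    comments = article.get("comments", [])
    # ISO-8601 timestamps in one format and time zone sort correctly as strings.
    return max(comments, key=lambda c: c["creationDateTime"], default=None)


def with_latest_comment(articles, author=None):
    """Pair each article id with its newest comment, optionally filtered by author."""
    results = []
    for article in articles:
        latest = latest_comment(article)
        if author is not None and (latest is None or latest["author"] != author):
            continue
        results.append({"id": article["id"], "latestComment": latest})
    return results


articles = [
    {"id": 1, "comments": [
        {"author": "matt", "creationDateTime": "2013-11-20T10:00:00"},
        {"author": "david", "creationDateTime": "2013-11-21T09:00:00"},
    ]},
    {"id": 2, "comments": [
        {"author": "david", "creationDateTime": "2013-11-19T08:00:00"},
        {"author": "matt", "creationDateTime": "2013-11-22T01:50:24"},
    ]},
]

mine = with_latest_comment(articles, author="matt")
print([hit["id"] for hit in mine])  # prints: [2]
```

Note that filtering by author on the client after fetching 50 articles can leave fewer than 50 results, so the author variant may need more than one page of articles.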

Hope this helps

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


Hi David, you've probably saved me a lot of headaches with that comment!!
Thank you!

The requirement is very much what you suspected, so it could be that
parent/child or nested is the only way to do it.

Note that the comments haven't been loaded yet, so I've continued with the
document structure that you pointed out won't actually work in the way
we require.

The following gist shows the desired result:

gist:a58e2db23e2ee39dc89c · GitHub

Kind Regards
Matthew


Hi Matt,

If comments are added frequently, I think I would go for parent/child
documents, as new comments don't affect the related docs.

To add a new comment to a nested doc, ES has to reindex the whole article,
including all previous comments.

I'm using parent/child docs as well to store frequently changing user
tags, and that works fine. I'm pretty sure the performance issues I face
are related to other less-than-perfect design decisions. :)

Regarding your second question, getting articles along with just the most
recent comment for each article: I believe the only way to get
everything in just one request is to load the full source of each
article with the nested documents approach. But this way you always get
all comments for each article.

If you choose the parent/child design, you need a second round trip to
gather the most recent comment for each article in the results. But you
can use a multi search to get all the comments in just one request.
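
A minimal sketch of building that multi search body, one search per article from the first result page. It assumes comments are separate documents that carry a hypothetical `articleId` field next to the `creationDateTime` from the question; the index name `articles` and the helper `build_msearch_body` are also assumptions. The body alternates a header line and a search line as newline-delimited JSON, which is the format the `_msearch` endpoint expects.

```python
import json


def build_msearch_body(article_ids, index="articles"):
    """Build a newline-delimited multi search body: newest comment per article."""
    lines = []
    for article_id in article_ids:
        # Header line: which index this search runs against (hypothetical name).
        lines.append(json.dumps({"index": index}))
        # Body line: newest comment only, via descending sort and size 1.
        lines.append(json.dumps({
            "query": {"term": {"articleId": article_id}},
            "sort": [{"creationDateTime": {"order": "desc"}}],
            "size": 1,
        }))
    # The multi search body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_msearch_body([101, 102])
print(body)
```

The responses come back in the same order as the searches, so each one can be matched to its article by position.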

Best regards
Hannes
