I understand that ES is "near" realtime (updating every second by default).
I think ES is awesome and I'm using it to great effect, but this minor
issue keeps popping up and workarounds that make sure the user reads their
own writes immediately aren't nice.
Truly realtime search is obviously desirable and I assume having "near"
realtime isn't the end goal for the project. Are there any plans as to how
to achieve it?
How about a mini-shard alongside each normal shard that holds the documents
that are not yet inserted into the index? It could expose the same
functionality as a normal shard, but it could be updated with each
insertion, effectively acting as a sort of cache.
I'm using ES in combination with Couchbase so I'm only using ES for
searching and then Couchbase for everything else.
On Wednesday, March 20, 2013 4:06:39 PM UTC, David Pilato wrote:
If you need to read, then you should know that read is realtime.
Search is not.
Are you looking for realtime get?
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
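To make the distinction between a realtime get and near-realtime search concrete, here is a toy model. It is only a sketch: `ToyShard` and its methods are made up for illustration and say nothing about how Elasticsearch is implemented internally. A lookup by ID sees a write at once, while search only sees what the last refresh made visible.

```python
class ToyShard:
    """Toy model of a near-realtime shard (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}            # every accepted write, keyed by ID
        self._searchable = set()   # IDs made visible by the last refresh

    def index(self, doc_id, body):
        self._docs[doc_id] = body

    def get(self, doc_id):
        # Realtime: a lookup by ID sees the write immediately.
        return self._docs.get(doc_id)

    def search(self, term):
        # Near realtime: only refreshed documents can match.
        return sorted(
            doc_id for doc_id in self._searchable
            if term in self._docs[doc_id]
        )

    def refresh(self):
        # Make everything written so far visible to search.
        self._searchable = set(self._docs)


shard = ToyShard()
shard.index("c1", "first comment")
found_by_get = shard.get("c1")           # visible right away
hits_before = shard.search("comment")    # not visible yet
shard.refresh()
hits_after = shard.search("comment")     # visible after the refresh
```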
But you talked about "the user reads their own writes".
Just wondering if realtime search is really what you need.
That said, I understand your concern here: with Couchbase in the middle, you have one latency from the Couchbase transport and another from the ES refresh.
Can you describe your use case a bit more, and the reason you feel that you need realtime search?
--
David
Sorry for the confusion. The most common use case is usually similar to a
comment being posted on an article or post (generic discussion board).
Assuming we're not using AJAX and the comment is posted via a form, the
page is loaded, the comment is inserted and then the latest comments are
retrieved for display via a search. Our user's latest comment is omitted
because, even though it was inserted before the search was performed, it had
yet to become searchable in the index. The workaround is simple: just add the
comment to the bottom if it wasn't included in the search results. In more
complex manifestations of this problem, though, it isn't that easy.
It's a rather generic issue.
Can you just add refresh=true to your add-comment call, or lower the
refresh interval for the comment index? That is assuming your add-comment
rate is lower than your search rate.
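As a rough sketch of those two options, the snippet below builds (but does not send) the HTTP requests involved. The host, the index name `comments`, the type name `comment`, and the helper names are assumptions for illustration; the type-based URL layout matches Elasticsearch of that era and may need adjusting for your version.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node


def _json_request(url, body):
    """Build a PUT request with a JSON body (nothing is sent here)."""
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def index_request(index, doc_type, doc_id, doc, refresh=False):
    """Index one document; refresh=True asks for a refresh right after."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    if refresh:
        url += "?" + urlencode({"refresh": "true"})
    return _json_request(url, doc)


def refresh_interval_request(index, interval):
    """Change how often the index refreshes, e.g. "200ms"."""
    body = {"index": {"refresh_interval": interval}}
    return _json_request(f"{BASE_URL}/{index}/_settings", body)


add_comment = index_request(
    "comments", "comment", "42", {"text": "hi"}, refresh=True
)
faster_refresh = refresh_interval_request("comments", "200ms")
```

Either request could then be sent with `urllib.request.urlopen`. Which option is cheaper depends on the write rate, as noted above.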
You could ask the devs to include this in Elasticsearch somehow; there is
probably already an open issue, although I didn't find one.
Peter.
That blog post appears to solve almost the opposite problem - that the user
will see changes when they shouldn't.
I've thought about setting refresh to true with every save, but it seems
like it would be detrimental to performance. I'm planning a large cluster
and the refresh must occur on each server. To do that with each save
doesn't sound healthy and would be increasingly problematic as the number
of servers grows.
On Thu, 2013-03-21 at 04:37 -0700, marcuslongmuir wrote:
That blog post appears to solve almost the opposite problem - that the
user will see changes when they shouldn't.
Agreed
I've thought about setting refresh to true with every save, but it
seems like it would be detrimental to performance. I'm planning a
large cluster and the refresh must occur on each server. To do that
with each save doesn't sound healthy and would be increasingly
problematic as the number of servers grows.
Agreed, again
I've also wished for some in-memory only index that is able to search
for docs that have just been indexed but are not yet written to
segments. However, (not knowing much about the internals) I think that
would require creating an inverted index every time a new doc is
indexed, and would probably have quite an impact on performance.
I think the way you're currently managing it is the best solution:
* user adds comment
* index comment into ES, and keep comment data & metadata around
* do search for comments, and filter out user's new comment by ID
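The steps above can be sketched as a small helper. The function name and the `id` key are hypothetical; the only assumption is that both the search hits and the just-posted comment carry the same document ID.

```python
def with_own_comment(hits, new_comment):
    """Return the search hits plus the just-posted comment, without duplicates.

    If the search already returned the new comment (the refresh happened in
    time), the hits are returned unchanged; otherwise it is appended.
    """
    if any(hit["id"] == new_comment["id"] for hit in hits):
        return list(hits)
    return list(hits) + [new_comment]


older = [{"id": "c1", "text": "first"}]
mine = {"id": "c2", "text": "just posted"}

not_yet_searchable = with_own_comment(older, mine)
already_searchable = with_own_comment(older + [mine], mine)
```

In both cases the page shows each comment exactly once, whether or not the refresh has caught up.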
Andy Wick's suggestion is the best one for your use case. Just make the user
wait a little bit before redirecting them to the page where the comments are
shown. Lower the index refresh time if you really want to see a quick
response.
Updating Couchbase and Elasticsearch in parallel may also be an
option.
As much of an impact on performance as a refresh with every change?
It's a good solution for that use case, but it's not a solution for all of
the problems that near realtime causes. Say I have a flag that I can set to
prevent new insertions.
Example:
* Set the flag to prevent insertions.
* Perform a search of current documents.
* Delete retrieved documents.
I might assume that I had deleted all documents, but if the search results
are reflecting the state up to 1 second ago I might miss documents inserted
just before the flag was set.
The workarounds suggested include waiting for the refresh (delay for a
second) or refreshing on every change - both of which are directly or
indirectly detrimental to the user experience in this case.
Elasticsearch is awesome, but it seems that a remnant of the days before
dynamic indexing is holding it back. If you were building a search product
from the ground up today I doubt near realtime would be the end goal.
Can any of the contributors shed some light on whether realtime search is
desired or is in the pipeline?
Sorry, that was indeed exactly the opposite topic. I had saved that
article in my brain as a way to solve this 'comment' problem, and I thought
Mike had blogged about exactly this problem.
However, one option you could explore is a small 'feeding' index
with a very small refresh_interval. This index would be copied from time
to time to your normal 'static' index. Via aliases you can easily search
over several indices. Avoiding duplicates should be easy if you filter both
indices by time (look into filtered aliases).
Peter.
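A minimal sketch of what the alias setup for such a feeding index might look like is below. It only builds the JSON body one would POST to the `_aliases` endpoint. The index names, the alias name, the `created_at` field, and the cutoff value are all hypothetical, and the exact range filter syntax may differ between Elasticsearch versions.

```python
import json


def feeding_alias_actions(alias, static_index, feeding_index, cutoff,
                          time_field="created_at"):
    """Build an _aliases body that points one alias at both indices.

    Each index gets a filtered alias: the static index only exposes
    documents older than the cutoff and the feeding index only newer
    ones, so a document present in both is returned once.
    """
    return {
        "actions": [
            {"add": {
                "index": static_index,
                "alias": alias,
                "filter": {"range": {time_field: {"lt": cutoff}}},
            }},
            {"add": {
                "index": feeding_index,
                "alias": alias,
                "filter": {"range": {time_field: {"gte": cutoff}}},
            }},
        ]
    }


body = feeding_alias_actions(
    "comments", "comments_static", "comments_feeding", "2013-03-21T00:00:00"
)
payload = json.dumps(body)  # what would be sent to /_aliases
```

The cutoff would have to be moved forward each time the feeding index is copied into the static one, which is the part this sketch leaves out.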
For user-initiated actions, where you only want one user to see the
change in realtime, the refresh=true option is almost always a good
solution. Your user just added a comment and hit submit;
the user probably doesn't care whether it takes a second.
If you're indexing a large amount of data at high speed, then the
refresh option would slow your indexing down by a huge
factor. I think this is what is meant by performance. For Elasticsearch it
doesn't really matter, although having lots of clients waiting for a
refresh adds a little overhead.
Jaap Taal