Search hit count jumps up and down when updating documents


(Weiwei Wang) #1

When I use the bulk API to update lots of documents (e.g. 20000), I find that the
search hit count jumps up and down, which is very confusing for users.

PS: I use ES for real-time search.


(Shay Banon) #2

As you update documents, the changes are reflected to the user; that is the near-real-time aspect of it. By default, the refresh interval is 1 second. You can disable refreshing using the update settings API (set the refresh interval to -1) and then re-enable it once you are done bulk loading the data, if that makes sense for your use case.
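
For illustration, here is a minimal Python sketch of the settings change described above. It only builds the two HTTP requests and does not send them. The host `http://localhost:9200` and the index name `myindex` are assumptions, and the exact endpoint and body shape may differ between Elasticsearch versions, so check the update settings docs for the version in use.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local node, not from the thread


def refresh_interval_request(index, interval):
    """Build (but do not send) an update-settings request for one index."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Turn refresh off before the bulk load, then restore the 1s default afterwards.
disable = refresh_interval_request("myindex", "-1")
restore = refresh_interval_request("myindex", "1s")

# To actually apply a change: urllib.request.urlopen(disable)
```

While refresh is disabled, newly indexed documents do not become visible to search, so this probably only suits cases where a temporarily stale result set is acceptable.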

On Saturday, June 18, 2011 at 11:57 AM, Weiwei Wang wrote:

When I use the bulk API to update lots of documents (e.g. 20000), I find that the
search hit count jumps up and down, which is very confusing for users.

PS: I use ES for real-time search.


(Berkay Mollamustafaoglu-2) #3

I think this may be due to synchronization between replicas rather than the
refresh interval. Queries alternate between different replicas, and since the
data may not yet be replicated, users see results that fluctuate depending on
which replica the query is executed against.
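
To illustrate the effect described here, a toy Python model (not Elasticsearch code): two copies of the same shard see a different number of matching documents while an update is still propagating, and searches alternate between them. The copy names and hit counts are made up for the example.

```python
from itertools import cycle

# Hypothetical visible hit counts on two copies of one shard mid-update.
visible_hits = {"primary": 20000, "replica": 19400}


def hit_counts(copies, n_searches):
    """Return the hit count seen by each of n_searches round-robin searches."""
    order = cycle(copies)
    return [visible_hits[next(order)] for _ in range(n_searches)]


# Alternating between copies makes the reported total jump up and down.
alternating = hit_counts(["primary", "replica"], 4)
# Always using the same copy gives a stable total.
pinned = hit_counts(["replica"], 4)
```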

It is potentially a usability problem for users, and it can cause problems
if the data is processed as it is written to ES, right?

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

On Sat, Jun 18, 2011 at 2:14 PM, Shay Banon <shay.banon@elasticsearch.com> wrote:

As you update documents, the changes are reflected to the user; that is the
near-real-time aspect of it. By default, the refresh interval is 1 second.
You can disable refreshing using the update settings API (set the refresh
interval to -1) and then re-enable it once you are done bulk loading the
data, if that makes sense for your use case.


(Shay Banon) #4

That can happen as well (though I don't think it's the case here), but it is easily solved with the preference feature in search: http://www.elasticsearch.org/guide/reference/api/search/preference.html.

On Monday, June 20, 2011 at 5:44 PM, Berkay Mollamustafaoglu wrote:

I think this may be due to synchronization between replicas rather than the refresh interval. Queries alternate between different replicas, and since the data may not yet be replicated, users see results that fluctuate depending on which replica the query is executed against.

It is potentially a usability problem for users, and it can cause problems if the data is processed as it is written to ES, right?

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype


(Weiwei Wang) #5

If I set it to _primary, how can I load balance? The single primary
would need to take all incoming requests, so it would have to handle
heavy load while still responding quickly.

On Jun 21, 4:29 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

That can happen as well (though I don't think it's the case here), but it is easily solved with the preference feature in search: http://www.elasticsearch.org/guide/reference/api/search/preference.html.


(Shay Banon) #6

Check the docs: you can give the preference a unique value per user, for example the username or session ID.
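
As a sketch of this suggestion in Python: pass the session ID as the `preference` query parameter so that repeated searches from one user are routed consistently, while different users can still be served by different shard copies. The host, index name, query, and session IDs below are assumptions for illustration, and the URL is only built, not requested.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumed local node, not from the thread


def search_url(index, query, session_id):
    """Build a URI search URL that uses the session ID as the preference."""
    params = urlencode({"q": query, "preference": session_id})
    return f"{ES_URL}/{index}/_search?{params}"


# Two hypothetical sessions: each keeps its own stable preference value.
url_a = search_url("myindex", "title:foo", "session-abc123")
url_b = search_url("myindex", "title:foo", "session-xyz789")
```

Values starting with an underscore (such as `_primary`) are reserved preference names, so a custom value is best chosen not to begin with one.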

On Tuesday, June 21, 2011 at 3:37 PM, Weiwei Wang wrote:

If I set it to _primary, how can I load balance? The single primary
would need to take all incoming requests, so it would have to handle
heavy load while still responding quickly.


(Weiwei Wang) #7

gotcha, thanks shay~

On Jun 21, 9:11 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Check the docs: you can give the preference a unique value per user, for example the username or session ID.

