Update Timing

We're setting up an ES cluster and have a question about the sequence of
events. The situation I'm attempting to prevent is a little complicated and
I just want to make sure updates will be received in the proper order.

My proposed setup is two ES servers behind an EC2 load balancer.

I have clients that only read, which is pretty straightforward.

However, I have a service that is responsible for making updates to the
index. That service is a multithreaded process that's receiving jobs and
sending them through the Tire Ruby gem.

Let's say I have the following:

Update 2 ===> Update 1 ===> Index service ===> Load balancer ====> ES 1
                                                            \====> ES 2

Say I have an index named "orders" and the update pertains to a record
whose primary shard is on ES 2. If update 1 hits the load balancer and is
directed to ES 1, ES 1 then has to route the update to ES 2. However, update
2 is not far behind; it hits the load balancer and goes straight to ES 2.

I'm worried that update 2 will make it to ES 2 sooner than update 1 because
update 1 has to go through that extra hop from ES1 to ultimately get to ES2
where the record lives.

Am I worrying about something that is guarded against in ES?

Thanks,
Brandon

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Check if versioning with optimistic concurrency control can help you.

Jörg

I don't think it helps me necessarily. It's good to know about. But I'm
hoping that what I describe is protected against within the ES cluster and
that's what I'm more curious about.

Thanks for the link though.

Brandon

On Thu, Aug 8, 2013 at 3:37 PM, joergprante@gmail.com <joergprante@gmail.com> wrote:

Check if versioning with optimistic concurrency control can help you.

Elasticsearch Platform — Find real-time answers at scale | Elastic

Jörg

Why not replace the load balancer with an ES node that basically becomes
the "coordinator" of your cluster? To do this you would just set node.data
to false and node.master to true in elasticsearch.yml. This node will only
be responsible for sending your "updates" to the correct node, ES1 or ES2.

Thanks,
Matt Weber

What kind of protection do you have in mind?

It is up to the client to create the versioning information for ordering
document actions and to take action if it fails.

Your updates are traveling independently through the cluster to the
location where they persist. Depending on the network speed and CPU load of
the nodes, the updates may arrive sooner or later at the shards. It's quite
common that some nodes are heavily busy while others are not.
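
To make the race concrete, here is a toy model in Python. It is not Elasticsearch code: the Update class, the timings, and the field values are all made up for illustration. It only shows what happens when a shard applies writes in arrival order and the older update spends longer in transit.

```python
# Toy model, not Elasticsearch code: all names and timings are hypothetical.
from dataclasses import dataclass


@dataclass
class Update:
    seq: int         # order in which the indexing service sent it
    sent_at_ms: int  # when it left the indexing service
    delay_ms: int    # made-up transit time (extra hops, busy nodes)
    value: str


def apply_in_arrival_order(updates):
    """Return the value the shard ends up with when it simply applies
    updates in the order they arrive (last write wins)."""
    stored = None
    for update in sorted(updates, key=lambda u: u.sent_at_ms + u.delay_ms):
        stored = update.value
    return stored


# Update 1 takes the extra hop via ES 1; update 2 goes straight to ES 2.
update_1 = Update(seq=1, sent_at_ms=0, delay_ms=12, value="status=pending")
update_2 = Update(seq=2, sent_at_ms=2, delay_ms=5, value="status=shipped")

final = apply_in_arrival_order([update_1, update_2])
print(final)  # status=pending (the older update arrived last and won)
```

Without version information attached to each update, the shard has no way to tell that "status=pending" is the stale one.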

Jörg

I didn't have anything in particular in mind, but was hoping there was some
kind of internal communication that perhaps timestamped something when it
got to the routing node so that once it does make it to the correct data
node, a comparison could be done to decide whether it's older than data
that came straight to the data node.

That's just me thinking out loud. Probably a naive approach given the
complexity of the cluster. Sounds like there's not anything like this and
it's up to me to preserve ordering.

I'm still a little unsure how I'd go about this versioning when I just have
updates coming from a DB that knows nothing about the version. Sounds like
I'd have to store some kind of state for each record on the service that is
doing the indexing, right?

If you can timestamp your data, you can easily bring in your timestamp for
versioning, or another incremental counter. It's called "external"
versioning. More details are explained here

Quote:


With version_type set to external, Elasticsearch will store the version
number as given and will not increment it. Also, instead of checking for an
exact match, Elasticsearch will only return a version collision error if
the version currently stored is greater or equal to the one in the indexing
command. This effectively means “only store this information if no one else
has supplied the same or a more recent version in the meantime”.
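
The rule quoted above can be modeled in a few lines of Python. This is only a sketch of the described semantics, not how Elasticsearch implements it: ToyShard, VersionConflict, and external_version are hypothetical names, and it assumes each DB row carries a reliable, timezone-aware modification timestamp that can be turned into an integer version (epoch milliseconds here).

```python
# Sketch of the "external" version check as described in the quote.
# ToyShard, VersionConflict and external_version are illustrative names only.
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class VersionConflict(Exception):
    """Stand-in for the version collision error."""


def external_version(updated_at):
    """Turn a timezone-aware timestamp into epoch milliseconds."""
    return (updated_at - EPOCH) // timedelta(milliseconds=1)


class ToyShard:
    def __init__(self):
        self.docs = {}  # doc id -> (version, source)

    def index(self, doc_id, source, version):
        current = self.docs.get(doc_id)
        # Reject if the stored version is greater than or equal to the new one.
        if current is not None and current[0] >= version:
            raise VersionConflict(f"{doc_id}: stored {current[0]} >= {version}")
        self.docs[doc_id] = (version, source)


shard = ToyShard()
t1 = datetime(2013, 8, 8, 19, 0, 0, tzinfo=timezone.utc)  # older DB change
t2 = datetime(2013, 8, 8, 19, 0, 1, tzinfo=timezone.utc)  # newer DB change

# The newer update arrives first, the stale one arrives late.
shard.index("order-1", {"status": "shipped"}, external_version(t2))
try:
    shard.index("order-1", {"status": "pending"}, external_version(t1))
    stale_rejected = False
except VersionConflict:
    stale_rejected = True

print(stale_rejected, shard.docs["order-1"][1])  # True {'status': 'shipped'}
```

One caveat with timestamps as versions: two changes to the same record within the same millisecond get the same version, and the second one would be rejected. A per-record counter avoids that if the DB can provide one.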

Cluster nodes in ES are not synchronized; they do not communicate about the
age of the actions to be performed. Each node works on its own. This
optimistic (or byzantine, whatever you like) behavior is one secret of
ES's extraordinary scalability and performance: no distributed transactions,
no waits. If something fails, a recovery (and conflict resolution) is
required.
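
On the client side, conflict resolution for this case can be very small. The sketch below uses only the Python standard library rather than Tire. It assumes the index API accepts version and version_type=external as query parameters and answers a version conflict with HTTP 409, which is my understanding of the REST API; the base URL, the "order" type name, the document id, and the version number are placeholders.

```python
# Hedged sketch: build a versioned index request and treat a version
# conflict as "a newer update is already stored". Names are placeholders.
import json
import urllib.error
import urllib.parse
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, source, version):
    """Build a PUT request that carries an externally supplied version."""
    query = urllib.parse.urlencode(
        {"version": version, "version_type": "external"}
    )
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(source).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send_update(request, opener=urllib.request.urlopen):
    """Return "applied", or "stale" when the server reports a conflict."""
    try:
        with opener(request) as response:
            response.read()
        return "applied"
    except urllib.error.HTTPError as err:
        if err.code == 409:  # assumed status for a version conflict
            return "stale"   # a same-or-newer version is stored; drop this one
        raise


# Only builds the request; nothing is sent until send_update(request) is called.
request = build_index_request(
    "http://localhost:9200", "orders", "order", "42",
    {"status": "shipped"}, version=1375988401000,
)
print(request.get_method(), request.full_url)
```

With external versions, a conflict on a late-arriving update usually means there is nothing to recover: the newer state is already in the index, so the stale update can simply be dropped.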

Jörg
