My application needs to share some state across several EC2 instances
(a cache, which instance is master, etc.). I'm looking at setting up
Hazelcast for this, but since each instance of the application is already
an Elasticsearch node, perhaps there is a way to leverage that instead?
The cache would be tough to piggy-back on. You could grab the
master node from cluster state and use it for a custom purpose
though.
Why not store the data you're looking to share in ES proper? It's
already a highly optimized key-value store, among other things.
Yes, but I just realized that knowing which node is the Elasticsearch
master doesn't help when what you want is to ensure that a piece of
code in your application runs on one machine only. I'd need to do
something like "give me the most senior of all nodes tagged with x",
which is probably beyond the scope of the Elasticsearch discovery
mechanism.
For example, I need to keep track of request counts to enforce quotas;
here I would have to not just read but also update a document on
every request! Hazelcast's distributed map works well for this kind of
thing, but Hazelcast has its own discovery mechanism that duplicates
what Elasticsearch is already doing, possibly with its own set of
issues.
You can get start_time from the nodes info API and calculate
seniority, if age is what you mean by "seniority."
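To make the seniority idea concrete, here is a minimal sketch of picking the longest-running node among those carrying a given tag. It works on an already-parsed nodes info response; the dictionary layout (`nodes`, `attributes`, `jvm.start_time_in_millis`) and the function name `most_senior_node` are assumptions for illustration, so check the field names against what your Elasticsearch version actually returns.

```python
from typing import Optional


def most_senior_node(nodes_info: dict, tag_key: str, tag_value: str) -> Optional[str]:
    """Return the id of the longest-running node that carries the given tag.

    The response layout used here is an assumption, not a confirmed schema.
    """
    candidates = []
    for node_id, info in nodes_info.get("nodes", {}).items():
        # Skip nodes that are not tagged with the attribute we care about.
        if info.get("attributes", {}).get(tag_key) != tag_value:
            continue
        start = info.get("jvm", {}).get("start_time_in_millis")
        if start is None:
            continue
        candidates.append((start, node_id))
    if not candidates:
        return None
    # Earliest start time wins; the node id breaks ties deterministically.
    return min(candidates)[1]
```

Each application instance would compare the result with its own node id and run the task only on a match. This is not a lock: while nodes are joining or leaving, two instances may briefly disagree about who is most senior.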
You might be surprised how efficient ES can be here, especially if
you use the update API. How much traffic are you trying to
support? Should be fairly easy (he says with no knowledge of your
application :)). I'd try it before assimilating a whole new
service to maintain.
The only stumbling block might be if you have to support such a
high volume of data that you have to keep a lot of spare disk
capacity on hand, due to the yet-to-be-deleted docs generated from
many updates. Since you're looking at an in-memory store in
comparison I doubt this is the case, however.
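As a rough illustration of the update API suggestion, the sketch below builds the body of a scripted update that increments a per-user counter and creates the document if it does not exist yet. The body layout follows the form used by recent Elasticsearch releases (older versions used a different script syntax), and the `count` field, the use of the user name as document id, and the helper name `build_increment_request` are all hypothetical.

```python
import json


def build_increment_request(user: str, amount: int = 1) -> dict:
    """Build a scripted-update request that adds `amount` to a user's counter.

    Field and id choices are assumptions made for this example only.
    """
    body = {
        "script": {
            # Increment the stored counter by a parameterised amount.
            "source": "ctx._source.count += params.amount",
            "params": {"amount": amount},
        },
        # Used as the initial document when none exists for this user yet.
        "upsert": {"count": amount},
    }
    return {"doc_id": user, "body": body}


if __name__ == "__main__":
    print(json.dumps(build_increment_request("alice", 2), indent=2))
```

The resulting body would be sent to the update endpoint of whichever index holds the counters. Every such update still writes a new document version, which is where the yet-to-be-deleted docs mentioned above come from.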
Using start_time could work; I just need a mechanism to ensure that some
code is run on one application instance only.
I'll give the update API a try...
Or how about this approach?
- for each operation, add a document with two fields ("user", "cost")
  to a "quotas" index
- set the TTL for the document to e.g. 31 days
- to check if a user is over quota, use the statistical facet to sum
  up the "cost" field for a "user"
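The logic of the steps above can be sketched in plain Python, without touching Elasticsearch at all: an in-memory list stands in for the "quotas" index, expiry stands in for the document TTL, and a sum stands in for the statistical facet. The class and method names are made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Stand-in for the 31-day document TTL from the proposal.
TTL_SECONDS = 31 * 24 * 60 * 60


@dataclass
class QuotaLog:
    # Each entry is (timestamp, user, cost), mimicking one indexed document.
    entries: List[Tuple[float, str, float]] = field(default_factory=list)

    def record(self, user: str, cost: float, now: float) -> None:
        """Add one 'document' per operation."""
        self.entries.append((now, user, cost))

    def expire(self, now: float) -> None:
        """Drop entries older than the TTL, as TTL expiry would."""
        self.entries = [e for e in self.entries if now - e[0] < TTL_SECONDS]

    def total_cost(self, user: str, now: float) -> float:
        """Sum the cost of a user's live entries, as the facet would."""
        self.expire(now)
        return sum(cost for _, u, cost in self.entries if u == user)

    def over_quota(self, user: str, limit: float, now: float) -> bool:
        return self.total_cost(user, now) > limit
```

The appeal of this design is that writes are append-only, so there is no read-modify-write on a shared counter; the cost moves to the quota check, which has to aggregate over all of a user's live documents.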
Just a quick side note:
If you want to make sure that a piece of code runs only once in your
cluster, you could simply use a river for that. Elasticsearch then takes
care of the rest.
If you want to dive a bit deeper into Elasticsearch and send data yourself
to nodes across your cluster, take a look at TransportService.sendRequest().