Hi
We are in the startup phase of a project where we are considering using
ES. I have a few questions that I would like someone to answer before we
can proceed.
A little bit about our needs:
We need to store 18 billion+ records consisting of things like numbers,
timestamps and text. We need to be able to do searches, including prefix
search on the numbers (e.g. 123* should find 12345678), range search on
timestamps and full-text search on the texts. Our system will be very
write-intensive (about 50 million+ new records will need to be
stored/indexed every day and 50 million+ old records will need to be
deleted). It will not be very read-intensive (only a few queries a day
are expected).
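To make the search requirements concrete, here is a rough sketch of the
kind of combined query we have in mind, written against the Python
elasticsearch client. The index name "records" and the field names
number/timestamp/text are just placeholders, and the number would have to
be indexed as a string/keyword for the prefix query to work:

  from elasticsearch import Elasticsearch

  es = Elasticsearch("http://localhost:9200")

  # Combine a prefix match on the number (indexed as a string so prefix
  # queries apply), a range on the timestamp and a full-text match on the
  # text field in one bool query.
  query = {
      "bool": {
          "must": [
              {"prefix": {"number": "123"}},
              {"range": {"timestamp": {"gte": "2011-06-01", "lt": "2011-06-02"}}},
              {"match": {"text": "some words to look for"}},
          ]
      }
  }

  response = es.search(index="records", body={"query": query})
  print(response["hits"]["total"])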
Questions:
- First of all, in general we will be happy to receive any input you find
relevant for us.
- In general, do you believe ES will fit our needs - especially with
respect to the number of records mentioned above and the read/write
balance?
- Is there some way of controlling which data goes to which shard - e.g.
that all records where number-A is between 12340000 and 12349999 always
go to one specific shard? I believe I have heard from one of the ES guys
that it is possible to make all records within a certain timestamp range
go into the same shard, and that this is smart to do if you want to
delete "old" records based on timestamp (it should be easier to delete an
entire shard than to delete only selected records inside it), so I guess
it is possible. Since the 50 million+ new records we receive every day
will probably all have timestamps from the same day, I believe we will
certainly not want to configure it so that certain timestamp ranges go
into certain shards, because that would create a bottleneck on those few
shards (all 50 million+ records of a day would have to be inserted into
very few shards). Since we are so write-intensive it is important that we
are able to utilize all the power in the cluster for writing, and that we
are able to scale our writing capability by adding more nodes/shards. Any
comments on that thought? Am I right, or did I miss something?
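To make that concrete, here is a rough sketch against the Python
elasticsearch client (index and field names are placeholders, not
something we have settled on) of the two approaches I am weighing:
routing by a prefix of the number would pin a whole number range to one
shard, whereas one index per day keeps each day's writes spread over all
of that index's shards and makes deleting old data a matter of dropping a
whole index.

  from datetime import date, timedelta
  from elasticsearch import Elasticsearch

  es = Elasticsearch("http://localhost:9200")
  today = date.today()

  # Option A: custom routing - every record whose routing value hashes the
  # same way (here: the first four digits of the number) ends up on the
  # same shard of the index.
  es.index(
      index="records",
      routing="12345678"[:4],
      body={"number": "12345678", "timestamp": today.isoformat(), "text": "..."},
  )

  # Option B: one index per day - a day's writes still spread over all of
  # that index's shards/nodes, and deleting old data becomes "drop the
  # whole index" instead of deleting 50 million individual records.
  es.index(
      index=f"records-{today:%Y.%m.%d}",
      body={"number": "12345678", "timestamp": today.isoformat(), "text": "..."},
  )
  old_day = today - timedelta(days=365)
  es.indices.delete(index=f"records-{old_day:%Y.%m.%d}", ignore_unavailable=True)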
- Do you have a rule of thumb about the maximum size of a shard - or, put
another way, how many shards will we need to be able to store the
18 billion+ records? I guess that number will basically depend on
limitations in Lucene, where I believe I have heard that you shouldn't go
beyond 10-100 million records. Will 100 shards be enough for us, will
fewer suffice, or do we need to consider 1000?
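Just to show the back-of-the-envelope calculation behind that question,
using the 10-100 million records per shard rule of thumb quoted above:

  total_records = 18_000_000_000

  # at the upper end of the rule of thumb (100 million records per shard)
  print(total_records // 100_000_000)   # 180 shards

  # at the lower end (10 million records per shard)
  print(total_records // 10_000_000)    # 1800 shards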
- Does ES have any kind of rebalancing mechanism? E.g. if nodes join
along the way and we dynamically increase the number of shards, will it
be able to "even out" the data already stored, or will it only affect
data added after the point in time when we add nodes/shards? I guess this
will be hard if we have put any kind of restriction on which shard a
certain range of values needs to go to (see above).
- Reading about persistence in ES I have a hard time figuring out exactly
how it works - node-local storage vs. gateway storage. Do you have a
pointer to a thorough description of how persistence works? In general I
want to make sure that data is persisted-persisted when an
inserting/indexing process has done a number of inserts (maybe bulk
inserts) and finishes. It must not be possible for an inserting process
to think it has inserted a number of records while they are really not
persisted-persisted yet. By persisted-persisted I mean that no data will
disappear even if all nodes in the cluster stop (e.g. due to a global
power outage) a split-second after the process finished, or if any single
disk crashes a split-second after the process finished. So
persisted-persisted means stored on disk (will survive shutdown of the
machine) - actually stored on at least two disks (redundant). I believe I
heard one of the ES guys say something about records/indexes not being
persisted-persisted until IndexWriter.commit (or something like that) in
the underlying Lucene has been called. If that is true, I guess I need a
way through ES to make sure that this has happened. I have also heard
that this operation is expensive and should therefore not be done too
often. I need to make sure it has been done when my insert/index process
finishes (call the operation as the last thing in the process), but if it
is expensive I guess I need to make sure that my insert/index processes
are not too small with respect to the number of records they insert. Any
comments on that? What would be a practical lower limit on the number of
inserts/indexes that have to be done between IndexWriter.commits?
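If it helps to make the question concrete, what I imagine the end of an
insert/index process to look like is roughly the following sketch (Python
client; the assumption that the flush API is what triggers the underlying
Lucene commit is mine, so please correct me if that is wrong):

  from elasticsearch import Elasticsearch, helpers

  es = Elasticsearch("http://localhost:9200")

  actions = (
      {"_index": "records-2011.06.01",
       "_source": {"number": str(n), "timestamp": "2011-06-01", "text": "..."}}
      for n in range(100_000)
  )

  # Insert a reasonably large batch in one bulk run ...
  helpers.bulk(es, actions)

  # ... and, as the last step of the process, force a flush, which (as I
  # understand it) performs the Lucene commit and writes segments to disk.
  es.indices.flush(index="records-2011.06.01")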
- How do I update a record? Do I need to find the existing record (e.g.
by id), delete it and insert/index a new record with the combined
information from the old record and the new information I have to add to
it? Or is there another way of updating records? And what about
transaction isolation when doing this - if two processes are updating an
existing record "at the same time", can I be sure that one of them will
fail and the other will succeed?
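For instance, is the intended pattern something like this sketch (written
against the Python elasticsearch client; the if_seq_no/if_primary_term
parameters are my assumption about how conflict detection is exposed -
other write-ups I have seen mention an explicit version parameter
instead), where the losing updater gets a conflict error it can retry on?

  from elasticsearch import Elasticsearch
  from elasticsearch.exceptions import ConflictError

  es = Elasticsearch("http://localhost:9200")

  # Read the existing record together with its concurrency tokens ...
  doc = es.get(index="records", id="record-42")
  updated = {**doc["_source"], "text": "old text plus the new information"}

  try:
      # ... and write the combined record back only if nobody else has
      # changed it in the meantime; otherwise a conflict error is raised.
      es.index(
          index="records",
          id="record-42",
          body=updated,
          if_seq_no=doc["_seq_no"],
          if_primary_term=doc["_primary_term"],
      )
  except ConflictError:
      pass  # the other concurrent updater won - retry or merge here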
- We will have to run a lot of concurrent insert/index processes. Any
comments on that? Strategies for how best to do that?
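What I picture is several workers each streaming its own, non-overlapping
slice of a day's records through the bulk API, roughly like this sketch
using the Python client's parallel_bulk helper (names and batch sizes are
made up):

  from elasticsearch import Elasticsearch, helpers

  es = Elasticsearch("http://localhost:9200")

  def actions_for(day, start, count):
      # each worker generates its own, non-overlapping slice of the records
      for n in range(start, start + count):
          yield {"_index": f"records-{day}",
                 "_source": {"number": str(n), "timestamp": day, "text": "..."}}

  # bulk requests for this worker's slice are sent from several threads
  for ok, info in helpers.parallel_bulk(
      es, actions_for("2011-06-01", 0, 1_000_000),
      thread_count=4, chunk_size=5000,
  ):
      if not ok:
          print("failed:", info)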
- Auto-establishment of redundancy? Will the ES cluster automatically
notice when an "instance" of a shard (primary or secondary) goes down,
and automatically start establishing a new "instance" of that shard from
the one that did not go down, in order to re-establish redundancy of that
shard?
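For context, what I have in mind is simply creating each index with at
least one replica per shard and relying on the cluster to keep that
promise, e.g. (Python client, numbers are placeholders):

  from elasticsearch import Elasticsearch

  es = Elasticsearch("http://localhost:9200")

  # With one replica, every shard should exist on two different nodes; the
  # question above is whether ES rebuilds the second copy elsewhere when
  # one of those nodes dies.
  es.indices.create(
      index="records-2011.06.01",
      body={"settings": {"number_of_shards": 20, "number_of_replicas": 1}},
  )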
- Any comments on ACID transactions? It would be nice if I could have all
the ACID properties in a transaction spanning the entire work of an
insert/index process - all or nothing is inserted/indexed, it is
consistent with respect to what other processes might do concurrently,
isolation levels? and durability (I guess I have touched on that
already). Will ES be able to participate in distributed XA transactions
with two-phase commit etc.?
- The numbers mentioned in the beginning (18 billion all in all,
50 million in/out per day) are only first-attempt requirements. Basically
we need to scale to "infinity" (very far) on those numbers. Are there any
known limitations on how far an ES cluster can scale - that is, is there
a limit where I cannot just buy more hardware to support higher numbers?
Regards, Per Steffensen