The transport module is the module Elasticsearch uses for moving shards around in the cluster.
Can it somehow be used to move index data between different clusters? The point here is to avoid the whole scanning-in-the-source / indexing-in-the-destination thing, which is essentially the solution all the moving-data-between-clusters implementations I've seen are based on.
Now that I have your attention: this is my case.
We have around 700 indexes, each with around 7k records, so relatively small. The ES cluster does not work well with so many small indexes; it wastes too much time deciding which node is master and which is not.
We need to separate indexing from searching.
One solution is to index on one machine and then transfer the index to the search machine. If we do it the standard way, that implies indexing the dump from the indexing machine into the search machine, so no performance is gained.
One solution would be to move data between source and destination the same way ES moves data inside a cluster, which I bet is much more efficient than the dump/reindex approach.
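For concreteness, the dump/reindex approach I'd like to avoid looks roughly like this (a minimal sketch assuming the Python client and its scan/bulk helpers; hosts and the index name are placeholders):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, bulk

source = Elasticsearch(["http://source-cluster:9200"])
dest = Elasticsearch(["http://search-cluster:9200"])

def copy_index(index_name):
    # Stream every document out of the source index...
    docs = scan(source, index=index_name, query={"query": {"match_all": {}}})
    # ...and re-index each one into the destination cluster.
    actions = (
        {"_index": index_name, "_type": d["_type"], "_id": d["_id"], "_source": d["_source"]}
        for d in docs
    )
    bulk(dest, actions)

copy_index("my_index")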
Sorry, but I don't quite understand what you're saying.
As I've read in the docs, "routing" is a per-document value Elasticsearch uses to determine which shard a document should go into. That way, if two documents share the same routing value, we can be certain they'll be allocated to the same shard. AFAIK, that's the only thing that can be taken for sure (if two documents have different routing values, it does not mean they'll end up in different shards).
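Just to make sure I'm reading the docs right, this is how I understand per-document routing is used (a sketch with the Python client; index, type and field names are made up):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Two documents indexed with the same routing value are guaranteed
# to end up in the same shard.
es.index(index="my_index", doc_type="doc", id="1", body={"user": "a"}, routing="A")
es.index(index="my_index", doc_type="doc", id="2", body={"user": "b"}, routing="A")

# A search that passes the same routing value only hits that shard.
es.search(index="my_index", body={"query": {"match_all": {}}}, routing="A")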
On the other hand, I can associate an alias with an index name and a routing value. That means, for instance, that if I use an alias with a routing value, I can be certain that a search operation will go straight to the same shards where all documents with that routing value were indexed.
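For example (again a sketch, with made-up names), an alias carrying a routing value could be set up like this:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# The alias "my_alias" carries routing value "A": index and search
# operations that go through the alias are routed as if routing="A"
# had been passed explicitly.
es.indices.update_aliases(body={
    "actions": [
        {"add": {"index": "my_index", "alias": "my_alias", "routing": "A"}}
    ]
})

# Searching via the alias now goes straight to the shard for routing "A".
es.search(index="my_alias", body={"query": {"match_all": {}}})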
However, when I want to separate indexing from searching, the shards in the "indexing zone" and in the "searching zone" should contain the same data. I guess the primary shards would all be located in the indexing zone and the replica shards in the searching zone. I honestly can't see what routing values have to do with this. A routing value only determines which shard a document is going to be located in, and that will be the same whether it's a primary or a replica. I mean, if I search a three-shard index with 1 replica using the routing value "A", which corresponds to the first shard, both the first of the primary shards and the first of the replica shards would satisfy that condition. I see no way of "directing" search requests to only the replica shards and index requests to only the primary shards using routing and aliases.
Also, the doc here allows me to specify on which nodes I want a certain index's shards to be located, but, again, I can't see how I can use that to "separate" indexing and searching.
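As far as I understand it, that shard allocation filtering works something like this (a sketch; it assumes the nodes were started with a custom attribute such as node.tag in elasticsearch.yml):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Assuming nodes carry a custom attribute, e.g. in elasticsearch.yml:
#   node.tag: indexing     (on the indexing nodes)
#   node.tag: searching    (on the search nodes)
# an index can be pinned to the nodes with a given tag:
es.indices.put_settings(index="my_index", body={
    "index.routing.allocation.include.tag": "indexing"
})

# Relaxing the filter later lets shards be allocated on the search nodes too:
es.indices.put_settings(index="my_index", body={
    "index.routing.allocation.include.tag": "indexing,searching"
})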
That being said, I know there has to be a way (in fact, there's talk about "indexing" and "searching" aliases), but I just can't see it. If anyone could enlighten me, I would really appreciate it.
Preferably, search clients should only connect to N2.
Now, how do we perform more indexing rounds while A1 is in use on both nodes? Start over again, but with a second index:
create index A2 with the shard routing attribute "index" and replica level 0
feed documents into index A2, connecting only to N1 with the client
allow replicas to disseminate to all nodes
set replica level 1 on index A2; ES does the shard copy automatically
now, modify the index alias A to reflect the new index. For example, for a switchover, remove A1 and add A2. This is done atomically.
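If it helps, this is how those steps could look with the Python client (the node address, index names and the routing attribute name are assumptions on my part, not something prescribed by ES):

from elasticsearch import Elasticsearch

# Client that talks only to the indexing node N1 (address assumed).
n1 = Elasticsearch(["http://n1:9200"])

# Create index A2 pinned to the "index" nodes, with no replicas yet.
n1.indices.create(index="a2", body={
    "settings": {
        "index.number_of_replicas": 0,
        "index.routing.allocation.include.zone": "index"
    }
})

# Feed documents into A2 through N1 only.
n1.index(index="a2", doc_type="doc", id="1", body={"field": "value"})

# Allow shards on all nodes and add a replica; ES copies the shards over.
n1.indices.put_settings(index="a2", body={
    "index.routing.allocation.include.zone": "index,search",
    "index.number_of_replicas": 1
})

# Atomically switch the alias A from A1 to A2.
n1.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": "a1", "alias": "a"}},
        {"add": {"index": "a2", "alias": "a"}}
    ]
})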
A1 and A2 can either be full updates or incremental. How this should be
reflected in search is managed by index aliasing with A. After a switchover
from A1 to A2, A1 can be dropped.
N1 and N2 can also be extended to a group of nodes for indexing and
searching.
Jörg, that's brilliant indeed.
Now I'm going to check again with the dev crew what the reason was that we couldn't do clusters when having so many indexes (around 1k), and see if that can be solved with smart shard routing management. (I think it was related to the fact that ES wastes too many resources trying to determine which node is responsible or master for which clients.)
I think that system would work only for a two-node setup. Otherwise, how can we be sure that, when letting Elasticsearch put a replica wherever it wants, there's going to be a copy on the nodes tagged as "search"? Say you have three "indexing" nodes and three "searching" nodes, and you go through the process you described. You can be certain that your index is in the "indexing" zone, but when you set replicas=1 you are not sure where Elasticsearch is going to locate the replicas. Unless, of course, the fact that you're pointing searches at a particular node somehow "forces" Elasticsearch to put a replica there.
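For what it's worth, one setting that might be relevant to this concern (an assumption on my part, not something discussed above) is shard allocation awareness, which tells ES to spread the copies of each shard across the values of a node attribute:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Assuming the nodes were started with e.g. node.zone: index or
# node.zone: search in elasticsearch.yml, awareness on "zone" makes ES
# spread the copies of each shard across the different zone values,
# so a replica should end up on the search side.
es.cluster.put_settings(body={
    "persistent": {
        "cluster.routing.allocation.awareness.attributes": "zone"
    }
})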