OK - let's say I create 100,000 JSON documents from some data stored in an
SQL database. The documents are in no way grouped, so this could be
an entire batch of a single user's data, 1 document each for 100,000 different
users, or any combination in between. There's no way of knowing. When I
insert the 100,000 documents I want to make sure that each user's data is
routed correctly - so a user's data will go to 1 shard using routing based
on their id. Every batch of 100,000 documents will be as different as the
last.
I understand that if I insert each document separately I can determine what
the user id is and use an alias which contains a route based on the id.
However, I want to use bulk insert. So what I was looking for was how I
might perform the routing (or aliasing) without needing to group my bulk
insertions together by user id or something.
In the ES API docs it says you can add the _routing field per bulk item.
So I guess the answer is to do a bit of processing on the client to make
sure that, for each bulk item, the _routing field is set correctly. I was
having trouble understanding how to do this because it's currently
unsupported by my .NET client. Also, the bulk API docs say nothing about
aliases - I guess these should be created after a bulk backfill or on the
fly as new documents are added.
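
A minimal sketch of that client-side step, in Python rather than .NET and using only the standard library. It builds the newline-delimited body for a bulk request and sets the routing on every item from the document's user id, so the batch does not have to be grouped by user. The field names `id` and `user_id`, the index name, and the function name are assumptions for illustration. The metadata key defaults to `_routing` as mentioned above; as far as I know, newer Elasticsearch versions expect `routing` instead, so check the docs for your version.

```python
import json


def build_bulk_body(docs, index, doc_type=None, routing_key="_routing"):
    """Build a bulk request body with per-item routing.

    docs        -- iterable of dicts, each with hypothetical "id" and "user_id" keys
    index       -- target index name (or an alias that points to a single index)
    doc_type    -- optional type name, for versions that still require one
    routing_key -- "_routing" as in the thread; newer versions use "routing"
    """
    lines = []
    for doc in docs:
        meta = {"_index": index, "_id": doc["id"]}
        if doc_type is not None:
            meta["_type"] = doc_type
        # Route on the user id so all of a user's documents share a shard.
        meta[routing_key] = str(doc["user_id"])
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(doc))              # document source line
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    batch = [
        {"id": 1, "user_id": 42, "text": "first"},
        {"id": 2, "user_id": 7, "text": "second"},
    ]
    print(build_bulk_body(batch, "users"), end="")
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint. Splitting the 100,000 documents into smaller bulk requests is probably wise, but that is independent of the routing.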
Cheers!
James
On Tue, Feb 5, 2013 at 1:15 PM, Clinton Gormley clint@traveljury.com wrote:
On Tue, 2013-02-05 at 13:05 +0000, James Lewis wrote:
Hey, cheers for the reply. I think what I'm having trouble with at the
moment is how to use the correct routing when performing a bulk insert
- so the batch will have 100,000 records in it, all from different
users. Ultimately, I want to use routing so that 1 user's documents
are all on the same shard (a la the user data flow discussed in that
video you referenced).
Any ideas if that's possible?
Yes, that's possible. Either by specifying the routing manually or by
using aliases with routing built in (as explained in that video).
clint
Regards,
James
On Tue, Feb 5, 2013 at 1:02 PM, Clinton Gormley clint@traveljury.com wrote:
Hi James
> I need to do a bulk insert into a cluster but I want to use aliases.
> Is there any way?
Yes, you can use an alias instead of the index name. But for indexing
purposes, your alias should only point to one index. For searching, you
can use a different alias which points to multiple indices.
> I'll be using aliases to route all of a user's data to the same shard when
> inserting - I think that's the most important thing to be able to do
> when I do a bulk insert. I noticed there's a routing option on the
> bulk insert, would this be the way to go? Would you just perform your
> bulk insert creating a route out of say the user-id and then elsewhere
> in your application where you do a single insert / search use an alias
> that routed to the user-id and filtered on the user-id?
You can do that, or you can specify a routing and a filter when you
create an alias; then you don't need to worry about having to specify it
each time.
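
As a rough sketch of that second option, here is how the body of an `_aliases` request that adds one routed, filtered alias per user might be built in Python. The alias naming scheme (`user_<id>`), the `user_id` field in the term filter, and the function names are assumptions for illustration, not something stated in this thread.

```python
import json


def routed_alias_action(index, user_id, user_field="user_id"):
    """One "add" action: an alias with routing and a filter for a single user."""
    return {
        "add": {
            "index": index,
            "alias": "user_%s" % user_id,               # hypothetical naming scheme
            "routing": str(user_id),                    # same value used at index time
            "filter": {"term": {user_field: user_id}},  # hide other users on the shard
        }
    }


def aliases_request_body(index, user_ids):
    """JSON body for a POST to the _aliases endpoint covering several users."""
    actions = [routed_alias_action(index, uid) for uid in user_ids]
    return json.dumps({"actions": actions})


if __name__ == "__main__":
    print(aliases_request_body("users", [42, 7]))
```

Single inserts and searches for a user can then go through that user's alias, while a bulk insert sets the same routing value on each item.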
Have a look at
Elasticsearch Platform — Find real-time answers at scale | Elastic
and
Elasticsearch Platform — Find real-time answers at scale | Elastic
clint
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.