1.2.0 routing issue and explicit routing

Hi,

I use explicit routing in some of my logging indexes, and I have a lot of
data. The tool provided to fix the routing issue that showed up in 1.2.0
takes forever on my indexes, but while it has been running, I'm wondering
if I even have the problem sketched by the (very informative) blog item. I
don't seem to have lost access to events, and am now getting reports of
duplicate events.

So, to make this really clear:

  • Does using explicit routing (in this case defined in the mapping to use a
    customer id field) have the same routing issue?
  • Does the tool take explicit routing into account, or am I now messing
    things up royally?
  • If so, can I remove those duplicates, while keeping my explicit routing?
  • All this might be useful to mention in an update to the blog post.

I've shut down the tool for the time being.

--
Cheers,

ralphm

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/27ccbc63-8fb6-4020-9b41-1ad8855426d8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

As an aside, I am also wondering why this link
http://www.elasticsearch.org/downloads/1-2-0/ is still active and
available when it was supposed to be pulled.

Brian

Probably because it contains the release notes etc.
You can't download any of the files from the links, though a note about it
being removed would be handy I guess.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On Tue, Jun 10, 2014 at 11:44 PM, Ralph Meijer ralphm@ik.nu wrote:

  • Does using explicit routing (in this case defined in the mapping to use a
    customer id field) have the same routing issue?

Yes. The issue is in the hashing function, so it occurs when using explicit
routing as well.

  • Does the tool take explicit routing into account, or am I now messing
    things up royally?

Yes it does. When you use custom routing, either explicitly by setting the
routing value in the mappings or indexing requests, or implicitly in the case
of parent/child, Elasticsearch stores a special _routing value that the
tool uses in order to identify the right shard for this document.

  • If so, can I remove those duplicates, while keeping my explicit routing?

Yes: the tool has a delete action to remove documents that have been
routed to the wrong shard.
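
To illustrate the answers above, here is a minimal Python sketch of hash-based shard selection. The DJB2-style hash and the names `djb2`, `routing_key`, `shard_for` and `is_misrouted` are assumptions made for illustration; this is not Elasticsearch's actual routing code and it does not reproduce the 1.2.0 bug itself. It only shows why a custom routing value passes through the same hash step as a plain _id, and why a stored _routing value is enough to recompute the intended shard.

```python
def djb2(value: str) -> int:
    """Illustrative 32-bit DJB2-style string hash (an assumption, not the real one)."""
    h = 5381
    for ch in value:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h


def routing_key(doc: dict) -> str:
    """Use the stored _routing value when present, otherwise fall back to _id."""
    return doc.get("_routing") or doc["_id"]


def shard_for(key: str, num_shards: int, hash_fn=djb2) -> int:
    """Pick a shard by hashing the routing key, whatever its origin."""
    return hash_fn(key) % num_shards


def is_misrouted(doc: dict, actual_shard: int, num_shards: int, hash_fn=djb2) -> bool:
    """A document is misrouted when it lives on a shard other than the expected one."""
    return shard_for(routing_key(doc), num_shards, hash_fn) != actual_shard


if __name__ == "__main__":
    doc = {"_id": "event-1", "_routing": "customer-42"}
    expected = shard_for(routing_key(doc), num_shards=5)
    print("expected shard:", expected)
    print("misrouted if found elsewhere:", is_misrouted(doc, (expected + 1) % 5, 5))
```

Because both the explicit and the default case end in `shard_for`, any change in the hash step affects them equally, which matches the answer that explicit routing is affected too.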

--
Adrien Grand

On 2014-06-11 10:50, Adrien Grand wrote:

On Tue, Jun 10, 2014 at 11:44 PM, Ralph Meijer ralphm@ik.nu wrote:

  • Does using explicit routing (in this case defined in the mapping to use a
    customer id field) have the same routing issue?

Yes. The issue is in the hashing function, so it occurs when using explicit
routing as well.
[..]

Ok, thanks!

I've been running the tool, but looking at my graphs, it looks like its
copy_if_missing function processes about 8 documents a second, which
seems really slow compared to the index rates I normally have. Because
this is just one daily index with over 100 million entries, completing
the sequence is going to take forever. Is there a way to speed this up?

If not, I'll probably stop the process and reindex using the _source
field, taking the copies for granted.
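
For a sense of scale, here is a back-of-the-envelope estimate in Python. It assumes, as an upper bound, that every one of the 100 million documents had to pass through the tool at the observed rate of 8 documents per second; the function name `days_to_process` is made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_process(doc_count: int, docs_per_second: float) -> float:
    """Days needed to process doc_count documents at a steady rate."""
    return doc_count / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # 100 million documents at 8 documents per second: about 144.7 days
    print(round(days_to_process(100_000_000, 8), 1))
```

If only a fraction of the documents actually needs copying, the real figure would be lower, but the order of magnitude explains why the run feels endless.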

--
ralphm

How many of these 100M documents have been indexed with Elasticsearch
1.2.0? The tool would only send indexing requests for documents that have
been misrouted, so if you only see 8 indexing requests per second, which is
indeed low, that might mean that the bottleneck is searching for the
mis-routed documents.

--
Adrien Grand

On 2014-06-12 09:23, Adrien Grand wrote:

How many of these 100M documents have been indexed with Elasticsearch
1.2.0?

All of them, and I have like 40 of them, with their size a similar order
of magnitude :-/

The tool would only send indexing requests for documents that have
been misrouted, so if you only see 8 indexing requests per second, which is
indeed low, that might mean that the bottleneck is searching for the
mis-routed documents.

I couldn't find the source of the tool on GitHub, but maybe if I had
some insight into how that search works, I could change some settings.
Ideas welcome, of course. If it helps, I'm in elasticsearch as ralphm.

--
ralphm

On Thu, Jun 12, 2014 at 9:31 AM, Ralph Meijer ralphm@ik.nu wrote:

On 2014-06-12 09:23, Adrien Grand wrote:

How many of these 100M documents have been indexed with Elasticsearch
1.2.0?

All of them, and I have like 40 of them, with their size a similar order
of magnitude :-/

If all of them were indexed in 1.2.0 then it makes sense to just reindex
from _source as you suggested.
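
As a rough sketch of what reindexing from _source could look like, the Python snippet below turns search hits into newline-delimited bulk "index" actions for a new index, carrying the routing value along. The hit layout (a top-level `_routing` key next to `_source`), the `_routing` key in the action metadata, and the names `bulk_lines` and `logs-new` are assumptions for illustration; check what your scan actually returns and what your Elasticsearch version accepts before relying on it.

```python
import json


def bulk_lines(hits, target_index):
    """Yield bulk-style action and source lines for each hit.

    Each hit is assumed to be a dict with _type, _id, _source and an
    optional _routing value (hypothetical layout).
    """
    for hit in hits:
        meta = {"_index": target_index, "_type": hit["_type"], "_id": hit["_id"]}
        routing = hit.get("_routing")
        if routing is not None:
            # Keep the explicit routing so the document lands with its customer.
            meta["_routing"] = routing
        yield json.dumps({"index": meta})
        yield json.dumps(hit["_source"])


if __name__ == "__main__":
    sample_hits = [
        {"_type": "event", "_id": "1", "_routing": "customer-42",
         "_source": {"message": "login"}},
        {"_type": "event", "_id": "2", "_source": {"message": "logout"}},
    ]
    print("\n".join(bulk_lines(sample_hits, "logs-new")))
```

Since "index" actions replace an existing document with the same _id on the same shard, copies that share both _id and routing value should collapse into one document in the new index, provided the routing hash there is consistent.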

The tool would only send indexing requests for documents that have
been misrouted, so if you only see 8 indexing requests per second, which
is
indeed low, that might mean that the bottleneck is searching for the
mis-routed documents.

I couldn't find the source of the tool on GitHub, but maybe if I had
some insight into how that search works, I could change some settings.
Ideas welcome, of course. If it helps, I'm in elasticsearch as ralphm.

It does a SCAN search that filters using a script filter (the one you added
to your config/scripts). For each matching document, it reindexes it
to the right shard, either by requiring a create operation
(copy_if_missing) or by overwriting any existing document that has the
same _id (copy_overwrite).
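
The difference between the two copy modes described above can be sketched with an in-memory model in Python. Shards are plain dicts keyed by _id, and `expected_shard` is a caller-supplied function standing in for the routing computation; both, along with the name `copy_misrouted`, are assumptions for illustration and not the tool's real implementation.

```python
def copy_misrouted(shards, expected_shard, mode="copy_if_missing"):
    """Copy documents found on the wrong shard to the shard they belong on.

    shards: list of dicts mapping _id -> document
    expected_shard: function(document) -> index of the correct shard
    mode: "copy_if_missing" (create only) or "copy_overwrite" (replace)
    Returns the number of documents written.
    """
    if mode not in ("copy_if_missing", "copy_overwrite"):
        raise ValueError("unknown mode: %r" % mode)
    copied = 0
    for shard_no, shard in enumerate(shards):
        # Iterate over a snapshot because target shards are modified in place.
        for doc_id, doc in list(shard.items()):
            target = expected_shard(doc)
            if target == shard_no:
                continue  # already on the right shard
            if mode == "copy_if_missing" and doc_id in shards[target]:
                continue  # a create would fail, so the existing copy wins
            shards[target][doc_id] = doc
            copied += 1
    return copied


if __name__ == "__main__":
    shards = [
        {"a": {"_id": "a", "v": 1}},
        {"a": {"_id": "a", "v": 2}, "b": {"_id": "b", "v": 3}},
    ]
    # Pretend every document belongs on shard 0.
    print(copy_misrouted(shards, lambda doc: 0, mode="copy_if_missing"))
    print(shards[0])
```

Note that this sketch leaves the original misrouted copies in place; removing them corresponds to the separate delete action mentioned earlier in the thread.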

--
Adrien Grand

On 2014-06-12 10:33, Adrien Grand wrote:

On Thu, Jun 12, 2014 at 9:31 AM, Ralph Meijer ralphm@ik.nu wrote:

On 2014-06-12 09:23, Adrien Grand wrote:

How many of these 100M documents have been indexed with Elasticsearch
1.2.0?

All of them, and I have like 40 of them, with their size a similar order
of magnitude :-/

If all of them were indexed in 1.2.0 then it makes sense to just reindex
from _source as you suggested.

Ok, thanks!

--
ralphm
