Two different shard exceptions

Hi Shay,
I experienced some weird behavior over the weekend on 0.10 that I
haven't seen before. I'm running a 2-node mirrored cluster.

  1. Searching a certain shard on a certain node fails. Here is the
     exception I was getting:

     RemoteTransportException[[DM-ADSEARCHD102.dev.local][inet[/10.2.20.160:9301]][search/phase/query/id]]; nested:
     QueryPhaseExecutionException[[newsmedia_20100917150044][0]: query[filtered(+(+feedid:753 +wsodissue:44874) +__documentdate:[* TO 1285023084000])->FilterCacheFilterWrapper(QueryWrapperFilter((+indexid:genericnews2 +(feedid:753 feedid:1236)) (+indexid:newsmedia +providersubgroup:ap)))],from[0],size[500],sort[<custom:"__documentdate": org.elasticsearch.index.field.data.FieldData$Type$4$1@63ab3977>!,<custom:"documentkey": org.elasticsearch.index.field.data.FieldData$Type$1$1@7e49e6bf>]: Query Failed [Failed to execute main query]]; nested:

The search is valid and would work every other time (on the attempts
that worked, it was routed to the good node). To confirm, I shut down
the good node and the query failed every time. I then brought the good
node back up, shut down the bad one, and the query worked every time.
After bringing the bad node back up, it was still failing the query. I
was able to resolve this by clearing the work directory on the bad node.

  2. Snapshot error. I have the snapshot interval disabled and am
     snapshotting based on content received. I started receiving this
     exception:

     ERROR > Shapshot failed, index: djnf_20100917150037, shard: 0, reason: BroadcastShardOperationFailedException[[djnf_20100917150037][0] ]; nested: RemoteTransportException[[dm-adsearchd103.dev.local][inet[/10.2.20.164:9300]][indices/gateway/snapshot/shard]]; nested: IndexShardGatewaySnapshotFailedException[[djnf_20100917150037][0] duplicate key: __2tf]; nested: IllegalArgumentException[duplicate key: __2tf]; (Timer-0)

This was resolved by restarting the cluster.

I have only seen each of these issues once and will keep an eye out for
them again, but I wanted to give a heads up, as they look like potential
bugs.

Thanks,
Paul

Hi Paul,

Both are strange. Are there by any chance more detailed exceptions in the
logs?

-shay.banon

Hey Shay,
I scoured the logs and, unfortunately, that is all I have. If I see
either of these again, I will enable more detailed logging and see what
I capture.

Thanks,
Paul

Btw, I was unable to reproduce the search exception via curl. Does the
REST interface have internal retries? I am using the Java Node
client. Are there any retries available via that interface?

Thanks,
Paul

The REST interface uses the Java Client to do the operations, so I don't
think it's related. I will go over the exceptions and make sure that at
least they are properly logged.
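
For what it's worth, if retries are wanted at the application level, a thin wrapper around the client call is one option. This is only a sketch; the helper below is hypothetical and not an elasticsearch API, and any search issued through the Java Node client can be passed in as the Callable:

    import java.util.concurrent.Callable;

    // Hypothetical application-side retry helper -- not part of elasticsearch.
    public final class Retry {
        public static <T> T withRetries(Callable<T> call, int attempts, long backoffMillis) throws Exception {
            if (attempts < 1) throw new IllegalArgumentException("attempts must be >= 1");
            Exception last = null;
            for (int i = 0; i < attempts; i++) {
                try {
                    return call.call();              // run the search and return the response
                } catch (Exception e) {
                    last = e;                        // e.g. the RemoteTransportException above
                    Thread.sleep(backoffMillis);     // simple fixed backoff between attempts
                }
            }
            throw last;                              // give up after the last attempt
        }
    }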

Hey Shay,
I'm hitting the snapshot failed exception at the moment.

I tried increasing the log level, but it doesn't appear that changes to
the logging.yml file are picked up dynamically.

I will probably start restarting nodes and playing around in a few
minutes. Let me know what I should have in place to capture the necessary
information to track this down next time around. I'm on 0.10.0 and not
against moving to master, if that would help.

Thanks,
Paul
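
For reference, raising the gateway logging generally means editing config/logging.yml and restarting the node. A minimal sketch, assuming the stock logging.yml layout of that era (the logger names are package names relative to org.elasticsearch and are assumptions here):

    rootLogger: INFO, console, file
    logger:
      # assumed logger names: raise gateway-related logging to DEBUG
      gateway: DEBUG
      index.gateway: DEBUG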

FYI, I bumped up the gateway logging (this required a node restart, which
cleared the issue), so hopefully I will have more data next time around.
Also, when I shut the node down, I got a stack trace that may be of more
use: gist:593991 on GitHub.

Thanks,
Paul

Hi Paul,

Yeah, that exception helps a lot, though it is very, very strange. This
is where it's coming from:

    File[] files = path.listFiles();
    if (files == null || files.length == 0) {
        return ImmutableMap.of();
    }
    ImmutableMap.Builder<String, BlobMetaData> builder = ImmutableMap.builder();
    for (File file : files) {
        builder.put(file.getName(), new PlainBlobMetaData(file.getName(), file.length()));
    }
    return builder.build();

Basically, as you can see, I list the files in a directory and then build
an immutable map from them. The strange thing is that the complaint means
listFiles() essentially returned a duplicate File... I will fix this, but
how bizarre!

-shay.banon
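
For comparison, a defensive variant of that loop would tolerate a duplicate name coming back from listFiles() instead of failing when the map is built. This is only a sketch against the snippet above (java.util.HashMap/Map imports assumed), not the fix that was actually committed:

    // Sketch: collect into a HashMap keyed by file name first, so a duplicate
    // name simply overwrites the earlier entry rather than making
    // ImmutableMap.Builder.build() throw IllegalArgumentException("duplicate key").
    File[] files = path.listFiles();
    if (files == null || files.length == 0) {
        return ImmutableMap.of();
    }
    Map<String, BlobMetaData> byName = new HashMap<String, BlobMetaData>();
    for (File file : files) {
        byName.put(file.getName(), new PlainBlobMetaData(file.getName(), file.length()));
    }
    return ImmutableMap.copyOf(byName);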

Awesome, thanks! I see an update already committed.

Wow, that is weird... I did some googling around and couldn't find any
details on a similar bug.

Probably beside the point, but here are some details on my setup:

  • Using an NFS-based gateway, exported as:
    /share/adsearch dm-adsearchd103(rw,async,no_root_squash)

  • Using this version of CentOS (not my choice):
    Tikanga
    CentOS release 5.5 (Final)

  • Running this version of java:
    java version "1.6.0"
    OpenJDK Runtime Environment (build 1.6.0-b09)
    OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

I'm a little suspicious of our NFS setup, as it is carved out of one of
the nodes, but this setup is only temporary.

I plan on moving to master tonight and will keep an eye out.

Thanks!
Paul

Hey,

A few things:

  1. I suggest you use sync mode and not async with NFS (see the adjusted
     export line after this message). Since the writing is done in the
     background, it does not have any performance implications. In any case
     I fsync all the files; I'm not sure whether that overrides the async
     mode of NFS or not.

  2. The Java version is pretty old. OpenJDK lags behind the Sun JDK when
     it comes to new builds (I think on Ubuntu it's at b18, whereas a major
     memory leak in LinkedBlockingQueue was only fixed in b19).

-shay.banon
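
For reference, applying that first suggestion to the export line Paul posted earlier in the thread just means swapping async for sync; everything else stays the same:

    /share/adsearch dm-adsearchd103(rw,sync,no_root_squash)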

Also, wondering out loud here, but if you move to master, you might consider
using the local gateway support (now the default) and not use NFS at all.

-shay.banon
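
For reference, on master that amounts to dropping the shared NFS mount and letting each node keep its own gateway state; a minimal sketch of the elasticsearch.yml setting (the exact key name is an assumption based on the gateway configuration of that era):

    gateway.type: local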

Hey Shay,
Thanks for all the help on this thread and in IRC. I had cleared the
issue I saw with index21 yesterday by recovering from the gateway.

However, I wanted to mention that I just moved up to the 0.11 snapshot,
and after startup index21 was hosed on all nodes as well as in the
gateway. This was the first time I had seen it affect both the nodes and
the gateway.

Thanks,
Paul

Do you mean it got removed from the gateway?

Previously, the issue only affected one server (probably the non-master
copy of the shard, which is why it didn't make it into the gateway).

This time around, whatever went bad in the index got persisted to the
gateway, causing all queries against that index to fail.

Thanks,
Paul

On Sep 24, 1:02 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Do you mean it got removed from the gateway?

On Fri, Sep 24, 2010 at 7:41 PM, Paul ppea...@gmail.com wrote:

Hey Shay,
Thanks for all the help on this thread and in IRC. I had cleared the
issue I saw with index21 yesterday, by recovering from the gateway.

However, I wanted to mention that I just moved up to the 0.11 snapshot
and after start up, index21 was hosed on all nodes, as well as, the
gateway. This was the first time I had seen it effect both nodes and
the gateway.

Thanks,
Paul

On Sep 23, 11:50 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Also, wondering out load here, but if you move to master, you might
consider
using the local gateway support (the default now) and not use NFS at all.

-shay.banon

On Thu, Sep 23, 2010 at 7:49 PM, Shay Banon <shay.ba...@elasticsearch.com> wrote:

Hey,

A few things:

  1. I suggest you use sync mode and not async with NFS. As writing is done
    in the background, it does not have any performance implications. In any
    case I fsync all the files; not sure if that overrides the async mode of
    NFS or not... (see the sketch of a sync export just after this list).
  2. The java version is pretty old. OpenJDK lags behind the Sun JDK when it
    comes to new versions (I think in Ubuntu it's at b18, where a major
    memory leak in LinkedBlockingQueue was fixed in b19).
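
For illustration only, switching the export Paul posted below from async to
sync would look roughly like this (a sketch, not a tested config; adjust to
your environment):

    # /etc/exports on the NFS server: 'sync' makes the server commit writes
    # to disk before acknowledging them, instead of replying early
    /share/adsearch dm-adsearchd103(rw,sync,no_root_squash)

    # re-export after editing (standard Linux nfs-utils command)
    exportfs -ra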

-shay.banon

On Thu, Sep 23, 2010 at 7:41 PM, Paul ppea...@gmail.com wrote:

Awesome, thanks! I see an update already committed.

Wow, that is weird... Did some googling around and couldn't find any
details on a bug similar to this.

Probably beside the point, but here are some details on my setup:

  • Using an NFS based gateway, exported as:
    /share/adsearch dm-adsearchd103(rw,async,no_root_squash)
  • Using this version of CentOS (not my choice):
    Tikanga
    CentOS release 5.5 (Final)
  • Running this version of java:
    java version "1.6.0"
    OpenJDK Runtime Environment (build 1.6.0-b09)
    OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

I'm a little suspicious of our NFS setup, as it is carved from one of the
nodes, but this setup is only temporary.

Will plan on moving to master tonight and keep an eye out.

Thanks!
Paul

On Sep 23, 11:11 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi Paul,

Yea, that exception helps a lot, though very very very strange... This
is where it's coming from:

    File[] files = path.listFiles();
    if (files == null || files.length == 0) {
        return ImmutableMap.of();
    }
    ImmutableMap.Builder<String, BlobMetaData> builder = ImmutableMap.builder();
    for (File file : files) {
        builder.put(file.getName(), new PlainBlobMetaData(file.getName(), file.length()));
    }
    return builder.build();

Basically, as you can see, I list the files in a directory and then build
an immutable map from them. The strange thing is that it complains that
listFiles basically returned a duplicate File... I will fix this, but how
bizarre!
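
One way to make that listing tolerant of a duplicate name is to collect into
a plain map first and only copy it into an immutable map at the end. This is
just a sketch of that idea, not necessarily the fix that was committed; it
assumes java.util.LinkedHashMap alongside the same ImmutableMap,
BlobMetaData and PlainBlobMetaData types used above:

    // Hypothetical defensive variant: a LinkedHashMap silently keeps the
    // last value seen for a repeated file name, so the final build can no
    // longer throw IllegalArgumentException[duplicate key: ...].
    File[] files = path.listFiles();
    if (files == null || files.length == 0) {
        return ImmutableMap.of();
    }
    Map<String, BlobMetaData> byName = new LinkedHashMap<String, BlobMetaData>();
    for (File file : files) {
        byName.put(file.getName(), new PlainBlobMetaData(file.getName(), file.length()));
    }
    return ImmutableMap.copyOf(byName);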

-shay.banon

On Thu, Sep 23, 2010 at 7:06 PM, Paul ppea...@gmail.com wrote:

FYI, bumped up gateway logging (required a node restart, which cleared
the issue), so hopefully will have more data next time around. Also,
when I shut the node down, I got a stack trace that may be of more use.

gist:593991 · GitHub

Thanks,
Paul

On Sep 23, 10:36 am, Paul ppea...@gmail.com wrote:

Hey Shay,
Hitting the snapshot failed exception at the moment.

I tried increasing the log level, but it doesn't appear the logging.yml
file dynamically updates the log level.
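
For reference, the kind of change involved would be along these lines in
config/logging.yml (a sketch only; it assumes the stock 0.10-era layout, and
the logger keys shown are guesses, so treat them as placeholders):

    rootLogger: INFO, console, file
    logger:
      # hypothetical keys: turn up gateway/snapshot related logging
      gateway: DEBUG
      index.gateway: TRACE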

Will probably start restarting nodes and playing around in a few
minutes. Let me know what I should have in place to get the necessary
information to track this down next time around. I'm on 0.10.0 and not
against moving to master, if that would help.

Thanks,
Paul

On Sep 21, 2:33 am, Shay Banon shay.ba...@elasticsearch.com wrote:

The REST interface uses the Java Client to do the operations, so I don't
think it's related. I will go over the exceptions and see that at least
they are properly logged.

On Tue, Sep 21, 2010 at 6:59 AM, Paul ppea...@gmail.com wrote:

Btw, I was unable to reproduce the search exception via curl. Does the
rest interface have internal retries? I am using the Java Node client.
Are there any retries available via that interface?

Thanks,
Paul


The fact that it even got solved when deleting the work dir and recovering
from the gateway is strange. Is there a chance that you can change that NFS
mount from async to sync?
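
A client-side counterpart to the export change sketched earlier would be
something like this in /etc/fstab on each node (a sketch only; the server
name and mount point are placeholders):

    # 'sync' forces synchronous writes from the client side as well
    nfsserver:/share/adsearch  /mnt/es-gateway  nfs  rw,sync,hard  0 0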


Yep, will do.

