Two different shard exceptions

ppearcy · September 20, 2010, 11:45pm

Hi Shay,
I experienced some weird behavior over the weekend on 0.10 that I
haven't seen before. I'm running a 2-node mirrored cluster.

Searching a certain shard on a certain node fails. Here is the
exception I was getting:

    RemoteTransportException[[DM-ADSEARCHD102.dev.local][inet[/10.2.20.160:9301]][search/phase/query/id]]; nested:
    QueryPhaseExecutionException[[newsmedia_20100917150044][0]: query[filtered(+(+feedid:753 +wsodissue:44874) +__documentdate:[* TO 1285023084000])->FilterCacheFilterWrapper(QueryWrapperFilter((+indexid:genericnews2 +(feedid:753 feedid:1236)) (+indexid:newsmedia +providersubgroup:ap)))],from[0],size[500],sort[<custom:"__documentdate": org.elasticsearch.index.field.data.FieldData$Type$4$1@63ab3977>!,<custom:"documentkey": org.elasticsearch.index.field.data.FieldData$Type$1$1@7e49e6bf>]: Query Failed [Failed to execute main query]]; nested:

The search is valid and worked every other time (on the attempts that
worked, it went to the good node). To confirm, I shut down the good node,
and the query failed every time. I then brought the good node back up and
shut down the bad one, and it worked every time. After I brought the bad
node back up, the query still failed on it. I was able to resolve this by
clearing the work directory on the bad node.

Second issue, a snapshot error. I have the snapshot interval disabled and
trigger snapshots based on the content received. I started receiving this
exception:

    ERROR > Shapshot failed, index: djnf_20100917150037, shard: 0, reason: BroadcastShardOperationFailedException[[djnf_20100917150037][0] ];
    nested: RemoteTransportException[[dm-adsearchd103.dev.local][inet[/10.2.20.164:9300]][indices/gateway/snapshot/shard]];
    nested: IndexShardGatewaySnapshotFailedException[[djnf_20100917150037][0] duplicate key: __2tf];
    nested: IllegalArgumentException[duplicate key: __2tf]; (Timer-0)

This was resolved by restarting the cluster.

I have only seen each of these issues once and will keep an eye out for
them, but wanted to give you a heads-up, as they look like potential
bugs.

Thanks,
Paul
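The "worked every other time" pattern in the first report is what you would expect if requests for that shard alternate between its two copies. The sketch below simulates that under a stated assumption: the round-robin counter and the node names are hypothetical and not taken from the Elasticsearch source. With one broken copy, every second request fails; with the healthy node shut down, every request fails; with the bad node shut down, none do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: requests for one shard alternate between its live copies.
class AlternatingCopies {
    private final List<String> copies;
    private final AtomicInteger counter = new AtomicInteger();

    AlternatingCopies(List<String> copies) {
        this.copies = new ArrayList<>(copies);
    }

    // Picks the node for the next request, cycling through the copies in order.
    String next() {
        return copies.get(Math.floorMod(counter.getAndIncrement(), copies.size()));
    }

    // Sends n requests; a request "fails" whenever it lands on the bad node.
    static List<String> simulate(List<String> liveNodes, String badNode, int n) {
        AlternatingCopies router = new AlternatingCopies(liveNodes);
        List<String> results = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            results.add(router.next().equals(badNode) ? "FAIL" : "ok");
        }
        return results;
    }

    public static void main(String[] args) {
        String bad = "bad-node";
        String good = "good-node";
        System.out.println(simulate(Arrays.asList(bad, good), bad, 4)); // [FAIL, ok, FAIL, ok]
        System.out.println(simulate(Arrays.asList(bad), bad, 4));       // [FAIL, FAIL, FAIL, FAIL]
        System.out.println(simulate(Arrays.asList(good), bad, 4));      // [ok, ok, ok, ok]
    }
}
```

Under this model, the state lives on the bad node itself, which fits the observation that restarting it alone did not help while clearing its work directory did.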

kimchy · September 20, 2010, 11:48pm

Hi Paul,

Both are strange. Are there by any chance more detailed exceptions in the
logs?

-shay.banon

ppearcy · September 21, 2010, 12:09am

Hey Shay,
I scoured the logs and, unfortunately, that is all I have. If I see
either of these again, I will enable more detailed logging and see what
I capture.

Thanks,
Paul

ppearcy · September 21, 2010, 4:59am

BTW, I was unable to reproduce the search exception via curl. Does the
REST interface have internal retries? I am using the Java node client;
are there any retries available through that interface?

Thanks,
Paul

kimchy · September 21, 2010, 8:33am

The REST interface uses the Java client to perform the operations, so I
don't think it's related. I will go over the exceptions and make sure
that, at the very least, they are properly logged.
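The reply above doesn't say whether either client retries a failed shard request on its own. If an application wants to retry at its own level, a generic wrapper is one option. The `withRetries` helper below is hypothetical plain Java, not part of the Elasticsearch client API; the actual search call would go inside the `Callable`.

```java
import java.util.concurrent.Callable;

// Hypothetical application-level helper; not part of the Elasticsearch client API.
class RetrySketch {

    // Calls op up to maxAttempts times, sleeping backoffMillis * attempt between tries
    // (linear backoff). Rethrows the last failure if every attempt fails.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated search that fails twice before succeeding.
        String result = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new RuntimeException("simulated shard failure");
            }
            return "ok after " + calls[0] + " attempts";
        }, 5, 10);
        System.out.println(result); // prints "ok after 3 attempts"
    }
}
```

Given the "worked every other time" pattern in the original report, a retry like this only helps if the second attempt is routed to the other copy of the shard; it masks the bad copy rather than fixing it.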

ppearcy · September 23, 2010, 4:36pm

Hey Shay,
I'm hitting the snapshot failed exception at the moment.

I tried increasing the log level, but changes to the logging.yml file
don't appear to be picked up dynamically.

I will probably start restarting nodes and experimenting in a few
minutes. Let me know what I should have in place to capture the
information needed to track this down next time around. I'm on 0.10.0
and not against moving to master, if that would help.

Thanks,
Paul

ppearcy · September 23, 2010, 5:06pm

FYI, I bumped up gateway logging (this required a node restart, which
cleared the issue), so hopefully I will have more data next time around.
Also, when I shut the node down, I got a stack trace that may be more
useful:

https://gist.github.com/ppearcy/593991

[10:43:16,841][WARN ][index.gateway            ] [dm-adsearchd103.dev.local] [wachovia_20100917150048][0] failed to snapshot on close
org.elasticsearch.index.gateway.IndexShardGatewaySnapshotFailedException: [wachovia_20100917150048][0] duplicate key: __2ug
        at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.snapshot(BlobStoreIndexShardGateway.java:152)
        at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:232)
        at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:227)
        at org.elasticsearch.index.engine.robin.RobinEngine.snapshot(RobinEngine.java:426)
        at org.elasticsearch.index.shard.service.InternalIndexShard.snapshot(InternalIndexShard.java:372)
        at org.elasticsearch.index.gateway.IndexShardGatewayService.snapshot(IndexShardGatewayService.java:227)
        at org.elasticsearch.index.gateway.IndexShardGatewayService.snapshotOnClose(IndexShardGatewayService.java:272)
        at org.elasticsearch.index.service.InternalIndexService.deleteShard(InternalIndexService.java:338)

(This trace is truncated; the full version is in the gist.)

Thanks,
Paul

kimchy · September 23, 2010, 5:11pm

Hi Paul,

Yeah, that exception helps a lot, though it's very, very strange... This
is where it's coming from:

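    File[] files = path.listFiles();
    if (files == null || files.length == 0) {
        return ImmutableMap.of();
    }
    // Each entry is keyed by file name; the builder rejects a name that appears twice
    // with IllegalArgumentException("duplicate key: ..."), the error in the trace above.
    ImmutableMap.Builder<String, BlobMetaData> builder = ImmutableMap.builder();
    for (File file : files) {
        builder.put(file.getName(), new PlainBlobMetaData(file.getName(), file.length()));
    }
    return builder.build();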

Basically, as you can see, I list the files in a directory and then build
an immutable map from them, keyed by file name. The strange thing is that
the error means listFiles returned the same file twice. I will fix this,
but how bizarre!

-shay.banon
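To make the failure mode concrete, here is a small plain-Java sketch (no Guava). The blob names and lengths in `NAMES` and `LENGTHS` are hypothetical, modeled on the `__2tf` key from the error. The strict variant mimics the builder rejecting a repeated key, which is where `IllegalArgumentException[duplicate key: ...]` comes from; the tolerant variant keeps the first entry for each name and skips repeats. This is only one way a listing could tolerate duplicates, not the fix that was actually committed.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class DuplicateListingDemo {

    // Hypothetical listing in which the same blob name shows up twice,
    // as the "duplicate key: __2tf" error implies listFiles() returned.
    static final String[] NAMES = {"__2te", "__2tf", "__2tf", "__2tg"};
    static final long[] LENGTHS = {1024L, 2048L, 2048L, 512L};

    // Strict variant: like the immutable map builder, rejects a repeated key.
    static Map<String, Long> strict(String[] names, long[] lengths) {
        Map<String, Long> map = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            if (map.putIfAbsent(names[i], lengths[i]) != null) {
                throw new IllegalArgumentException("duplicate key: " + names[i]);
            }
        }
        return Collections.unmodifiableMap(map);
    }

    // Tolerant variant: keeps the first entry seen for each name and skips repeats.
    static Map<String, Long> tolerant(String[] names, long[] lengths) {
        Map<String, Long> map = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            map.putIfAbsent(names[i], lengths[i]);
        }
        return Collections.unmodifiableMap(map);
    }

    public static void main(String[] args) {
        try {
            strict(NAMES, LENGTHS);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage()); // strict: duplicate key: __2tf
        }
        // tolerant: {__2te=1024, __2tf=2048, __2tg=512}
        System.out.println("tolerant: " + tolerant(NAMES, LENGTHS));
    }
}
```

Keeping the first entry is safe here only because a duplicated listing entry refers to the same file; if two entries with one name could differ, silently dropping one would hide a real problem.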

ppearcy · September 23, 2010, 5:41pm

Awesome, thanks! I see an update already committed.

Wow, that is weird... I did some googling around and couldn't find any
reports of a similar bug.

Probably beside the point, but here are some details on my setup.

I'm using an NFS-based gateway, exported like this:

    /share/adsearch dm-adsearchd103(rw,async,no_root_squash)

Using this version of CentOS (not my choice):

    Tikanga
    CentOS release 5.5 (Final)

Running this version of Java:

    java version "1.6.0"
    OpenJDK Runtime Environment (build 1.6.0-b09)
    OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

I'm a little suspicious of our NFS setup, as the share is carved out of
one of the nodes, but this setup is only temporary.

I plan on moving to master tonight and will keep an eye out.

Thanks!
Paul

kimchy · September 23, 2010, 5:49pm

Hey,

A few things:

I suggest you use sync mode rather than async for the NFS export. Since
writing to the gateway is done in the background, it should not have any
performance implications. In any case, I fsync all the files, though I'm
not sure whether that overrides the async mode of NFS or not.

The Java version is pretty old. OpenJDK lags behind the Sun JDK when it
comes to new versions (I think Ubuntu's is at b18, while a major memory
leak in LinkedBlockingQueue was fixed in b19).

-shay.banon
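On the fsync point above: in Java, the usual way to fsync a file is `FileChannel.force`. The sketch below uses the Java 7+ NIO API and is not the Elasticsearch gateway code; the file name and contents are hypothetical. Two documented caveats bear on the sync/async question. The `FileChannel.force` javadoc only promises that updates reach the storage device when the file resides on a local device, and the exports(5) man page says an `async` export lets the NFS server reply to requests before changes are committed to stable storage. So a client-side fsync on an async export may not mean the data is on the server's disk.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FsyncSketch {

    // Writes data to file, then asks the OS to flush content and metadata to the device.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf); // write() may write fewer bytes than requested
            }
            // force(true) syncs content and metadata; per its javadoc the guarantee
            // only holds for files on a local device, not for remote (e.g. NFS) files.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("gateway-blob", ".bin");
        writeAndSync(tmp, "snapshot data".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.size(tmp)); // prints 13
        Files.delete(tmp);
    }
}
```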


kimchy · September 23, 2010, 5:50pm

Also, wondering out loud here, but if you move to master, you might consider
using the local gateway support (the default now) and not using NFS at all.

-shay.banon


ppearcy · September 24, 2010, 5:41pm

Hey Shay,
Thanks for all the help on this thread and in IRC. I had cleared the
issue I saw with index21 yesterday by recovering from the gateway.

However, I wanted to mention that I just moved up to the 0.11 snapshot,
and after startup index21 was hosed on all nodes, as well as in the
gateway. This was the first time I had seen it affect both nodes and
the gateway.

Thanks,
Paul


kimchy · September 24, 2010, 7:02pm

Do you mean it got removed from the gateway?


ppearcy · September 24, 2010, 8:39pm

Previously, the issue only affected one server (probably the non-master
for the shard, which is why it didn't go to the gateway).

This time around, whatever went bad in the index got persisted to the
gateway, causing all queries against that index to fail.

Thanks,
Paul


kimchy · September 24, 2010, 8:56pm

The fact that it even got solved when deleting the work dir and recovering
from the gateway is strange. Is there a chance that you can change that NFS
mount from async to sync?

On Fri, Sep 24, 2010 at 10:39 PM, Paul ppearcy@gmail.com wrote:

Previously, the issue only effected one server (probably the non-
master for the shard, which is why it didn't go to the gateway).

This time around, whatever went bad in the index got persisted to the
gateway, causing all queries against that index to fail.

Thanks,
Paul

On Sep 24, 1:02 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Do you mean it got removed from the gateway?

On Fri, Sep 24, 2010 at 7:41 PM, Paul ppea...@gmail.com wrote:

Hey Shay,
Thanks for all the help on this thread and in IRC. I had cleared the
issue I saw with index21 yesterday, by recovering from the gateway.

However, I wanted to mention that I just moved up to the 0.11 snapshot, and after startup index21 was hosed on all nodes as well as the gateway. This was the first time I had seen it affect both nodes and the gateway.

Thanks,
Paul

On Sep 23, 11:50 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Also, wondering out loud here, but if you move to master, you might consider using the local gateway support (the default now) and not using NFS at all.

-shay.banon

On Thu, Sep 23, 2010 at 7:49 PM, Shay Banon shay.ba...@elasticsearch.com wrote:

Hey,

A few things:

I suggest you use sync mode rather than async with NFS. Since writing to the gateway is done in the background, it does not have any performance implications. In any case, I fsync all the files, though I'm not sure whether that overrides the async mode of NFS.

The Java version is pretty old. OpenJDK lags behind the Sun JDK when it comes to new versions (I think Ubuntu's is at b18, and a major memory leak in LinkedBlockingQueue was fixed in b19).

-shay.banon
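For reference, "fsync all the files" in plain Java roughly means writing through a `FileChannel` and calling `force(true)` before closing, as in the generic sketch below (`DurableWrite` and `writeAndSync` are hypothetical names, not the gateway's code). On whether that overrides NFS async mode: the `exports(5)` man page describes `async` as letting the server reply to requests before changes are committed to stable storage, so a client-side fsync is likely not a durability guarantee on such an export, which is consistent with the advice to switch to `sync`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DurableWrite {
    // Writes bytes to a file, then forces content and metadata to the storage device before returning.
    // Note: fully persisting a newly created file's directory entry may also require syncing
    // the parent directory; that step is omitted here.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf); // a single write may be partial, so loop until the buffer is drained
            }
            ch.force(true); // the fsync step: flush file content and metadata
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blob", ".bin");
        writeAndSync(tmp, "segment data".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.size(tmp)); // 12
    }
}
```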

On Thu, Sep 23, 2010 at 7:41 PM, Paul ppea...@gmail.com wrote:

Awesome, thanks! I see an update already committed.

Wow, that is weird... I did some googling around and couldn't find any details on a similar bug.

Probably beside the point, but here are some details on my setup:

Using an NFS-based gateway, exported as:
/share/adsearch dm-adsearchd103(rw,async,no_root_squash)

Using this version of CentOS (not my choice):
Tikanga
CentOS release 5.5 (Final)

Running this version of java:
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

I'm a little suspicious of our NFS setup, as it is carved out of one of the nodes, but this setup is only temporary.

Will plan on moving to master tonight and keep an eye out.

Thanks!
Paul

On Sep 23, 11:11 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi Paul,

Yeah, that exception helps a lot, though it's very, very strange... This is where it's coming from:

    File[] files = path.listFiles();
    if (files == null || files.length == 0) {
        return ImmutableMap.of();
    }
    ImmutableMap.Builder<String, BlobMetaData> builder = ImmutableMap.builder();
    for (File file : files) {
        // One entry per file name; a repeated name makes build() throw IllegalArgumentException
        builder.put(file.getName(), new PlainBlobMetaData(file.getName(), file.length()));
    }
    return builder.build();

Basically, as you can see, I list the files in a directory and then build an immutable map from them. The strange thing is that it complains that listFiles returned a duplicate File... I will fix this, but how bizarre!

-shay.banon
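The `IllegalArgumentException[duplicate key: __2tf]` in Paul's log is consistent with Guava's `ImmutableMap.Builder`, whose `build()` rejects a key that was put twice. The sketch below shows one defensive way to build the same name-to-length listing with plain `java.util`, so that a repeated name from `listFiles()` is tolerated (first entry wins) instead of failing the whole snapshot. It is an illustration under those assumptions, not the fix that was actually committed; `listBlobs` and the use of a plain `Long` length instead of `BlobMetaData` are hypothetical simplifications.

```java
import java.io.File;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class BlobListing {
    // Maps file name -> file length, skipping any name that was already seen.
    static Map<String, Long> listBlobs(File[] files) {
        if (files == null || files.length == 0) {
            return Collections.emptyMap();
        }
        Map<String, Long> byName = new LinkedHashMap<>();
        for (File file : files) {
            // putIfAbsent keeps the first entry instead of throwing on a duplicate name
            byName.putIfAbsent(file.getName(), file.length());
        }
        return Collections.unmodifiableMap(byName);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(listBlobs(dir.listFiles()).keySet());
    }
}
```

Whether silently keeping the first entry is the right policy depends on why the listing repeated a name; logging the duplicate before skipping it would preserve the evidence.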

On Thu, Sep 23, 2010 at 7:06 PM, Paul ppea...@gmail.com wrote:

FYI, I bumped up gateway logging (which required a node restart, and that cleared the issue), so hopefully I will have more data next time around. Also, when I shut the node down, I got a stack trace that may be of more use.

gist:593991 · GitHub

Thanks,
Paul

On Sep 23, 10:36 am, Paul ppea...@gmail.com wrote:

Hey Shay,
Hitting the snapshot failed exception at the moment.

I tried increasing the log level, but it doesn't appear that changes to the logging.yml file are picked up dynamically.

I will probably start restarting nodes and playing around in a few minutes. Let me know what I should have in place to get the necessary information to track this down next time around. I'm on 0.10.0 and not against moving to master, if that would help.

Thanks,
Paul

On Sep 21, 2:33 am, Shay Banon shay.ba...@elasticsearch.com wrote:

The REST interface uses the Java client to do the operations, so I don't think it's related. I will go over the exceptions and make sure that, at least, they are properly logged.

On Tue, Sep 21, 2010 at 6:59 AM, Paul ppea...@gmail.com wrote:

Btw, I was unable to reproduce the search exception via curl. Does the REST interface have internal retries? I am using the Java Node client. Are there any retries available via that interface?

Thanks,
Paul
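If the application should retry a failed search itself, a generic wrapper is one option. The sketch below is a hypothetical helper, not part of the Elasticsearch client API; the attempt count and the choice to retry on any exception are assumptions. With two copies of a shard, a retry might be routed to the other copy, but that depends on how requests are routed and is not confirmed in this thread.

```java
import java.util.concurrent.Callable;

public class Retry {
    // Calls op up to maxAttempts times and returns the first successful result.
    // If every attempt fails, rethrows the last failure.
    static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // e.g. a query-phase failure from a bad shard copy
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated search that fails on odd-numbered calls, like hitting the bad copy first.
        String hits = withRetries(() -> {
            if (++calls[0] % 2 == 1) {
                throw new IllegalStateException("Failed to execute main query");
            }
            return "500 hits";
        }, 3);
        System.out.println(hits + " after " + calls[0] + " calls"); // 500 hits after 2 calls
    }
}
```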

On Sep 20, 6:09 pm, Paul ppea...@gmail.com wrote:

Hey Shay,
I scoured the logs and, unfortunately, that is all I have. If I see either of these again, I will enable more detailed logging and see what I capture.

Thanks,
Paul

On Sep 20, 5:48 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Hi Paul,

Both are strange. Are there, by any chance, more detailed exceptions in the logs?

-shay.banon


ppearcy · September 24, 2010, 9:03pm

Yep, will do.

