Indexing ElasticSearch with Hadoop (LUCENE_36)

Dear all,

I need your help. Here's a high-level description of what I'm trying to
do, followed by some dives into the details.

I'm running a Hadoop job that indexes bulks of documents into a two-node
Elasticsearch cluster.
Some details on the configuration follow:

  • Hadoop, running on Java(TM) SE Runtime Environment (build
    1.7.0_25-b15)
  • Elasticsearch 0.90.3, running on Java(TM) SE Runtime Environment
    (build 1.6.0_45-b06)

Using a TransportClient, each reducer task tries to connect to the cluster,
as shown here [https://gist.github.com/dpalmisano/6251563] at line 120.
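
For context, this is roughly what a TransportClient setup looks like on the
0.90.x API (a minimal sketch; the cluster name and host below are
placeholders, and the actual code in the gist may differ):

    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    public class ClientSketch {
        public static void main(String[] args) {
            Settings settings = ImmutableSettings.settingsBuilder()
                    .put("cluster.name", "my-cluster") // placeholder
                    .build();
            // Constructing the client is what triggers class-loading of
            // org.elasticsearch.Version - and the failure below.
            TransportClient client = new TransportClient(settings);
            client.addTransportAddress(
                    new InetSocketTransportAddress("es-node-1", 9300)); // placeholder
        }
    }
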
Unfortunately, when the reducer tasks try to connect, they fail every time
with this exception:

2013/08/16 16:48:52,604 [main] FATAL org.apache.hadoop.mapred.Child - Error running child : java.lang.NoSuchFieldError: LUCENE_36
    at org.elasticsearch.Version.<clinit>(Version.java:42)
    at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:165)
    at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:121)
    at com.dpalmisano.mapred.es.ElasticSearchBulkOutputFormat$ElasticSearchBulkRecordWriter.start_embedded_client(ElasticSearchBulkOutputFormat.java:151)
    at com.dpalmisano.mapred.es.ElasticSearchBulkOutputFormat$ElasticSearchBulkRecordWriter.<init>(ElasticSearchBulkOutputFormat.java:67)
    at com.dpalmisano.mapred.es.ElasticSearchBulkOutputFormat.getRecordWriter(ElasticSearchBulkOutputFormat.java:160)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:583)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:426)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

Digging around, it seems this is mainly related to an old version of Lucene
on the classpath. My Hadoop job's dependency tree is fine (it pulls in only
the Lucene dependencies coming from Elasticsearch 0.90.3), while Hadoop
itself ships an older version (2.9.4). Elasticsearch's Version class
references org.apache.lucene.util.Version.LUCENE_36, a field that doesn't
exist in Lucene 2.9.4, hence the NoSuchFieldError during class
initialization.

Has anybody met this problem before? I'd really appreciate any hint or
help.

thanks in advance

On Friday, August 16, 2013 6:09:12 PM UTC+1, Costin Leau wrote:

The older Lucene jar (the one from Hadoop) will be picked up, since Hadoop
starts up first and, by the time your job gets executed, Lucene is already
loaded.
You could try replacing the Lucene jar in Hadoop (remember, you have to do
that across your entire cluster).
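
If you want to confirm that from inside a task, a quick check is to ask the
JVM which jar is supplying Lucene and whether the field Elasticsearch
0.90.3 expects is there. A minimal sketch using only JDK reflection:

    // Prints the jar that org.apache.lucene.util.Version is loaded from and
    // checks for the LUCENE_36 field that Elasticsearch 0.90.3 expects.
    public class LuceneClasspathCheck {
        public static void main(String[] args) throws Exception {
            Class<?> v = Class.forName("org.apache.lucene.util.Version");
            System.out.println("Lucene loaded from: "
                    + v.getProtectionDomain().getCodeSource().getLocation());
            try {
                v.getField("LUCENE_36");
                System.out.println("LUCENE_36 present - the newer Lucene wins");
            } catch (NoSuchFieldException e) {
                System.out.println("LUCENE_36 missing - an older Lucene jar "
                        + "(e.g. Hadoop's 2.9.4) shadows the Elasticsearch one");
            }
        }
    }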

Speaking of which, have you looked at elasticsearch-hadoop?

On Fri, Aug 16, 2013 at 8:25 PM, Davide Palmisano <dpalmisano@gmail.com> wrote:

Thanks Costin,

I really appreciate your prompt response. Unfortunately it's not feasible
to replace all the libs across the whole cluster (which is pretty big),
and given the major version jump in Lucene, I doubt it would work out
fine anyway.

I was looking at elasticsearch-hadoop, but I have a question:

  1. How does the mapping between what the reducer writes and Elasticsearch
    work? Are reducer keys mapped to ES ids?
    My reducers write a key, which should become the ES document id,
    and a JSON document, which should be indexed as the corresponding document in ES.

Do you think elasticsearch-hadoop could help in this scenario?

all the best,

On Fri, Aug 16, 2013 at 2:33 PM, Costin Leau <costin.leau@gmail.com> wrote:

The upcoming beta (ETA next week) doesn't have a concept of id - the id is
generated by Elasticsearch. However, you can use the id path (the _id path
setting in the type mapping) to tell ES where to pick the id from.
It looks like you're using plain M/R - in that case, you can pass a
Map of Writables that represents the JSON document. That's because in most
cases the data is read in M/R, Pig, or Hive as native types - once the
results are in, we handle the JSON conversion and the HTTP communication
(so the user doesn't have to).

However, we do plan to allow JSON documents to be passed as-is, without
having to be converted to Writable objects. As an intermediate workaround,
you could load the JSON back as a Map (see the WritableUtils in ES-Hadoop)
or, if you're generating the JSON from Writables, pass those directly to
es-hadoop.
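
To make that concrete, here is a rough sketch of wiring a plain M/R job to
es-hadoop. The class and property names (ESOutputFormat, es.resource) are
assumptions based on the early 1.3 milestones and may differ in the beta:

    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.elasticsearch.hadoop.mr.ESOutputFormat;

    public class EsHadoopSketch {
        public static JobConf configure() {
            JobConf conf = new JobConf();
            conf.set("es.resource", "myindex/mytype");  // target index/type (placeholder)
            conf.setOutputFormat(ESOutputFormat.class); // es-hadoop handles JSON + HTTP
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputValueClass(MapWritable.class);
            return conf;
        }

        // In the reducer, emit each document as a MapWritable; es-hadoop
        // serializes it to JSON and bulk-indexes it. To control the document
        // id, put it in a field and point the type's _id path mapping at it.
        public static MapWritable toDoc(String title) {
            MapWritable doc = new MapWritable();
            doc.put(new Text("title"), new Text(title));
            return doc;
        }
    }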

Hope this helps,

On Sun, Aug 18, 2013 at 3:09 AM, Michael Sick <michael.sick@serenesoftware.com> wrote:

For a quick-and-dirty approach, I've had M/R jobs create bulk index
statements and just used curl or the Jersey REST client to post them.

Michael Sick | Big Data Architect
Serene Software Inc.
919-523-4447 (cell) | michael.sick@serenesoftware.com |
www.serenesoftware.com
Core: Elasticsearch | HBase | Hadoop | RedShift/ParAccel | Hive
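
A minimal sketch of that approach, using the JDK's HttpURLConnection
instead of Jersey (the host and document fields are placeholders):

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class BulkPostSketch {
        public static void main(String[] args) throws Exception {
            // The _bulk body is newline-delimited JSON: one action line,
            // then the document source, for each document.
            String bulk =
                "{\"index\":{\"_index\":\"myindex\",\"_type\":\"mytype\",\"_id\":\"1\"}}\n"
                + "{\"title\":\"hello\"}\n";

            HttpURLConnection conn = (HttpURLConnection)
                    new URL("http://es-node-1:9200/_bulk").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            OutputStream out = conn.getOutputStream();
            out.write(bulk.getBytes("UTF-8"));
            out.close();
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }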


Thanks Michael,

but the approach Costin suggested worked like a charm!

Thank you very much for your help,

Davide


--
Davide Palmisano

http://twitter.com/dpalmisano
