Querying Elasticsearch from Hadoop

(Sanjay Bhosale) #1


I am able to query Elasticsearch from the main function of my Hadoop MapReduce program.
But when I perform the same operation inside the map/reduce tasks, I get the error below.

  16/01/07 02:00:35 INFO mapred.JobClient: Task Id : attempt_201601011215_30671_m_000000_2, Status : FAILED
Error: java.lang.ClassNotFoundException: org.elasticsearch.index.query.QueryBuilder
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
        at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

What is the cause of this error? Do I need to copy all the required jars to some other location?
Please let me know if you have a solution for it.

(Costin Leau) #2

You seem to be using Elasticsearch itself in your map/reduce job and not the ES-Hadoop connector. In that case, just like with any other library out there, you need to deploy the ES jars alongside your map/reduce jobs so that they are on the classpath of the task JVMs, not only of the client that submits the job. The Hadoop documentation provides more information on the ways to do that.
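To confirm that the task JVM really cannot see the Elasticsearch classes, a small classpath probe can help. The following is only a sketch: `ClasspathProbe` and `isLoadable` are hypothetical names, not part of Hadoop or Elasticsearch, and it uses nothing but the JDK. Since the stack trace above fails while the mapper class itself is being loaded, the probe would need to run from code that has no Elasticsearch references of its own (for example a trivial mapper, or a standalone run with the same classpath as the task).

```java
// Hypothetical diagnostic helper; not part of Hadoop or Elasticsearch.
class ClasspathProbe {

    /** Returns true if the named class can be found by the current class loader. */
    static boolean isLoadable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathProbe.class.getClassLoader();
        }
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0
                ? args[0]
                : "org.elasticsearch.index.query.QueryBuilder";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

If the probe reports `true` on the client but `false` inside a task, the jars are reaching the submitting JVM only and still need to be shipped with the job.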

(Sanjay Bhosale) #3

OK, thanks. That is done now. Can you please suggest a way to just check whether part of a field value exists or not? I am able to get the number of occurrences, but I only want to check whether it exists, since counting the occurrences takes a long time.

(Costin Leau) #4

I'm not sure what you are trying to achieve, but if you're looking to find documents that are missing a field or that have it declared, take a look at the missing and exists queries.
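As a rough illustration of the exists query, the sketch below builds a search request body as a plain JSON string using only the JDK. `ExistsQueryBody` and `forField` are hypothetical names, and the `size` and `terminate_after` settings are my own assumption about how to stop early instead of counting every match; check them against the documentation for your Elasticsearch version. Note that an exists query tests whether a field has any value at all, not whether it contains a particular substring.

```java
// Hypothetical helper that builds the JSON body for an "exists" search request.
class ExistsQueryBody {

    /** Builds a request body asking whether any document has a value for the field. */
    static String forField(String field) {
        // Minimal escaping so the field name stays a valid JSON string.
        String escaped = field.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"size\":0,"                 // no hit documents are returned, only totals
             + "\"terminate_after\":1,"       // assumption: stop collecting after the first match per shard
             + "\"query\":{\"exists\":{\"field\":\"" + escaped + "\"}}}";
    }

    public static void main(String[] args) {
        // "user" is a placeholder field name.
        System.out.println(forField("user"));
    }
}
```

A total hit count greater than zero in the response would then indicate that at least one document has the field.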
