As an update, this is now supported in master (the upcoming elasticsearch-hadoop 1.3.0.M3). From the console:
01:13:50,630 INFO main mapred.JobClient - Elasticsearch Hadoop Counters
01:13:50,630 INFO main mapred.JobClient - Bytes Written=173923
01:13:50,630 INFO main mapred.JobClient - Bytes Read=0
01:13:50,631 INFO main mapred.JobClient - Bulk Retries=0
01:13:50,631 INFO main mapred.JobClient - Network Retries=0
01:13:50,631 INFO main mapred.JobClient - Bulk Writes=22
01:13:50,631 INFO main mapred.JobClient - Documents Read=0
01:13:50,631 INFO main mapred.JobClient - Documents Written=993
01:13:50,631 INFO main mapred.JobClient - Node Retries=0
01:13:50,631 INFO main mapred.JobClient - Documents Retried=0
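If you want to post-process these numbers outside Hadoop, the console lines are easy to parse. What follows is only a minimal sketch, assuming the log format shown above. The class name EsCounterLog and its methods are made up for illustration and are not part of elasticsearch-hadoop. It reads the "Name=value" pairs into a map and derives the average number of documents per bulk request:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of elasticsearch-hadoop.
public class EsCounterLog {

    // Matches the "Name=value" tail of a JobClient counter line.
    private static final Pattern COUNTER =
            Pattern.compile("- ([A-Za-z ]+)=(\\d+)\\s*$");

    // A few of the lines from the console output above.
    public static final String SAMPLE = String.join("\n",
            "01:13:50,630 INFO main mapred.JobClient - Elasticsearch Hadoop Counters",
            "01:13:50,630 INFO main mapred.JobClient - Bytes Written=173923",
            "01:13:50,631 INFO main mapred.JobClient - Bulk Writes=22",
            "01:13:50,631 INFO main mapred.JobClient - Documents Written=993");

    /** Parses console lines into counter name -> value, in log order. */
    public static Map<String, Long> parse(String log) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : log.split("\\R")) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                // Lines without "=value" (such as the group header) are skipped.
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    /** Average documents per bulk request, or 0 if no bulk writes were logged. */
    public static double docsPerBulk(Map<String, Long> counters) {
        long bulks = counters.getOrDefault("Bulk Writes", 0L);
        long docs = counters.getOrDefault("Documents Written", 0L);
        return bulks == 0 ? 0.0 : (double) docs / bulks;
    }

    public static void main(String[] args) {
        Map<String, Long> counters = parse(SAMPLE);
        counters.forEach((name, value) -> System.out.println(name + " = " + value));
        System.out.println(String.format(Locale.ROOT,
                "Docs per bulk request: %.1f", docsPerBulk(counters)));
    }
}
```

For the run above (993 documents in 22 bulk writes) that works out to roughly 45 documents per bulk request.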
Cheers,
On 12/02/2014 7:59 PM, A Bose wrote:
This is to capture the time taken by ES to process the items in that batch of records. Yes, the total size written in
bytes will already be in an MR counter.
On Feb 12, 2014 8:30 AM, "Costin Leau" <costin.leau@gmail.com> wrote:
We can introduce such counters. What exactly are you interested in?
The default counters in Hadoop provide information on the amount of data read/written.
Do you want to extract the information directly in Hadoop as opposed to ES proper?
On 12/02/2014 5:13 PM, Abhijit Bose wrote:
Hello,
I would like to collect some stats on the entries being written when running a MapReduce job using the
elasticsearch-hadoop library. I am using the default Mapper.class with a batch of entries in JSON files as
input to MR, e.g.
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(EsOutputFormat.class);
job.setMapOutputValueClass(Text.class);
job.setMapperClass(Mapper.class);
job.setNumReduceTasks(0);
What I would like to do is collect stats on the entries passed to EsOutputFormat that got written to the ES
cluster, similar to how one would collect the stats using BulkResponse, mostly around how many millis the
batch operation took.
For a new index to be populated with a bunch of docs via MR (e.g. daily logs), there is an easier way to do
this. In the main MR thread, once the map tasks finish, I can call the Stats API on the index to get the stats
up to the time of the API call. However, when I am writing in batch from a Mapper --> EsOutputFormat -->
RestRepository, I would like to collect the stats at this time. Is that possible without extending the current
library (e.g. by introducing a set of MR Counters in EsOutputFormat.java)?
Thanks!
Abhijit
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/7090aaee-8436-4bfa-b78a-ebeb7cd4aff8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
Costin
--
Costin