What is the best approach to unit testing against ES-Hadoop MR and Storm?

(Brad Jungsu Heo) #1

Sorry if this is not the right place to ask about unit testing.

I have logs stored in ES, and I wrote an MR job that reads documents from ES and saves the aggregation results back to ES.

I'm wondering how I can unit test my MR job on my local PC efficiently and simply. Reading the es-hadoop source code, I see an EsEmbeddedServer class and some resources involving artists.dat, and I think you use these for unit testing. But I don't understand exactly how to adapt your approach.

I would really appreciate your help.

One more question.

My logs come from Apache Storm and are saved into ES, so I also have to unit test whether my Bolt works correctly. How do you test elasticsearch Storm? Is there something like an embedded Storm?

Thanks in advance.

(Costin Leau) #2

It's best to build your own way of testing, mainly because it will then serve your needs. The current testing infrastructure is aimed at the ES-Hadoop code base. EsEmbeddedServer is quite a small class and, by the way, it is used for integration testing (the fact that JUnit is used doesn't make it a unit test).
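A minimal sketch of keeping a real unit test separate from integration testing, assuming the job's aggregation logic can be pulled out into a plain static method. `LogAggregator`, `countByField`, `doc` and the `level` field are hypothetical names for illustration, not part of ES-Hadoop. Documents are modelled as plain `Map`s, so the test needs neither Hadoop nor a running Elasticsearch node; the Mapper/Reducer would then only convert between the types it receives and this method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical aggregation logic extracted from the MR job so it can be
// unit-tested without Hadoop, ES-Hadoop or a running Elasticsearch node.
public class LogAggregator {

    // Counts documents per value of `field`; documents lacking the field
    // are skipped. Each document is a plain Map, roughly the shape of a
    // JSON document read from ES.
    public static Map<String, Long> countByField(List<Map<String, Object>> docs, String field) {
        Map<String, Long> counts = new HashMap<>();
        for (Map<String, Object> doc : docs) {
            Object value = doc.get(field);
            if (value == null) {
                continue;
            }
            counts.merge(value.toString(), 1L, Long::sum);
        }
        return counts;
    }

    // Test helper (hypothetical): builds a one-field document.
    static Map<String, Object> doc(String key, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(key, value);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("level", "ERROR"));
        docs.add(doc("level", "INFO"));
        docs.add(doc("level", "ERROR"));
        docs.add(doc("message", "no level field"));

        Map<String, Long> counts = countByField(docs, "level");
        // Plain checks instead of JUnit to keep the sketch dependency-free.
        if (counts.size() != 2 || counts.get("ERROR") != 2L || counts.get("INFO") != 1L) {
            throw new AssertionError("unexpected counts: " + counts);
        }
        System.out.println("ok");
    }
}
```

In a real project the checks in `main` would become JUnit test methods; keeping the ES-facing code thin means the integration tests only have to verify the wiring, not the aggregation rules.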

Integration testing Storm can be challenging due to its use of multi-threading; ideally you would test only the bolt internals and not the overall Storm contract.
Either way, I recommend checking out the Storm docs and mailing list for best practices and support classes.
Whatever exists in ES-Hadoop is really for internal consumption.
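A minimal sketch of testing only the bolt internals, assuming the bolt's per-tuple work can be moved into a method that takes and returns plain Java types. `LogLineParser`, `parse` and the `<timestamp> <level> <message>` line format are hypothetical. The bolt's `execute(Tuple)` would then just read the raw line from the tuple, call `parse`, and emit the result, so the parsing rules can be checked without starting any Storm topology.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical per-tuple logic of a bolt, kept free of Storm types so it
// can be tested as a plain method. The bolt's execute(Tuple) would only
// read the raw line from the tuple, call parse(...) and emit the result.
public class LogLineParser {

    // Parses "<timestamp> <level> <message...>" into a document-like map.
    // Returns Optional.empty() for null input or lines with fewer than 3 parts.
    public static Optional<Map<String, String>> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        // Limit 3 keeps any spaces inside the message intact.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length < 3) {
            return Optional.empty();
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("@timestamp", parts[0]);
        doc.put("level", parts[1]);
        doc.put("message", parts[2]);
        return Optional.of(doc);
    }

    public static void main(String[] args) {
        Optional<Map<String, String>> parsed = parse("2015-03-01T10:00:00Z ERROR disk full");
        if (!parsed.isPresent()
                || !"ERROR".equals(parsed.get().get("level"))
                || !"disk full".equals(parsed.get().get("message"))) {
            throw new AssertionError("unexpected parse result: " + parsed);
        }
        if (parse("malformed").isPresent()) {
            throw new AssertionError("malformed line should be rejected");
        }
        System.out.println("ok");
    }
}
```

Whether the topology as a whole behaves correctly is then left to a separate, slower integration test, which is where the Storm docs' own testing support is the better guide.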

(Brad Jungsu Heo) #3


Thank you for your help.
