Hunk vs ElasticSearch

(MK) #1

We are exploring the applicability of using ElasticSearch for a data reconciliation project that we are delivering.
The details are as follows:

  • The project currently uses HUNK
  • The data sources are as follows: multiple Oracle tables via
    GoldenGate replication, some files SFTP'd into the cluster, and some
    data arriving through a JMS queue
  • The data structures are XML, JSON, CSV
  • We need to produce reconciliation reports across all the data
    sources, including arbitrary correlation between them
  • The cluster already uses MapR Hadoop

I have the following questions regarding ElasticSearch:
  • Does ElasticSearch fit the above reconciliation use case?
  • Should we run ElasticSearch as an independent cluster or use MapR Hadoop as the storage for ES? What are the pros and cons?
  • Can ES ingest data in the XML, JSON, and CSV formats? If a new format comes along, how does that work? Do we write an adaptor?
  • I see that Hunk launches MapReduce jobs for every query, which increases
    the latency. What does ES do when a query is run, and how is it faster
    than Hunk, if it is?
  • Do you have any field reports on replacing Hunk with ES?
  • Does ES perform better with MapR because of the MapRFS random r/w capability? I would assume yes, but want to confirm.

Thanks for your support!

(Mark Walkom) #2

Hadoop and ES run in parallel and offer different functionality, so it's not possible to do a direct comparison.

That said, ES (via Logstash) can deal with XML/CSV and RDBMS, and it does JSON natively, so it sounds like it would fit the use case.
You cannot store ES data on Hadoop. ES is a standalone product that can be integrated with Hadoop using the es-hadoop connector, but its data cannot live on HDFS.
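
To illustrate what "JSON natively" means for CSV input, here is a minimal Python sketch that turns CSV rows into the newline-delimited JSON body the `_bulk` API accepts (an action line followed by the document itself). It only builds the request body and does not talk to a cluster. The function name `csv_to_bulk_ndjson`, the index name `recon-trades`, and the `trade_id`/`amount` columns are hypothetical, made up for the example; in practice Logstash would likely do this step for you.

```python
import csv
import io
import json


def csv_to_bulk_ndjson(csv_text, index_name, id_field=None):
    """Build a _bulk request body (NDJSON) from CSV text.

    Each CSV row becomes two lines: an "index" action line and the
    row itself as a JSON document. All values stay strings, because
    the csv module does not infer types.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        action = {"index": {"_index": index_name}}
        if id_field is not None:
            # Reuse a source column as the document id (hypothetical choice).
            action["index"]["_id"] = row[id_field]
        lines.append(json.dumps(action))
        lines.append(json.dumps(row))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "trade_id,amount\nT1,100\nT2,250\n"
    print(csv_to_bulk_ndjson(sample, "recon-trades", id_field="trade_id"))
```

Using a stable source key as the document id is one possible design choice: re-sending the same row then overwrites the existing document instead of creating a duplicate, which may or may not suit a given reconciliation setup.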

You will probably find that ES is a lot more responsive for querying, as it is closer to real time. You'd have to test on your own systems and data to get an idea, though.
