We are evaluating whether Elasticsearch is a good fit for a data reconciliation project that we are delivering.
The details are as follows:
- The project currently uses Hunk
- The data sources are as follows: multiple Oracle tables replicated
through GoldenGate, some files SFTP'd into the cluster, and some data
arriving through a JMS queue
- The data structures are XML, JSON, CSV
- We need to produce data reconciliation reports that reconcile across all
the data sources, with arbitrary correlation between them
- The cluster already uses MapR Hadoop
I have the following questions regarding Elasticsearch:
- Does Elasticsearch fit the above reconciliation use case?
- Should we run Elasticsearch as an independent cluster or use MapR Hadoop as the storage for ES? What are the pros and cons?
- Can ES ingest data in the XML, JSON, and CSV formats? If a new format comes along, how is it handled? Do we write an adaptor?
- I see that Hunk launches a MapReduce job for every query, which increases
latency. What does ES do when a query is run, and how is it faster than
Hunk, if it is?
- Do you have any field reports on replacing Hunk with ES?
- Does ES perform better on MapR because of the MapR-FS random read/write capability? I would assume yes, but want to confirm.
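To make the format question concrete, here is a rough sketch of what I imagine an adaptor would do, assuming documents are sent to ES as JSON. It only normalises CSV, XML, and JSON input into JSON documents using the Python standard library; it does not call ES at all. The function names and the flat record layout are my own assumptions for illustration, not anything taken from ES or from our current setup.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_docs(text):
    """Turn CSV text with a header row into one dict per data row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def xml_to_docs(text, record_tag):
    """Turn each <record_tag> element into a flat dict of child tag -> text.

    Assumes one level of child elements per record (hypothetical layout).
    """
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in record}
        for record in root.iter(record_tag)
    ]


def json_to_docs(text):
    """Accept either a single JSON object or a JSON array of objects."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def to_json_lines(docs):
    """Serialise each document as one JSON string, ready to be indexed."""
    return [json.dumps(doc, sort_keys=True) for doc in docs]
```

If this is roughly the right shape, a new format would only need one more `*_to_docs` function, and everything downstream would stay the same.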
Thanks for your support!