- You can move data from Hadoop to ES using any of the supported libraries: it can be Hive, Pig, Map/Reduce, Cascading, Spark, or Storm.
While Hive has appeal due to its SQL-like capabilities, note that the other libraries also support the notion of a schema. Furthermore, a schema is not really needed when doing only ingestion or basic validation.
Also, if you are looking for performance you might want to look around, as there might be better candidates. In particular, Spark and Spark SQL are getting a lot of traction due to their simple deployment, ease of use, and speed.
I'm not saying Hive is a bad choice, but rather that you have plenty of options. And that's a good thing.
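Whichever library you pick, ingestion ultimately boils down to issuing bulk requests against ES. A minimal sketch of that payload format in plain Python, just to illustrate what the connectors produce under the hood (the index and type names here are hypothetical):

```python
import json

def to_bulk(index, doc_type, docs):
    """Build an Elasticsearch _bulk request body (NDJSON):
    one action line followed by one source line per document,
    terminated by a trailing newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Hypothetical documents; you would POST the result to
# http://<es-host>:9200/_bulk
payload = to_bulk("logs", "event", [{"msg": "started"}, {"msg": "stopped"}])
```

Note there is no schema anywhere in the payload; ES derives the mapping from the documents themselves, which is why a schema-oriented library is not strictly required for ingestion.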
- See 1. You can use any of the libraries above, and if SQL is your thing, you can also use Spark SQL. Note that if you want to visualize data, you can do so directly in ES through Kibana or other tools; in other words, you are not forced to do the querying from Hadoop. Once the data is in ES, any tool/library/client that works with ES can be used.
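Any such tool ends up speaking the same REST interface, sending a Query DSL body to the `_search` endpoint. A hand-built example of that body (the field name and index are made up for illustration):

```python
import json

# Hypothetical search body you could POST to
# http://<es-host>:9200/<index>/_search -- the same kind of request
# Kibana or any other ES client issues on your behalf.
query = {
    "query": {"match": {"message": "error"}},
    "size": 10,
}
body = json.dumps(query)
```

This is why the querying side is decoupled from Hadoop: anything that can send this request can read the data.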
There aren't any limitations outside hardware; there are too many posts, blog posts, sessions, and presentations on the subject, all easily accessible through a simple search, for me to try to cover it all here.
The more memory you allocate to ES, the better it will perform, especially at query time. 20GB is not a lot of data, so you should be able to work fine with the defaults.
P.S. ES doesn't have to load all the data into memory - otherwise it would not scale. So no, it does not require 20GB of RAM; however, if you do have that much available for it, it will happily use it.
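As a sketch of how you would raise the allocation, assuming an ES 1.x-style installation where the heap is controlled through the `ES_HEAP_SIZE` environment variable (the 8g figure is just an illustration, not a recommendation for this dataset):

```shell
# Assumption: ES 1.x startup script, which reads ES_HEAP_SIZE.
# Common guidance: give ES at most ~half the machine's RAM, and stay
# under ~30GB so the JVM can keep using compressed object pointers.
export ES_HEAP_SIZE=8g
./bin/elasticsearch
```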