Creating a mapping using Hadoop configuration


#1

Hi,

I want to put a mapping in an index using elasticsearch-hadoop. Is there a way to create the mapping through the configuration? I've only found information on creating a mapping with the Java API, using a node and client.

Thank you


(Costin Leau) #2

The connector can create a mapping based on your schema definition where that is supported (Pig, Hive, Spark).
Outside of those, the mapping has to be created externally, for several reasons:

  • ease of use: corner cases such as merging the mapping, deleting the previous one, etc. are better handled externally
  • there are no guarantees in the connector: since the job is split into multiple tasks, neither Map/Reduce nor Spark guarantees which tasks execute first, so the tasks cannot properly coordinate creating the mapping. One task might try to add the mapping while the other tasks perceive it as an override.
    Additionally, there's no clear lifecycle between when a task hits the cluster versus configuration time.
    That is, across the Map/Reduce, Hive, Pig, and Cascading APIs we don't know for sure whether the task has been validated and can actually start, or whether the OutputFormat is merely being instantiated in the chain.
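For the supported cases mentioned above (Pig, Hive, Spark), the connector's schema-based index creation is typically enabled through its configuration. A minimal sketch using elasticsearch-hadoop's actual settings; the node address and the index/type names in `es.resource` are placeholders to adapt to your setup:

```properties
# elasticsearch-hadoop connector settings (sketch; values are placeholders)
es.nodes = localhost
es.port = 9200
# target index/type the job writes to
es.resource = my-index/my-doc
# let the connector create the index, deriving the mapping
# from the job's schema (only where schemas exist: Pig, Hive, Spark)
es.index.auto.create = true
```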

In other words, it's a lot easier and safer to create the mapping externally, all the more so since it's a one-time task, as opposed to a job, which is typically run several times across different data sets.
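As a concrete example of the external approach, the mapping can be created once with a single HTTP PUT before the job ever runs. A minimal sketch; the index name `my-index`, type `my-doc`, the field names/types, and the cluster URL are all placeholders for illustration:

```python
import json

# Hypothetical mapping for the documents the job will write.
# Field names and types are placeholders -- adapt them to your schema.
mapping = {
    "mappings": {
        "my-doc": {
            "properties": {
                "title":     {"type": "string"},
                "timestamp": {"type": "date"},
                "views":     {"type": "long"},
            }
        }
    }
}

# Serialize the body to send with the index-creation request, e.g.:
#   curl -XPUT 'http://localhost:9200/my-index' -d @mapping.json
body = json.dumps(mapping, indent=2)
print(body)
```

Since this runs once before any job, there is no coordination problem: every Map/Reduce or Spark task simply writes into an index whose mapping already exists.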


#3

Thank you for your help!

