Ingesting data from Spark to Elasticsearch with index template


(Akshay Mhetre) #1

In our existing design, we use Logstash to fetch JSON data from Kafka and put it into Elasticsearch.

We also apply an index template mapping when inserting data from Logstash into ES. This is done by setting the 'template' property of Logstash's Elasticsearch output plugin, e.g.:

output {
  elasticsearch {
    template => "elasticsearch-template.json"   # template file path
    hosts => "localhost:9200"
    template_overwrite => true
    manage_template => true
    codec => plain
  }
}
elasticsearch-template.json looks like the following:

{
  "template" : "logstash-*",
  "settings" : {
    "index.refresh_interval" : "3s"
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : true },
      "dynamic_templates" : [ {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fields" : {
              "raw" : { "type" : "string", "index" : "not_analyzed", "ignore_above" : 256, "doc_values" : true }
            }
          }
        }
      } ],
      "properties" : {
        "@version" : { "type" : "string", "index" : "not_analyzed" },
        "geoip" : {
          "type" : "object",
          "dynamic" : true,
          "properties" : {
            "location" : { "type" : "geo_point" }
          }
        }
      }
    }
  }
}
We are now going to replace Logstash with Apache Spark, and I want to use an index template in the same way when inserting data into ES from Spark.

What is the way to achieve that?

Thanks.


(James Baiera) #2

You could always use an index template stored on Elasticsearch itself. Just make sure that your Spark job is configured to create the index if it doesn't exist (es.index.auto.create). As long as your index name matches the template pattern in Elasticsearch, the template will be applied at creation time.
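To sketch what this looks like with the elasticsearch-hadoop connector: the template itself would be registered once in Elasticsearch (e.g. via PUT _template), not through Spark; the Spark side only needs the connection settings and an index name matching the template pattern. The object name, node address, and sample documents below are illustrative, not from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// elasticsearch-hadoop's Spark integration; brings saveToEs into scope as an RDD method
import org.elasticsearch.spark._

object KafkaToEs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kafka-to-es")
      .set("es.nodes", "localhost:9200")          // assumed ES location
      // Let the connector create the index if it is missing; because the
      // template lives in Elasticsearch, it is applied at creation time.
      .set("es.index.auto.create", "true")
    val sc = new SparkContext(conf)

    // Placeholder for the data actually read from Kafka
    val docs = sc.parallelize(Seq(
      Map("@version" -> "1", "message" -> "hello")
    ))

    // The index name must match the template's pattern ("logstash-*")
    // for the mappings and settings to be picked up.
    docs.saveToEs("logstash-2016.01.01/logs")

    sc.stop()
  }
}
```

The key point is that, unlike Logstash's 'template' option, Spark does not ship the template file itself; the template must already exist in the cluster before the job first writes.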


(system) #3