Hi,
Is there a way to send a bulk request (using the bulk API) to a specific ingest pipeline without having to specify an index name?
So far, the current behaviour can lead to funny things, especially when using the date index name processor (or any processor that re-routes documents).
My current cluster config is as follows (though it doesn't really matter):
ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           23          87   2    0.24    0.23     0.21 i         -      ingest1
127.0.0.1           39          87   2    0.24    0.23     0.21 m         *      master1
127.0.0.1           27          87   2    0.24    0.23     0.21 d         -      warm1
127.0.0.1           26          87   2    0.24    0.23     0.21 d         -      warm2
127.0.0.1           27          87   2    0.24    0.23     0.21 d         -      hot1
127.0.0.1           23          87   2    0.24    0.23     0.21 m         -      master2
127.0.0.1           23          87   2    0.24    0.23     0.21 m         -      master3
127.0.0.1           28          87   2    0.24    0.23     0.21 d         -      hot2
127.0.0.1           21          87   2    0.24    0.23     0.21 -         -      client
The existing indices on this cluster are (but again, it doesn't really matter):
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana Ciaa8qCeSR-hp_BrCSmMNg   1   1          1            0      6.3kb          3.1kb
First, let's define a pipeline that computes the index name from the '@timestamp' field we expect to find in our documents:
PUT _ingest/pipeline/weeklyindex
{
  "description": "weekly date-time index naming",
  "processors": [
    {
      "date_index_name": {
        "field": "@timestamp",
        "index_name_prefix": "myindex-",
        "date_rounding": "w"
      }
    }
  ]
}
With that done, let's send a bulk request to the "weeklyindex" pipeline:
PUT _bulk?pipeline=weeklyindex
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 1}}
{ "text": "Some log message", "@timestamp": "2016-04-25T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 2}}
{ "text": "Some log message", "@timestamp": "2016-04-26T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 3}}
{ "text": "Some log message", "@timestamp": "2016-04-27T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 4}}
{ "text": "Some log message", "@timestamp": "2016-04-28T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 5}}
{ "text": "Some log message", "@timestamp": "2016-04-29T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 6}}
{ "text": "Some log message", "@timestamp": "2016-04-30T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 7}}
{ "text": "Some log message", "@timestamp": "2016-05-01T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 8}}
{ "text": "Some log message", "@timestamp": "2016-05-02T12:02:01.789Z" }
The response is:
{
  "took": 494,
  "ingest_took": 1,
  "errors": false,
  "items": [ ... ]
}
By the looks of it, one might think our documents will be indexed into the "foo" index, but that's not what happens: the bulk request is processed by the "weeklyindex" ingest pipeline, which re-routes every single document to indices that have nothing to do with "foo", as shown by a new call to _cat/indices:
health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myindex-2016-04-25 JscqJWrGSySIMzEJDhWFvw   2   1          7            0     15.9kb          7.9kb
green  open   myindex-2016-05-02 WUNcbUDDQJO3hXHnyyqq8g   2   1          1            0      7.8kb          3.9kb
green  open   .kibana            Ciaa8qCeSR-hp_BrCSmMNg   1   1          1            0      6.3kb          3.1kb
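For what it's worth, the 7 / 1 split above is what you get when each '@timestamp' is rounded down to the start of its week. Here is a minimal Python sketch of that rounding. It is only my approximation of what the processor seems to do, not its actual implementation: I am assuming weeks start on Monday and that the index name is the prefix followed by the rounded date as yyyy-MM-dd (both consistent with the index names above). The function name weekly_index_name is made up for this illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def weekly_index_name(timestamp: str, prefix: str = "myindex-") -> str:
    """Hypothetical helper: round a UTC timestamp down to the Monday of its
    week and append that date to the prefix (assumed naming scheme)."""
    # Parse timestamps shaped like "2016-04-25T12:02:01.789Z"
    ts = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    # weekday() is 0 for Monday, so this steps back to the week's Monday
    monday = ts.date() - timedelta(days=ts.weekday())
    return prefix + monday.isoformat()


# The eight '@timestamp' values from the bulk request
timestamps = [
    "2016-04-25T12:02:01.789Z",
    "2016-04-26T12:02:01.789Z",
    "2016-04-27T12:02:01.789Z",
    "2016-04-28T12:02:01.789Z",
    "2016-04-29T12:02:01.789Z",
    "2016-04-30T12:02:01.789Z",
    "2016-05-01T12:02:01.789Z",
    "2016-05-02T12:02:01.789Z",
]

# Count how many documents land in each computed index
counts = Counter(weekly_index_name(t) for t in timestamps)
for name, n in sorted(counts.items()):
    print(name, n)
```

Running this prints "myindex-2016-04-25 7" and "myindex-2016-05-02 1", which matches the docs.count column above. The "foo" index named in the bulk action lines plays no part in the result.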
=> Is there a cleaner solution? That is, is there a way to send a bulk request (using the bulk API) to a specific ingest pipeline without having to specify an index name?
Best regards,
Charles.w