Bulk API & date index name ingest processor

Charles.w · April 9, 2017, 8:59am

Hi,

Is there a way to send a bulk request (using the bulk API) to a specific ingest pipeline without having to specify an index name?

As it stands, the current behavior can lead to surprising results, especially when using the date index name processor (or any processor that re-routes documents).

My current cluster configuration is as follows (though it doesn't really matter):

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           23          87   2    0.24    0.23     0.21 i         -      ingest1
127.0.0.1           39          87   2    0.24    0.23     0.21 m         *      master1
127.0.0.1           27          87   2    0.24    0.23     0.21 d         -      warm1
127.0.0.1           26          87   2    0.24    0.23     0.21 d         -      warm2
127.0.0.1           27          87   2    0.24    0.23     0.21 d         -      hot1
127.0.0.1           23          87   2    0.24    0.23     0.21 m         -      master2
127.0.0.1           23          87   2    0.24    0.23     0.21 m         -      master3
127.0.0.1           28          87   2    0.24    0.23     0.21 d         -      hot2
127.0.0.1           21          87   2    0.24    0.23     0.21 -         -      client

The existing indices on this cluster are (but again, it doesn't really matter):

health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana Ciaa8qCeSR-hp_BrCSmMNg   1   1          1            0      6.3kb          3.1kb

First, let's define a pipeline that computes the index name from the '@timestamp' field we expect to find in our documents:

PUT _ingest/pipeline/weeklyindex
{
  "description": "weekly date-time index naming",
  "processors" : [
    {
      "date_index_name" : {
        "field" : "@timestamp",
        "index_name_prefix" : "myindex-",
        "date_rounding" : "w"
      }
    }
  ]
}

With that done, let's send a bulk request through the "weeklyindex" pipeline:

PUT _bulk?pipeline=weeklyindex
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 1}}
{ "text": "Some log message", "@timestamp": "2016-04-25T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 2}}
{ "text": "Some log message", "@timestamp": "2016-04-26T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 3}}
{ "text": "Some log message", "@timestamp": "2016-04-27T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 4}}
{ "text": "Some log message", "@timestamp": "2016-04-28T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 5}}
{ "text": "Some log message", "@timestamp": "2016-04-29T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 6}}
{ "text": "Some log message", "@timestamp": "2016-04-30T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 7}}
{ "text": "Some log message", "@timestamp": "2016-05-01T12:02:01.789Z" }
{ "create": {"_index" : "foo", "_type" : "bar", "_id": 8}}
{ "text": "Some log message", "@timestamp": "2016-05-02T12:02:01.789Z" }

{
  "took": 494,
  "ingest_took": 1,
  "errors": false,
  "items": [ ... ]
}

By the looks of it, one might think our documents will be indexed into the "foo" index, but that's not what happens. The bulk request is processed by the "weeklyindex" ingest pipeline, which re-routes every single document to indices that have nothing to do with "foo", as demonstrated by a new call to _cat/indices:

health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   myindex-2016-04-25 JscqJWrGSySIMzEJDhWFvw   2   1          7            0     15.9kb          7.9kb
green  open   myindex-2016-05-02 WUNcbUDDQJO3hXHnyyqq8g   2   1          1            0      7.8kb          3.9kb
green  open   .kibana            Ciaa8qCeSR-hp_BrCSmMNg   1   1          1            0      6.3kb          3.1kb
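The 7/1 split between the two indices is consistent with weekly rounding down to the Monday of each week. Here is a minimal Python sketch of that arithmetic. It is not the processor's actual implementation: it assumes weeks start on Monday, timestamps are in UTC, and the index name uses a yyyy-MM-dd date suffix, as the index names above suggest. The helper name `weekly_index_name` is made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def weekly_index_name(timestamp: str, prefix: str = "myindex-") -> str:
    """Round a timestamp down to the Monday of its week and build an index name.

    Hypothetical helper: assumes Monday-based weeks and a yyyy-MM-dd suffix.
    """
    # Only the date part matters for weekly rounding of these UTC timestamps.
    day = datetime.strptime(timestamp[:10], "%Y-%m-%d").date()
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    return prefix + monday.strftime("%Y-%m-%d")


# The eight '@timestamp' values from the bulk request above.
timestamps = [
    "2016-04-25T12:02:01.789Z",
    "2016-04-26T12:02:01.789Z",
    "2016-04-27T12:02:01.789Z",
    "2016-04-28T12:02:01.789Z",
    "2016-04-29T12:02:01.789Z",
    "2016-04-30T12:02:01.789Z",
    "2016-05-01T12:02:01.789Z",
    "2016-05-02T12:02:01.789Z",
]

counts = Counter(weekly_index_name(ts) for ts in timestamps)
for name, count in sorted(counts.items()):
    print(name, count)
# myindex-2016-04-25 7
# myindex-2016-05-02 1
```

The "foo" index named in every action line plays no part in this computation, which is exactly why it never shows up in _cat/indices.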

So, is there a cleaner solution? In other words, is there a way to send a bulk request (using the bulk API) to a specific ingest pipeline without having to specify an index name?

Best regards,

Charles.w

Charles.w · April 9, 2017, 1:12pm

On a side note, I am fully aware that this processor overrides the _index property of any document it handles, but I can't help thinking that this is not a very clean solution with respect to the principle of least surprise (or least astonishment).

spinscale · April 10, 2017, 9:59am

Hey,

Currently there is not (except for specifying the index in the URL instead of in each action line of the payload; however, this does not solve the core problem). You might want to open an Elasticsearch GitHub issue and discuss it there.

--Alex
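For reference, the URL-based variant mentioned above could look roughly like the following Python sketch. It only builds the request path and the newline-delimited body and does not contact a cluster. The helper name `build_bulk_request` is made up for illustration, and the `/{index}/{type}/_bulk` path form is assumed from the 5.x-era bulk API. As noted, "foo" is still only a placeholder that the date index name processor overrides.

```python
import json


def build_bulk_request(index, doc_type, pipeline, docs):
    """Return (path, body) for a bulk request that names the index once, in the URL.

    Hypothetical helper: `docs` is a list of (id, source) pairs.
    """
    path = f"/{index}/{doc_type}/_bulk?pipeline={pipeline}"
    lines = []
    for doc_id, source in docs:
        # The action line no longer repeats _index and _type.
        lines.append(json.dumps({"create": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body is newline-delimited JSON and must end with a newline.
    body = "\n".join(lines) + "\n"
    return path, body


docs = [
    (1, {"text": "Some log message", "@timestamp": "2016-04-25T12:02:01.789Z"}),
    (2, {"text": "Some log message", "@timestamp": "2016-04-26T12:02:01.789Z"}),
]

path, body = build_bulk_request("foo", "bar", "weeklyindex", docs)
print(path)
print(body, end="")
```

This shortens each action line but leaves the surprise intact: the documents still end up in the indices computed by the pipeline, not in "foo".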

Charles.w · April 10, 2017, 11:23am

Hi,

You're totally right, but I wanted to check whether there was another way and whether I was the only one to find this quite annoying. I'll file a GitHub issue on the subject this evening.

Anyhow, thanks for your answer.

Best regards,

Charles.w

system · May 8, 2017, 11:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
