How to set up an Ingest Pipeline

This is not a question but rather a solution.
Last week I was trying to find a simple to-do list for setting up an ingest pipeline and attaching it to a particular index, and it was hard to find this information. The Elastic documentation mostly talks about how to create an ingest pipeline, error handling, etc.

Here is the step-by-step process in simple language.

Create a sample index (you might already have an existing index).

PUT sachin_index

Post a document into that index (you might already have a bunch of records in the index; there is a step at the bottom of this post on how to fix those).

POST sachin_index/_doc/
{
  "@timestamp": "2022-07-17T10:12:00",
  "script": "#!/bin/bash\nunset SLURM_JWT\nexport SLURM_SCHEDULER=T\nexport jobnumber=800056364\nexport objectid=409069672\nexport SINGULARITY_CFG=\"-ev CONTAINER_LAUNCHER singularity\"\nexport jobname=tst0708_alp\nexport MYID=800056364\nexport project=a::xyz:d9999\nexport threading=TRUE\ncontainer=\"\"\njoblocation=/data3/d9999/90_test/01/05_surface/03d_test_ap/01_p1\n",
  "user": {
    "id": "sachin"
  }
}

Create an ingest pipeline
Then define it as the default ingest pipeline for the index "sachin_index". From then on, any document that goes into that index will pass through this ingest pipeline.

The grok pattern for this example looks like this:

%{GREEDYDATA:rm1}jobnumber=%{NUMBER:job:int}(?m)%{GREEDYDATA:rm1}project=a::xyz:%{WORD:project}(?m)%{GREEDYDATA:rm1}joblocation=%{PATH:jobpath}
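
A minimal pipeline definition using that pattern looks something like this: a grok processor that runs against the script field, followed by a remove processor that drops script and the throwaway rm1 field (exact processor options may vary for your setup):

PUT _ingest/pipeline/slurm_jobcomp-pipeline
{
  "description": "Extract job, project and jobpath from the SLURM script field",
  "processors": [
    {
      "grok": {
        "field": "script",
        "patterns": [
          "%{GREEDYDATA:rm1}jobnumber=%{NUMBER:job:int}(?m)%{GREEDYDATA:rm1}project=a::xyz:%{WORD:project}(?m)%{GREEDYDATA:rm1}joblocation=%{PATH:jobpath}"
        ]
      }
    },
    {
      "remove": {
        "field": ["script", "rm1"],
        "ignore_missing": true
      }
    }
  ]
}

You can test the pipeline against a sample document with the simulate API before attaching it to the index:

POST _ingest/pipeline/slurm_jobcomp-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "script": "export jobnumber=800056364\nexport project=a::xyz:d9999\njoblocation=/data3/d9999/90_test/01/05_surface/03d_test_ap/01_p1\n"
      }
    }
  ]
}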

The following command tells Elasticsearch that any document indexed into this index will go through this ingest pipeline:

PUT sachin_index/_settings
{
  "index.default_pipeline": "slurm_jobcomp-pipeline"
}
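
To double-check that the setting took effect, you can read the index settings back and look for index.default_pipeline:

GET sachin_index/_settings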

This means all new documents will get transformed, but old ones won't. To run all existing documents through the pipeline and transform them, do this.
This will remove the script field from old documents and transform them. If you have too many documents in this index, you might want to run this from the command line, because Kibana will time out.

POST sachin_index/_update_by_query?pipeline=slurm_jobcomp-pipeline
{
  "query": { "match_all": {} }
}

From the command line:

curl -u user:password -XPOST "http://hostname:9200/sachin_index/_update_by_query?pipeline=slurm_jobcomp-pipeline" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} }
}'
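
If the index is really large, another option is to run the update asynchronously with wait_for_completion=false (a standard _update_by_query parameter) and poll the task API for progress:

POST sachin_index/_update_by_query?pipeline=slurm_jobcomp-pipeline&wait_for_completion=false
{
  "query": { "match_all": {} }
}

This returns a task id that you can check with GET _tasks/<task id>.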

This will create three more fields (job, project, jobpath), remove the script field, and also remove rm1, which I don't need.
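
Assuming the grok pattern matches as intended, the sample document from the beginning of this post should end up looking roughly like this after going through the pipeline:

{
  "@timestamp": "2022-07-17T10:12:00",
  "job": 800056364,
  "project": "d9999",
  "jobpath": "/data3/d9999/90_test/01/05_surface/03d_test_ap/01_p1",
  "user": {
    "id": "sachin"
  }
}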
