How to set up an Ingest Pipeline

This is not a question but rather a solution.
Last week I was trying to find a simple to-do list for setting up an ingest pipeline and attaching it to a particular index, and it was hard to find this information. The Elastic documentation mostly talks about how to create an ingest pipeline, error handling, etc.

Here is the step-by-step process in simple language.

Create a sample index (you might already have an existing index).

PUT sachin_index

Post a document into that index (you might already have a bunch of records in the index; there is a step at the bottom of this post on how to fix those).

POST sachin_index/_doc/
{
  "@timestamp": "2022-07-17T10:12:00",
  "script": "#!/bin/bash\nunset SLURM_JWT\nexport SLURM_SCHEDULER=T\nexport jobnumber=800056364\nexport objectid=409069672\nexport SINGULARITY_CFG=\"-ev CONTAINER_LAUNCHER singularity\"\nexport jobname=tst0708_alp\nexport MYID=800056364\nexport project=a::xyz:d9999\nexport threading=TRUE\ncontainer=\"\"\njoblocation=/data3/d9999/90_test/01/05_surface/03d_test_ap/01_p1\n",
  "user": {
    "id": "sachin"
  }
}

Create an ingest pipeline
Then define it as the default ingest pipeline for the index "sachin_index". From then on, any document that goes into that index will pass through this ingest pipeline.

The grok pattern for this example looks like this:

%{GREEDYDATA:rm1}jobnumber=%{NUMBER:job:int}(?m)%{GREEDYDATA:rm1}project=a::xyz:%{WORD:project}(?m)%{GREEDYDATA:rm1}joblocation=%{PATH:jobpath}
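
A minimal pipeline definition using that pattern looks something like this: a grok processor that runs against the script field, followed by a remove processor that drops script and the throwaway rm1 field (exact processor options may vary for your setup):

PUT _ingest/pipeline/slurm_jobcomp-pipeline
{
  "description": "Extract job, project and jobpath from the SLURM script field",
  "processors": [
    {
      "grok": {
        "field": "script",
        "patterns": [
          "%{GREEDYDATA:rm1}jobnumber=%{NUMBER:job:int}(?m)%{GREEDYDATA:rm1}project=a::xyz:%{WORD:project}(?m)%{GREEDYDATA:rm1}joblocation=%{PATH:jobpath}"
        ]
      }
    },
    {
      "remove": {
        "field": ["script", "rm1"],
        "ignore_missing": true
      }
    }
  ]
}

You can test the pipeline against a sample document with the simulate API before attaching it to the index:

POST _ingest/pipeline/slurm_jobcomp-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "script": "export jobnumber=800056364\nexport project=a::xyz:d9999\njoblocation=/data3/d9999/90_test/01/05_surface/03d_test_ap/01_p1\n"
      }
    }
  ]
}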

The following command tells Elasticsearch that any document indexed into this index will go through this ingest pipeline:

PUT sachin_index/_settings
{
  "index.default_pipeline": "slurm_jobcomp-pipeline"
}
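
To double-check that the setting took effect, you can read the index settings back and look for index.default_pipeline:

GET sachin_index/_settings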

This means all new documents will get transformed, but old ones won't. To run all existing documents through the pipeline and transform them, do this.
This will remove the script field from old documents and transform them. If you have too many documents in this index, you might want to run this from the command line, because Kibana will time out.

POST sachin_index/_update_by_query?pipeline=slurm_jobcomp-pipeline
{
  "query": { "match_all": {} }
}

From the command line:

curl -u user:password -XPOST "http://hostname:9200/sachin_index/_update_by_query?pipeline=slurm_jobcomp-pipeline" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} }
}'
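
If the index is really large, another option is to run the update asynchronously with wait_for_completion=false (a standard _update_by_query parameter) and poll the task API for progress:

POST sachin_index/_update_by_query?pipeline=slurm_jobcomp-pipeline&wait_for_completion=false
{
  "query": { "match_all": {} }
}

This returns a task id that you can check with GET _tasks/<task id>.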

This will create three more fields (job, project, jobpath), remove the script field, and also remove rm1, which I don't need.
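
Assuming the grok pattern matches as intended, the sample document from the beginning of this post should end up looking roughly like this after going through the pipeline:

{
  "@timestamp": "2022-07-17T10:12:00",
  "job": 800056364,
  "project": "d9999",
  "jobpath": "/data3/d9999/90_test/01/05_surface/03d_test_ap/01_p1",
  "user": {
    "id": "sachin"
  }
}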
