I know the ingest node page says to run the pipeline like this:
PUT my-index/_doc/my-id?pipeline=my_pipeline_id
{
  "foo": "bar"
}
but I don't understand the need for "foo": "bar". My pipeline should just extract the duration out of the message field and store it in a new field, shouldn't it?
It may be helpful to use the Simulate Pipeline API to try out your pipeline to see what the result will be, without actually indexing any documents, like so:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "this is an example pipeline",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{NUMBER:duration}"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "words::words::words::words (duration=432, words)"
      }
    }
  ]
}
We can see that the number is added to the document as duration, because that's the name we used in the grok pattern. However, note that the given grok pattern will just extract the first number in the message field, so if you used a document with "message": "words::words::87::words (duration=432, words)", the extracted duration would be 87.
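For reference, the response from that simulate call should look roughly like this (metadata values abbreviated). Note that grok captures the value as a string by default; I believe you can add a type suffix like %{NUMBER:duration:int} if you want it stored as a number.

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "duration": "432",
          "message": "words::words::words::words (duration=432, words)"
        },
        "_ingest": {
          "timestamp": "..."
        }
      }
    }
  ]
}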
To make this a little more resilient, you can use regular expressions in your grok pattern, like this: ".*duration=%{NUMBER:duration}.*". That pattern would extract 432 rather than 87 from the second example message.
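You can check that with the same simulate call, just swapping in the new pattern and the trickier sample message:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "grok with a more specific pattern",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [".*duration=%{NUMBER:duration}.*"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "words::words::87::words (duration=432, words)"
      }
    }
  ]
}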
To use the pipeline, just index documents as normal, but with the pipeline=my_pipeline_id parameter on the request, like in your second post.
The "foo": "bar" is just an example document, it's not required or anything - just replace it with the document you want to run through the pipeline and index.
Nice, thanks! I was actually trying to write a better expression, but I was getting parsing errors; I guess I need to brush up on my regex.
As for using the pipeline, I believe I'll have to add the pipeline name to the fluentd config, since I don't ingest documents manually here.
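If I understand the fluent-plugin-elasticsearch options right, I think it's just a pipeline setting in the match block, something like this (untested, and the tag/index names are placeholders for my setup):

<match my.logs.**>
  @type elasticsearch
  host localhost
  port 9200
  index_name my-index
  pipeline my_pipeline_id
</match>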