Hi,
I'm using FSCrawler 2.7 with Elasticsearch and Kibana 7.1.1.
I have a single document from which I want to extract two pieces of text and add them as two separate fields.
I have the following
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description" : "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["additionalfield1: (?<additionalfield1>([^,]*))additionalfield2: (?<additionalfield2>([^,]*))"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "This is a document with a lengthy text it contains a number of paragraphs and at the end Ill add some markers that indicate additional information I'd like to pull out and add as additional fields. This is the end of the actual document with additional information being added prior to the closing bracket of the RTF.\nadditionalfield1: this is information associated with additionalfield1\nadditionalfield2: information associated with additionalfield2"
      }
    }
  ]
}
That simulation gives the result I'm after:
{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "additionalfield2" : "information associated with additionalfield2",
          "additionalfield1" : "this is information associated with additionalfield1\n",
          "message" : """
This is a document with a lengthy text it contains a number of paragraphs and at the end Ill add some markers that indicate additional information I'd like to pull out and add as additional fields. This is the end of the actual document with additional information being added prior to the closing bracket of the RTF.
additionalfield1: this is information associated with additionalfield1
additionalfield2: information associated with additionalfield2
"""
        },
        "_ingest" : {
          "timestamp" : "2019-12-03T03:16:50.505Z"
        }
      }
    }
  ]
}
However, if I create the pipeline as
PUT _ingest/pipeline/test_pipeline_id
{
  "description" : "parse multiple patterns",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["additionalfield1: (?<additionalfield1>([^,]*))additionalfield2: (?<additionalfield2>([^,]*))"]
      }
    }
  ]
}
with the FSCrawler settings file containing the following
pipeline: "test_pipeline_id"
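For context, here is roughly how I understand a minimal settings file with a pipeline configured should look, assuming the default location ~/.fscrawler/modtest/_settings.yaml, with the pipeline nested under the elasticsearch section and everything else trimmed:

name: "modtest"
elasticsearch:
  pipeline: "test_pipeline_id"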
and then run fscrawler as
fscrawler modtest --loop 1 --restart --debug
I end up with the following error:
ElasticsearchException[Elasticsearch exception [type=exception, reason=java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [message] not present as part of path [message]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=java.lang.IllegalArgumentException: field [message] not present as part of path [message]]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=field [message] not present as part of path [message]]];
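In case it helps with diagnosing this, my next step was going to be re-running the job without the pipeline setting and then pulling back one of the indexed documents to see which fields FSCrawler actually sends (assuming the index is named after the job, i.e. modtest):

GET modtest/_search
{
  "size": 1
}

My guess from the error is that whatever FSCrawler indexes doesn't contain a message field at all, but I don't know which field name the grok processor should point at instead.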
I'm sure I'm doing something wrong here, so if a kind soul could explain what and where, and what I need to do instead, it would be very much appreciated, and you'd also make it onto Santa's good list, I'm sure!!
Thanks heaps