Struggling with an Elasticsearch ingest pipeline for custom parsing of an unusual flat-file log format. Running into problems, and I'm not sure whether there's a mix of issues here.
Here's the scenario...
The format uses multi-line messages. The date/time format is problematic: YYYY.MM.dd HH:mm:ss.SSS
I'm trying to use the log date/time as the timestamp of the document in the index.
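In an ingest pipeline, that can normally be done with a `date` processor. Note that ingest date formats use Java time patterns, so the year is lowercase `yyyy`. A minimal sketch (the field names and timezone here are assumptions, not from your setup):

```json
{
  "date": {
    "field": "log_timestamp",
    "formats": ["yyyy.MM.dd HH:mm:ss.SSS"],
    "timezone": "UTC",
    "target_field": "@timestamp"
  }
}
```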
Symptoms of the problem:
Partial ingest of the file (it seems to stop after empty lines). Unclear whether this is a bug with Filebeat tailing on Windows Server, or an issue with my pipeline and/or Filebeat config. For example, one log file has over 10,000 lines on the filesystem, but only about 40 records were ingested into Elasticsearch.
Errors in the Filebeat log:
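If continuation lines (and blank lines) aren't being folded into the preceding event, Filebeat's multiline settings are the usual suspect. A sketch matching your timestamp format (the path is a placeholder; treat this as a starting point, not your exact config):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - 'C:\logs\app\*.log'
    # Any line NOT starting with "YYYY.MM.dd HH:mm:ss.SSS" is treated as a
    # continuation of the previous event, including blank lines.
    multiline.pattern: '^\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
    multiline.negate: true
    multiline.match: after
```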
{"type":"illegal_argument_exception","reason":"Provided Grok expressions do not match field value: [2020.08.04 21:56:09.478 (cc-123): REDACTED\n2020.08.04 21:56:09.478 (cc-123): REDACTED"}
Any insights from experienced Elasticsearch pipeline admins would be much appreciated! I could also use pointers on basic debugging: how can I get information about the pipeline processing itself, other than looking at the Filebeat log on one end and the (partial) output in Elasticsearch on the other?
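For anyone else debugging this: Elasticsearch's Simulate Pipeline API runs sample documents through a pipeline and shows per-processor results, which is much more direct than inferring failures from the Filebeat log. A sketch using the Dev Tools console (the pipeline name is from later in this thread; the sample message is illustrative):

```
POST _ingest/pipeline/clicklog_redate/_simulate?verbose=true
{
  "docs": [
    { "_source": { "message": "2020.08.19 11:04:21.032 (cc-11): sample message" } }
  ]
}
```

`GET _nodes/stats/ingest` also reports per-pipeline counts and failures, which helps confirm whether documents are reaching the pipeline at all.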
I'm still seeing only partial file ingest.
There are many errors of this sort in the Filebeat log (this is an actual example):
{"type":"illegal_argument_exception","reason":"Provided Grok expressions do not match field value: [2020.08.19 11:04:21.032 (cc-11): [bcac3929-a3a9-4916-80bb-5196b83d1952] WopiController.RunWopiRequestInClickTx: Handling GetDocumentInfoTx for document \"D7AB650CDB3C11EA6484523796565000\"]"}
The related line from the file being ingested: 2020.08.19 11:04:21.032 (cc-11): [bcac3929-a3a9-4916-80bb-5196b83d1952] WopiController.RunWopiRequestInClickTx: Handling GetDocumentInfoTx for document "D7AB650CDB3C11EA6484523796565000"
I'm new to the platform, so forgive me if I'm commenting on an unrelated thread.
I'm trying to build a custom Filebeat module, but the docs seem a little unspecific to me:
https://www.elastic.co/guide/en/beats/devguide/current/filebeat-modules-devguide.html
I've come as far as creating the module, the fileset and writing out a log/ingest/pipeline.json
However, when I try to run make create-fields I keep hitting stumbling blocks, e.g.:
mage generate:fields
Error: cannot read pipeline: invalid character ']' looking for beginning of value
make: *** [create-fields] Error 1
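That error comes from Go's JSON decoder: "invalid character ']' looking for beginning of value" almost always means a trailing comma before a closing `]` in your pipeline.json, which strict JSON does not allow. An illustrative (hypothetical) fragment:

```json
{
  "processors": [
    { "grok": { "field": "message", "patterns": ["%{GREEDYDATA:msg}"] } },
  ]
}
```

The comma after the last processor makes the decoder expect another value and then hit `]` instead. Removing the trailing comma should let the file parse.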
Any further guidance would be greatly appreciated.
@warkolm I used the handy built-in console in Dev Tools to simulate ingesting data into the clicklog_redate pipeline, which has been failing with "illegal_argument_exception" (Grok expression not matching the field value).
I found that the "cc-11" values, which I'm capturing as the "process" field with \(%{WORD:process}\)\:, are apparently thrown off by the - character. A - in the value evidently terminates the WORD pattern.
So, I'm not very grok-savvy, but what would be the solution here? Sometimes there's a - in the process name, sometimes not.
Found that I can use the USERNAME grok pattern instead of WORD; it treats the dash as part of the same token (which it is in this case), so ...
SOLUTION: changed the redate pipeline pattern from \(%{WORD:process}\)\: to \(%{USERNAME:process}\)\:
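To see why this works: grok patterns expand to plain regexes, and the standard definitions are WORD = `\b\w+\b` and USERNAME = `[a-zA-Z0-9._-]+` (assuming unmodified defaults). A small Python sketch of the difference, using the log line from above:

```python
import re

# Standard grok pattern definitions (assumption: unmodified defaults).
WORD = r"\b\w+\b"
USERNAME = r"[a-zA-Z0-9._-]+"

line = "2020.08.19 11:04:21.032 (cc-11): sample message"

# \(%{WORD:process}\)\: expands to \(\b\w+\b\)\: -- \w does not include '-',
# so the match stops at "cc", never reaches ')', and the expression fails.
print(re.search(r"\(" + WORD + r"\)\:", line))        # None

# \(%{USERNAME:process}\)\: allows '-' inside the token, so it matches.
m = re.search(r"\((" + USERNAME + r")\)\:", line)
print(m.group(1))                                     # cc-11
```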