Start with custom logs

Hi All,

I'm getting started with custom logs. Currently I ship log lines without any additional processing.
My first problem is that I want to search for "ERROR", but my logs look like:

timestamp ERROR example message
timestamp INFO example message finished without error

By default _source.message is a text type field, so when I search for ERROR I receive both lines in the output; as far as I know, text fields are case-insensitive.

I was looking for solutions on the internet and now I have a few ideas, but I'm not sure which is best:

  1. Dissect processor. Add processor to Filebeat. Example definition:
- dissect:
    tokenizer: '%{timestamp} %{log_level} %{log_message}'

But I'm not sure whether it works with multiline, or whether logs containing .NET exceptions would be a problem, because ultimately I think it's good to have %{log_message} and the exception in the same field. I also don't know whether a multiline message gets an additional field, for situations where we need to search for all errors and exceptions.
2. Add_field processor + regex, but it may require as many processors as there are log levels (INFO, WARN, ERROR...).
3. Ingest node. As in the examples above, I can try to split the log line into different fields. I can copy one of the modules shipped with Filebeat and try to customize it for my case.
4. Change the type of the _source.message field to keyword in the Filebeat index settings; that should make the field case-sensitive for search, but I'm not sure how it would affect Elasticsearch performance.
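To make option 1 concrete, a minimal sketch of the processor (the field name and the idea that fields land under dissect.* are my reading of the docs, not tested yet). If multiline is configured on the input, the lines are joined before processors run, so %{log_message} should also capture the exception text:

```yaml
processors:
  - dissect:
      tokenizer: '%{timestamp} %{log_level} %{log_message}'
      field: "message"
      # parsed fields land under dissect.* by default
      # (dissect.log_level, dissect.log_message, ...)
```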

Do you have an idea which option is best and why? Or maybe you have a better solution for this situation?

I played with the _search API in lots of ways, but it didn't help; the searches were always case-insensitive. I tried using a different analyzer for the query, but it only analyzed my query and didn't affect the search results. To me it looks like data stored in a text field is saved as lowercase, but then I don't understand why my data looks unchanged in the search results. Could you explain how this works?
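For example, I tried the _analyze API to see what happens to a line at index time (assuming the default standard analyzer):

```json
POST _analyze
{
  "analyzer": "standard",
  "text": "timestamp ERROR example message"
}
```

The tokens come back lowercased, even though the search results still show the original text.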

Additionally, I'd like to ask: is it possible to send logs from one log file to a different index, and the rest of the log files to the filebeat index, with a single running Filebeat agent? If yes, how? I want to play with my data, structure, etc., but I don't want to make a mess in the filebeat index.
Even if I make a mess in the default index, I can override all index settings from the agent, can't I?
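I imagine something like this in the output section (host, paths, and index name are just placeholders, and the template/ILM lines are only my understanding of what a custom index name requires in 7.x), but I don't know if that's the right approach:

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  # everything else keeps going to the default filebeat index
  indices:
    - index: "my-playground-%{+yyyy.MM.dd}"
      when.contains:
        log.file.path: "myapp"

# a custom index name seems to also require overriding these:
setup.template.name: "filebeat"
setup.template.pattern: "filebeat-*"
setup.ilm.enabled: false
```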

I'm using Elasticsearch and Filebeat version 7.6.

Thanks,
Marcin

Hi @_mslo, I think option 3 is probably the best for your use case. I would suggest using Filebeat primarily to harvest the logs and ship them to Elasticsearch, and I would not do any processing of the log lines in Filebeat itself. You might want to set up multiline options in Filebeat, though, since you are expecting exception stack traces sometimes.
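For example, something like this in your input configuration (the pattern assumes each new log line starts with a timestamp like 2020-03-01 12:00:00 — adjust it to your actual format):

```yaml
# lines that do NOT start with a timestamp are appended to the
# previous event, so a stack trace stays in the same message
multiline.pattern: '^\d{4}-\d{2}-\d{2}'
multiline.negate: true
multiline.match: after
```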

Then I would set up an ingest node pipeline for all your processing. There you could probably use the grok processor to parse each line into structured fields.

Finally, back in your Filebeat input configuration, you can use the pipeline option to specify the name of your ingest node pipeline.
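Concretely, something like this (the pipeline name and path are just examples):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log       # your existing log path
    pipeline: my-logs-pipeline     # ingest pipeline created in Elasticsearch
```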

Hope that makes sense,

Shaunak

Hi @shaunak,
I followed your suggestion and set up multiline in Filebeat. An ingest node pipeline is responsible for the rest of the processing.
The grok processor definition in my pipeline:

{
  "grok": {
    "field": "message",
    "patterns": [
      "%{CUSTOM_TIMESTAMP:ts} %{LOGLEVEL:log_lvl} %{GREEDYDATA}"
    ],
    "pattern_definitions": {
      "CUSTOM_TIMESTAMP": "%{YEAR}-%{MONTHNUM2}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}"
    }
  }
}

I used the default patterns to build a custom pattern; here I found the built-in pattern definitions. All I still need is to change the type of the ts field, because right now it's keyword and I don't know if I can use a keyword field to find, for instance, messages older or newer than a given date. But now I have a basis for further experimenting with the pipeline.
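I suppose a date processor after the grok would do it, if I understand the docs correctly (the format string is meant to match my CUSTOM_TIMESTAMP pattern):

```json
"date": {
  "field": "ts",
  "formats": ["yyyy-MM-dd HH:mm:ss"],
  "target_field": "@timestamp"
}
```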

Why is log processing in Filebeat not recommended? Can it significantly increase Filebeat's resource consumption, or cause some unexpected errors?

I am still interested in how it works that, when I search data in a text field, it behaves as if the text entered into Elasticsearch were lowercase, yet the results show the original text. It would be great to know, but overall I have what was most important to me.

Thank you so much,
Marcin