I'm trying to work out how to add information about syslog priority to my log data in Elasticsearch.
I'm testing a self-managed ELK stack for collecting syslog data from Linux (Ubuntu) servers and workstations.
I'm starting from scratch. I want to keep it simple where I can, and ingest directly into Elasticsearch through Fleet-managed Elastic Agents, without Logstash, unless there's a good reason.
Information about syslog priority is important for us, especially severity, but I'm struggling to find out how to make this data available. I would presume this log level info is important for many users and would therefore be handled by the standard System integration. However, I can't find any info about syslog priority or log level in the resulting log data in Elasticsearch, or on the info page for the System integration that handles syslog.
Is this information just not available through the standard integrations? If not, what would be the best way to set this up?
I see that Logstash has a syslog_pri filter plugin for this, but is there a way that avoids the extra complexity of adding Logstash to the mix?
Do you mean that you need the <PRI> number, and want to extract the numeric facility and severity values from it?
If I'm not wrong, this is present when you send the logs with rsyslog to a remote destination. The Elastic Agent System integration only reads log files locally, where this information is not present.
So to do this with Elastic Agent you would need to use a Custom TCP or UDP input and parse the message with a processor such as grok or dissect to extract the priority number. You can then decode that number into the numeric facility and severity values.
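To make the parsing step concrete, here is a minimal Python sketch of pulling the leading <PRI> value off a raw syslog line. It only illustrates the logic; in practice this would be a grok or dissect pattern in an ingest pipeline. The names `PRI_PATTERN` and `split_pri` are made up for this example.

```python
import re

# A syslog line received over the network starts with "<PRI>", 1 to 3 digits.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)

def split_pri(raw_line):
    """Return (pri, rest_of_line), or (None, raw_line) if there is no valid PRI prefix."""
    match = PRI_PATTERN.match(raw_line)
    if match is None:
        return None, raw_line
    pri = int(match.group(1))
    # Largest valid value is facility 23, severity 7: 23 * 8 + 7 = 191.
    if pri > 191:
        return None, raw_line
    return pri, match.group(2)
```

Lines read from a local file such as /var/log/syslog would normally fall into the "no prefix" branch, which is why the local-file integration cannot provide this field.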
The following Python code is an example of how to decode the PRI number into severity and facility. You would need similar code in a script processor in an ingest pipeline.
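A minimal sketch of that decoding, assuming the standard syslog encoding where PRI = facility * 8 + severity (the function name `decode_pri` is only illustrative):

```python
def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity) numeric codes."""
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    facility = pri >> 3    # same as pri // 8
    severity = pri & 0x07  # same as pri % 8, the low three bits
    return facility, severity

facility, severity = decode_pri(34)
print(facility, severity)  # prints: 4 2
```

The same two operations (integer division by 8 and remainder) translate fairly directly into a Painless script, though the exact field names depend on how the message was parsed.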
With those values you would be able to add the information in your event using a set processor, an enrich processor or even some logic in the same script processor that would decode the PRI.
If you check the code used by the syslog_pri filter, it basically has one array with the facility labels and another with the severity labels, and it populates the fields by using the numeric facility and severity values as indices into those arrays.
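The array lookup can be sketched in Python roughly like this. The labels follow the facility and severity tables from the syslog RFCs; the exact strings used by the syslog_pri filter may differ, and the output key names are hypothetical.

```python
# Index = numeric facility code (0..23).
FACILITY_LABELS = [
    "kernel", "user-level", "mail", "daemon", "security/authorization",
    "syslogd", "line printer", "network news", "uucp", "clock",
    "security/authorization", "ftp", "ntp", "log audit", "log alert", "clock",
    "local0", "local1", "local2", "local3",
    "local4", "local5", "local6", "local7",
]

# Index = numeric severity code (0..7).
SEVERITY_LABELS = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

def label_pri(pri):
    """Return numeric codes and text labels for a PRI value (hypothetical key names)."""
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = pri >> 3, pri & 0x07
    return {
        "facility_code": facility,
        "facility_label": FACILITY_LABELS[facility],
        "severity_code": severity,
        "severity_label": SEVERITY_LABELS[severity],
    }
```

In an ingest pipeline the same tables could live inside the script processor as lists, or in an enrich index if you prefer the lookup to be data rather than code.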
If I can give a suggestion here: plan what you need to do with your data, and see whether it would be better to use Logstash instead of sending data directly to Elasticsearch.
Elastic Agent integrations and ingest pipelines really help to transform your data with only Elasticsearch, but they do not compare with Logstash and have a lot of limitations.
Things that are pretty simple in Logstash will need a lot of work without it, and if you need to enrich your data with information from external sources, you will basically need Logstash, depending on the volume of that data.
Yes, I was talking about the <PRI> field. Thanks for the suggestion. It seems this might be more straightforward using Logstash.
After reading up more on Logstash input configuration, I tried forwarding from an rsyslogd server, which aggregates syslog data from many servers, into the Logstash syslog input plugin, which provides PRI parsing out of the box. It seems to be working nicely.