I followed the Stream any log file guide and have set up a local agent that listens to my log file. But every time I add a new log line (manually, to test), it isn't parsed when it gets indexed in Kibana; the whole line just falls under the message field, like so:
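It looks roughly like this (retyped from the document JSON rather than the original screenshot; only the relevant field shown, the agent metadata fields are omitted):

{
  "message": """{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}"""
}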
But when I use the API and make a POST call to the same data stream, it parses correctly:
POST logs-generic-default/_doc
{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}
I already created the project, last_activity and log_time fields.
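By "created the fields" I mean mappings roughly like this, added to the logs@custom component template referenced further down (this is only a sketch; the date/keyword types are my best guess):

PUT _component_template/logs@custom
{
  "template": {
    "mappings": {
      "properties": {
        // assumed types: the two timestamps as date, the project name as keyword
        "log_time":      { "type": "date" },
        "project":       { "type": "keyword" },
        "last_activity": { "type": "date" }
      }
    }
  }
}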
It is the same with Filebeat (where it parses correctly for each field), but when I copy a log line and paste it into my own file that Kibana is also listening to, the entire line just falls under the message field.
Regarding the pipeline: I have created a pipeline (I think). I did it with the Console in Dev Tools like this:
PUT _ingest/pipeline/logs-generic-default
{
  "description": "Extracts the log time, project name and project last activity",
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": """{"log_time":"%{log_time}","project":"%{project}","last_activity":"%{last_activity}"}"""
      }
    }
  ]
}
POST _ingest/pipeline/logs-generic-default/_simulate
{
  "docs": [
    {
      "_source": {
        "message": """{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}"""
      }
    }
  ]
}
PUT _index_template/logs-generic-default-template
{
  "index_patterns": [ "logs-generic-*" ],
  "data_stream": { },
  "priority": 500,
  "template": {
    "settings": {
      "index.default_pipeline": "logs-generic-default"
    }
  },
  "composed_of": [
    "logs-mappings",
    "logs-settings",
    "logs@custom",
    "ecs@dynamic_templates"
  ],
  "ignore_missing_component_templates": ["logs@custom"]
}
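To double-check that this template would actually be picked for new backing indices (and therefore set the default pipeline), I believe the simulate index API can be used with the data stream name for the pattern match (this is just a sanity check, not something from the guide):

# The resolved settings in the response should include "index.default_pipeline": "logs-generic-default"
POST _index_template/_simulate_index/logs-generic-default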
The _simulate POST call works as expected, but my logs don't seem to go through the pipeline before getting indexed. Maybe it has something to do with my elastic-agent.yml, which looks like this:
outputs:
  default:
    type: elasticsearch
    hosts: <my-host>
    #api_key: 'your-api-key'
    username: <my-user>
    password: <my-pass>
    pipeline: logs-generic-default # is this right to apply the pipeline?

inputs:
  - id: logs-generic-default
    type: filestream
    streams:
      - id: logs-generic-default
        data_stream.dataset: logs-generic-default
        paths:
          - C:\Program Files\Elastic\Agent\data\elastic-agent-03ef9d\logs\myapp.log
But I'll try to use the JSON processor. I'm just not sure how to apply the ingest pipeline, i.e. where would I add this code? (Sorry for the image, this is from the JSON processor guide you linked):
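Something along these lines is what I mean, a sketch of a JSON-processor version of the pipeline (the json processor with add_to_root is my own reading of the guide, not necessarily its exact example):

PUT _ingest/pipeline/logs-generic-default
{
  "description": "Parses the JSON log line into top-level fields",
  "processors": [
    {
      "json": {
        "field": "message",
        "add_to_root": true
      }
    }
  ]
}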
Yeah, I got stuck on this step; I get the same results as you with my other pipeline using the dissect processor. So back to my initial problem: why is it that my logs don't get parsed? (This is from my myapp.log, which I have in my path in the elastic-agent.yml):
There are no project, last_activity or log_time fields.
Only:
"message": """{"log_time":"2023-11-28T09:50:33.026Z","project":"Public_Documentation_Mappings","last_activity":"2023-11-25T09:50:33.026Z"}"""
It's like it never goes through the pipeline before getting indexed.
I can show you the View details output from the stream section for each log line that it reads from myapp.log.
The Elastic data stream naming scheme is made for time series data and consists of splitting datasets into different data streams using the following naming convention.
type: Generic type describing the data
dataset: Describes the data ingested and its structure
namespace: User-configurable arbitrary grouping
These three parts are combined by a “-” and result in data streams like logs-nginx.access-production. In all three parts, the “-” character is not allowed. This means all data streams are named in the following way: {type}-{dataset}-{namespace}
so data_stream.dataset: logs-generic-default
is not allowed... you are making assumptions
My suggestion is to follow the instructions exactly, get it working, and then start changing names etc. Putting those “-”s in is definitely part of the problem.
And with that it looks like the first page and the second are NOT aligned UGH!!
So looking at the next page you should set this in your agent
data_stream.dataset: example
index_pattern – Needs to match your log data stream. Naming conventions for data streams are <type>-<dataset>-<namespace> . In this example, your logs data stream is named logs-example-* . Data that matches this pattern will go through your pipeline.
Which then will be aligned with the template and everything on this page
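Concretely, the input section of your elastic-agent.yml would end up looking something like this (just a sketch based on what you posted; only the dataset changes, your ids and paths stay the same):

inputs:
  - id: logs-generic-default
    type: filestream
    streams:
      - id: logs-generic-default
        data_stream.dataset: example
        paths:
          - C:\Program Files\Elastic\Agent\data\elastic-agent-03ef9d\logs\myapp.log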
Yeah, I'll change it back to generic. I changed it because I was just experimenting with everything to apply the pipeline... forgot to change it back.
But I can say with confidence that even with the default elastic-agent.yml (following the Stream any log file guide) it still didn't parse correctly. The only things I added were the credentials and the path.
As of right now I changed my .yml file to look like this:
Regarding this: index_pattern – Needs to match your log data stream. Naming conventions for data streams are <type>-<dataset>-<namespace> . In this example, your logs data stream is named logs-example-* . Data that matches this pattern will go through your pipeline.
I changed data_stream.dataset to example (the listener restarted) and then added a new log line to myapp.log. Although it still doesn't parse correctly, it looks like the dataset is still generic?
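(For reference, this is roughly how I'm checking which dataset the newest document landed in; just a sketch, the sort assumes @timestamp is present:)

GET logs-*/_search
{
  "size": 1,
  "sort": [ { "@timestamp": "desc" } ],
  "_source": [ "data_stream", "message" ]
}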
FYI: even with the logs that were coming in via elastic-agent-20231201-3.ndjson, i.e. the default Filebeat "Non-zero metrics in the last 30s" messages with no custom field names, when I copied those logs into my myapp.log they still didn't parse correctly. That is, the entire log line was in the message field.
Of course, because the ingest pipeline is not getting executed, because the template is not getting applied (which is what defines the pipeline, etc.), because the dataset is wrong... it is all related...
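One quick way to confirm that chain (just a sketch): check what index.default_pipeline the backing indices actually got; if your custom template never matched, it will not be your logs-generic-default pipeline:

GET logs-generic-default/_settings?filter_path=**.default_pipeline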
Let me work through this, I will get back... it is all close, just an issue or 2.
Please verify: are you using standalone, or do you have a Fleet Server?
If so there are easier ways to do this... Standalone is fine, I just want to know what you have.
Right but you are NOT doing Fleet Managed... (i.e. you did not install a Fleet Server)
Looks like that is correct... no Fleet Server... ok, no problem, give me 20 mins...
This should not be this hard... sorry... I will be doing it on Linux but it should translate.