There is no `message` field. I see that nginx's module creates a pipeline which actually renames `message` to `event.original`:
pipeline.yml:
```yaml
- rename:
    field: message
    target_field: event.original
```
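To illustrate what that rename step does to a single event, here is a rough local imitation with jq. It is only a sketch applied to a made-up sample event (the helper name `rename_message` is hypothetical), not how Elasticsearch executes the processor. One difference is called out in the comments: the real rename processor fails on events without `message` unless `ignore_missing` is set, while this version passes them through.

```shell
#!/usr/bin/env bash
# Rough local imitation of the ingest pipeline step
#   - rename: { field: message, target_field: event.original }
# Needs jq. Illustration only; Elasticsearch does not run it this way.
set -euo pipefail

rename_message() {
  # Move .message into .event.original (creating .event if needed).
  # Events without .message pass through unchanged here; the real processor
  # would fail on them unless ignore_missing is set.
  jq -c 'if has("message")
         then .event.original = .message | del(.message)
         else . end'
}

# Hypothetical nginx access-log event, before the pipeline runs.
echo '{"message":"1.2.3.4 - - [08/Jan/2022:10:00:00 +0000] \"GET / HTTP/1.1\" 200 612","event":{"module":"nginx"}}' \
  | rename_message
```

The output is the same event with the log line moved under `event.original` and no top-level `message`, which matches why a search for `message` finds nothing.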
More broadly, what I'm trying to do is build a web analytics dashboard using a non-Elastic application.
I'm already using Filebeat to send all web access logs to Elastic.
I built a basic web analytics dashboard, but ES has major limitations for web analytics:
- I can't get session counts, session lengths, returning users, etc.
- There is no good set of filters to exclude all bots from the logs. I can build a basic one, but it's a moving target.
So I decided to create a daily cron job that pulls all web access data from ES and feeds it into Matomo. I only pull the `event.original` field.
It generally works, but the shell script I wrote to pull the data sometimes fails, because some of the events don't have `event.original`. So I tried to write the query to make sure that the events I pull have `event.original`, but it still fails.
Here is the daily cron script:
```shell
NAME="meshumad.com-$DATE"
curl -XGET "https://***.westus2.azure.elastic-cloud.com:9243/filebeat-*/_search?size=10000" -u elastic:*** -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": "'$DATE'T00:00:00",
              "lte": "'$DATE'T23:59:59"
            }
          }
        },
        {
          "match_phrase": {
            "container.name": "meshumad.com"
          }
        },
        {
          "match_phrase": {
            "http.request.method": "GET"
          }
        },
        {
          "exists": {
            "field": "source.ip"
          }
        },
        {
          "exists": {
            "field": "event.original"
          }
        }
      ],
      "must_not": [
        {
          "match_phrase": {
            "url.original": "/csrftoken"
          }
        }
      ]
    }
  },
  "fields": [
    "event.original"
  ],
  "sort": [
    {
      "@timestamp": "asc"
    }
  ],
  "_source": false
}' > $NAME.raw
cat $NAME.raw | jq -r '.hits.hits[] | if has("fields") then .fields."event.original"[] else .ignored_field_values."event.original"[] end' > $NAME.log
sudo docker exec matomo-app python3 /var/www/html/misc/log-analytics/import_logs.py --url=http://localhost --login=slavik --password=*** --add-sites-new-hosts --recorders=1 /import/*.log
```
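The script uses `$DATE` without showing where it is set. Assuming the cron job imports the previous, complete day and runs on a system with GNU coreutils, a minimal way to set it is sketched below. The "yesterday" choice is an assumption, and `date -d` is GNU-specific (BSD/macOS `date` uses `-v-1d` instead).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the job imports the previous (complete) day.
# `date -d yesterday` is GNU coreutils syntax; on BSD/macOS use `date -v-1d +%F`.
DATE=$(date -d yesterday +%F)   # ISO date, YYYY-MM-DD
NAME="meshumad.com-$DATE"
echo "Importing logs for $DATE into $NAME.log"
```

Note that the range query compares against `'$DATE'T00:00:00` with no time zone, so Elasticsearch treats those bounds as UTC.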
The error I was getting:
```
jq: error (at :0): Cannot iterate over null (null)
```
When I debugged it, I found that `jq` fails because some entries have neither `event.original` nor `ignored_field_values."event.original"`.
For example, I found this entry in the result:
```json
{"_index":"filebeat-7.9.0-2022.01.08-000017","_type":"_doc","_id":"lYzMsn4BnBx-UCJhOtim","_score":null,"sort":[1643677228000]}
```
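While the root cause is investigated, one way to keep the cron job from aborting is to make the jq step tolerate hits like the one above. The sketch below does not explain why Elasticsearch returns such hits; it only works around them. It takes `fields."event.original"` if present, falls back to `ignored_field_values."event.original"`, and skips hits that have neither. A second filter lists the `_id` of those skipped hits for debugging. The helper names `extract_original` and `list_missing` and the sample response are hypothetical.

```shell
#!/usr/bin/env bash
# Defensive variant of the jq step from the cron script (a sketch).
# Uses jq's alternative operator `//` and `empty`.
set -euo pipefail

# Print every event.original value; hits with neither
# fields."event.original" nor ignored_field_values."event.original" are skipped.
extract_original() {
  jq -r '.hits.hits[]
         | (.fields."event.original" // .ignored_field_values."event.original" // empty)[]'
}

# Print the _id of every hit that has neither field.
list_missing() {
  jq -r '.hits.hits[]
         | select((.fields."event.original" // .ignored_field_values."event.original") == null)
         | ._id'
}

# Hypothetical search response: one normal hit and one hit shaped like the
# problematic entry above (no "fields", no "ignored_field_values").
sample='{"hits":{"hits":[
  {"_id":"example-1","fields":{"event.original":["1.2.3.4 - - \"GET / HTTP/1.1\" 200"]}},
  {"_id":"lYzMsn4BnBx-UCJhOtim","sort":[1643677228000]}
]}}'
echo "$sample" | extract_original   # prints only the first hit's log line
echo "$sample" | list_missing >&2   # prints lYzMsn4BnBx-UCJhOtim on stderr
```

In the cron script, `extract_original < $NAME.raw > $NAME.log` would take the place of the original `cat ... | jq` line. Unlike the original filter, this also falls back to `ignored_field_values` when `fields` exists but lacks `event.original`.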