Data collected incorrectly from Logstash into Elasticsearch

Hello,

I have set up two processes to collect data and present it to ELK.
One process collects log data from an IIS server, and the other imports data from a daily CSV file into a new index in Elasticsearch.

The files that I load in logstash are the following:

input {
  beats {
    port => 5044
  }
}

filter {

        # ignore log comments
        if [message] =~ "^#" {
                drop {}
        }

        # check that fields match your IIS log settings
        grok {

                match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:s-sitename} %{NOTSPACE:s-computername} %{IPORHOST:s-ip} %{WORD:cs-method} %{URIPATH:cs-uri-stem} (?:-|\"%{URIPATH:cs-uri-query}\") %{NUMBER:s-port} %{NOTSPACE:cs-username} %{IPORHOST:c-ip} %{NOTSPACE:cs-version} %{NOTSPACE:cs-useragent} %{NOTSPACE:cs-cookie} %{NOTSPACE:cs-referer} %{NOTSPACE:cs-host} %{NUMBER:sc-status} %{NUMBER:sc-substatus} %{NUMBER:sc-win32-status} %{NUMBER:sc-bytes} %{NUMBER:cs-bytes} %{NUMBER:time-taken}" }
        }

        # set the event timestamp from the log
        # https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
        date {
                match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
                timezone => "Etc/UTC"
        }

        # matches the big, long nasty useragent string to the actual browser name, version, etc
        # https://www.elastic.co/guide/en/logstash/current/plugins-filters-useragent.html
        useragent {
                source => "cs-useragent"
                prefix => "browser_"
        }

        mutate {
                remove_field => [ "log_timestamp"]
        }
}

output {
   elasticsearch {
      hosts => ["elasticsearch_server:9200"]
      index => "iis-log-%{+YYYY.MM.dd}"
   }
}

input {
    file {
        path => "/etc/logstash/conf.d/nextcloud-report-usage/*.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter {
    csv {
        separator => ","
        columns => [
            "user-id",
            "date",
            "last_login_date",
            "assigned_quota",
            "used_quota",
            "number_files",
            "number_shares",
            "number_uploads",
            "number_downloads"
        ]
        remove_field => ["message"]
    }
    date {
        match => [ "date", "yyyy-MM-dd HH:mm:ss"]
        target => ["@timestamp"]
    }
    date {
        match => [ "last_login_date", "yyyy-MM-dd HH:mm:ss"]
    }
    mutate {convert => ["assigned_quota", "integer"]}
    mutate {convert => ["used_quota", "integer"]}
    mutate {convert => ["number_files", "integer"]}
    mutate {convert => ["number_shares", "integer"]}
    mutate {convert => ["number_uploads", "integer"]}
    mutate {convert => ["number_downloads", "integer"]}
}

output {
   elasticsearch {
        hosts => ["elasticsearch_server:9200"]
        index => "nextcloud-metrics-%{+YYYY.MM.dd}"
   }
}

How can I have these two processes run simultaneously? That is, the IIS log collection and a daily load of data from a CSV file.

Separately, I launched Logstash to test the data load from the CSV file and check that everything works and that the corresponding index is generated correctly, but in the log I see the following error:

0390e14f3b27abd3423a1] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"nextcloud-metrics-2022.05.04", :routing=>nil}, {"tags"=>["_dateparsefailure"], "last_login_date"=>"2022-05-02T13:01:07+00:00", "number_uploads"=>0, "used_quota"=>22869957, "log"=>{"file"=>{"path"=>"/etc/logstash/conf.d/nextcloud-report-usage/usage_report_04052022_2.csv"}}, "date"=>"2022-05-02T13:17:40+00:00", "number_files"=>36, "assigned_quota"=>26843545600, "@version"=>"1", "number_downloads"=>6, "event"=>{"original"=>"apiuser,2022-05-02T13:17:40+00:00,2022-05-02T13:01:07+00:00,26843545600,22869957,36,0,0,6"}, "@timestamp"=>2022-05-04T07:35:18.689267Z, "user-id"=>"apiuser", "number_shares"=>0, "host"=>{"name"=>"tfvs0756.medusa.gobiernodecanarias.net"}}], :response=>{"index"=>{"_index"=>"nextcloud-metrics-2022.05.04", "_id"=>"VQP-jYABi-PU5vurrNZ5", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"mapper [last_login_date] cannot be changed from type [text] to [date]"}}}}

In this process I also see that it does not load all the data from the CSV file. It should load 149 records, which is how many the file contains, but it only loads 143, and I cannot see where the problem could be.

I hope you can help me as I am starting to use this tool.

Best regards.

The important thing to realise here is that if you have these as two different config files in Logstash, when you start Logstash it'll actually merge them both into one. That means that every event that comes in via either of your inputs will be run through each filter and output section.
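If you did want to keep everything in one merged pipeline, the usual workaround is to tag each input and wrap the filter and output sections in conditionals. A rough sketch (untested, using your paths and index names):

```
input {
  beats {
    port => 5044
    tags => ["iis"]
  }
  file {
    path => "/etc/logstash/conf.d/nextcloud-report-usage/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    tags => ["nextcloud"]
  }
}

filter {
  if "iis" in [tags] {
    # grok / date / useragent filters for the IIS logs go here
  } else if "nextcloud" in [tags] {
    # csv / date / mutate filters for the CSV go here
  }
}

output {
  if "iis" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch_server:9200"]
      index => "iis-log-%{+YYYY.MM.dd}"
    }
  } else if "nextcloud" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch_server:9200"]
      index => "nextcloud-metrics-%{+YYYY.MM.dd}"
    }
  }
}
```

That said, separate pipelines are the cleaner option here.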

Take a look at the Multiple Pipelines page in the Logstash Reference for keeping things separated.

Thank you very much for the info.

I have configured the "pipelines.yml" file with the following configuration:

- pipeline.id: iis-log
  path.config: "/etc/logstash/conf.d/iis-log.conf"
- pipeline.id: nextcloud-usage
  path.config: "/etc/logstash/conf.d/nextcloud-usage.conf"

I restarted Logstash and saw no errors in its log, so the process executed correctly. After waiting a few minutes, I verified that the corresponding index was created, and indeed the index for the nextcloud-usage pipeline, which takes a CSV file as input, has been generated.

After a while, the Logstash log shows me some warnings like the following:

[2022-05-05T09:09:39,531][WARN ][logstash.outputs.elasticsearch][nextcloud-usage][beaaf97a5d411a0b3614355a7af34bf31a781e101910c6a563ac776d1ba07303] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"nextcloud-metrics-2022.05.05", :routing=>nil}, {"number_downloads"=>nil, "number_uploads"=>nil, "@version"=>"1", "@timestamp"=>2022-05-05T09:09:38.677781Z, "number_shares"=>nil, "used_quota"=>nil, "host"=>{"name"=>"tfvs0756.medusa.gobiernodecanarias.net"}, "event"=>{"original"=>"D6929E38-9646-4905-A110-1AFF97A35978,2022-05-02T13:17:40+00:00,2021-0702T08:54:55+00:00,26843545600,,,,,"}, "user-id"=>"D6929E38-9646-4905-A110-1AFF97A35978", "assigned_quota"=>26843545600, "date"=>"2022-05-02T13:17:40+00:00", "tags"=>["_dateparsefailure"], "last_login_date"=>"2021-0702T08:54:55+00:00", "log"=>{"file"=>{"path"=>"/etc/logstash/conf.d/nextcloud-report-usage/usage_report_05052022.csv"}}, "number_files"=>nil}], :response=>{"index"=>{"_index"=>"nextcloud-metrics-2022.05.05", "_id"=>"WA57k4ABi-PU5vurZTOO", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [last_login_date] of type [date] in document with id 'WA57k4ABi-PU5vurZTOO'. Preview of field's value: '2021-0702T08:54:55+00:00'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2021-0702T08:54:55+00:00] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}

I have also verified that the CSV file it takes as input has a total of 149 records, but in the index I only see 111. It is not loading all the data from the CSV into Elasticsearch.

I carried out a manual load test of the same file using Kibana's Data Visualizer: I upload the file, follow the procedure indicated on screen, and have no problem importing all the content.

Below you can see the difference between the index created manually with the data uploaded via the Data Visualizer and the index created by the pipeline configured in Logstash:

yellow open nextcloud-usage-t1 Oc3cVTn2S4euHUDZNMeKhg 1 1 149 0 30.6kb 30.6kb
yellow open nextcloud-metrics-2022.05.05 AecqNQiATwaXCKqsfXHb4g 1 1 111 0 133.4kb 133.4kb

How can I see why the rest of the data is not loaded?

Kind regards and thanks again.

That's why.

Thanks for the answers.

Logically I saw that error, but what I don't understand is how some records were added to the corresponding index while others fail with this error, when all the records have the same "last_login_date" field.

They may all have the field, but the format is not a valid timestamp.

OK, I understand, but I don't know how to solve this problem. I have to load a CSV file with data into ELK daily.

You need to fix the timestamp.

Thanks for the reply.

Could you tell me how I can fix the timestamp? I don't know how to fix it.

Thanks!

You probably want to look at the source data and see why it's sending wrong info.

This is an example of the data that I try to load from the CSV into Logstash to view in Kibana:

user-id,date,last_login_date,assigned_quota,used_quota,number_files,number_shares,number_uploads,number_downloads
admin,2022-05-02T13:17:40+00:00,2022-05-02T13:14:51+00:00,26843545600,23203889,142,2,0,43
alexzolotov,2022-05-02T13:17:40+00:00,2022-0401T16:44:15+00:00,26843545600,22941339,44,0,40,340
apiuser,2022-05-02T13:17:40+00:00,2022-05-02T13:01:07+00:00,26843545600,22869957,36,0,0,6
guille,2022-05-02T13:17:40+00:00,2022-03-17T10:56:49+00:00,26843545600,22869956,35,0,0,0
u_38999997_pro4,2022-05-02T13:17:40+00:00,2022-0409T07:09:57+00:00,26843545600,22941339,35,0,33,4
ext-jacozam,2022-05-02T13:17:40+00:00,2022-05-02T10:48:34+00:00,26843545600,386254691,68,19,7,194
ext-ffebarm,2022-05-02T13:17:40+00:00,2022-05-02T07:44:09+00:00,26843545600,384652,185,262,10,211
ext-ddahper,2022-05-02T13:17:40+00:00,2022-0409T08:30:16+00:00,26843545600,17040873,35,0,0,0
ext-hpercej,2022-05-02T13:17:40+00:00,2022-05-02T10:58:26+00:00,26843545600,18668177,37,2,0,6

In the Logstash log, when it reads the CSV file, the warning I see is on a row such as "admin", as shown here:

[2022-05-10T11:14:26,021][WARN ][logstash.outputs.elasticsearch][nextcloud-usage][93ab60cee89ed4e28db11de1102c5e01a66e6547e64eb0bab77110933bf027c3] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"nextcloud-metrics-2022.05.10", :routing=>nil}, {"last_login_date"=>"2022-05-02T10:48:34+00:00", "log"=>{"file"=>{"path"=>"/etc/logstash/conf.d/nextcloud-report-usage/usage_report_10Mayo2022.csv"}}, "date"=>"2022-05-02T13:17:40+00:00", "number_files"=>68, "number_shares"=>19, "@timestamp"=>2022-05-10T11:14:25.307689Z, "assigned_quota"=>26843545600, "host"=>{"name"=>"tfvs0756.medusa.gobiernodecanarias.net"}, "tags"=>["_dateparsefailure"], "number_downloads"=>194, "number_uploads"=>7, "user-id"=>"admin", "used_quota"=>386254691, "event"=>{"original"=>"admin,2022-05-02T13:17:40+00:00,2022-05-02T10:48:34+00:00,26843545600,386254691,68,19,7,194"}, "@version"=>"1"}], :response=>{"index"=>{"_index"=>"nextcloud-metrics-2022.05.10", "_id"=>"ZjOtrYABi-PU5vurbePW", "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"mapper [date] cannot be changed from type [text] to [date]"}}}}

I don't know why some rows load without error while others produce warnings, so not all the data ends up in the index.

Thanks!

I have not tested this in Elasticsearch, so this is speculation. If the first document that is indexed has the [date] field set to "date" (i.e. the CSV header row arrives first), then unless you have a template defined, the [date] field type will be set to "text". If a later document arrives with a [date] field that looks like a timestamp, then Elasticsearch might (this is the speculation bit) use date detection to try to make the [date] field a "date", resulting in that mapping exception.

Since Logstash does not (by default) maintain the order of events, I would expect this to be intermittent. The solution would be to drop the header or create a template to set the field type.
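As an untested sketch of the filter-side fixes: dropping the header row keeps the literal strings "date" and "last_login_date" from ever reaching Elasticsearch, and note that the _dateparsefailure tags are expected with your current config, since the file carries ISO8601 values while the date filters match "yyyy-MM-dd HH:mm:ss". The header check and the gsub regex below are assumptions based on the sample data you posted:

```
filter {
  # drop the CSV header row so the literal column names never get indexed
  # and lock the [date] mapping to "text" (assumes the header always
  # starts with "user-id,date,")
  if [message] =~ /^user-id,date,/ {
    drop {}
  }

  # ... your csv filter here ...

  # the file carries ISO8601 timestamps (e.g. 2022-05-02T13:17:40+00:00),
  # so match on ISO8601 rather than "yyyy-MM-dd HH:mm:ss"
  date {
    match => [ "date", "ISO8601" ]
  }

  # some rows carry values like "2021-0702T08:54:55+00:00" with a missing
  # hyphen; this gsub (assuming that is the only defect) repairs them
  # before the date filter for last_login_date runs
  mutate {
    gsub => [ "last_login_date", "^(\d{4})-(\d{2})(\d{2})T", "\1-\2-\3T" ]
  }
}
```

Alternatively, an index template that maps [date] and [last_login_date] as "date" (with the formats actually present in the file) would pin the mapping regardless of the order in which events arrive.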