Logstash with elasticsearch input and output keeps looping results

(Layla) #1

I would like to reindex my logs and filter them again. From what I found on the Internet, Logstash can be used to filter the data again. I tried it, and it does split my data into different fields. However, the data keeps looping: I have 100,000 log entries, but after filtering and outputting to Elasticsearch, I found more than 100,000 entries in the output index, and they are duplicated. Does anyone have an idea what causes this?

Moreover, I get the log below when running Logstash. Although it reports an error parsing JSON, I found that the logs are still filtered. Why is that? Thank you!

Here is my logstash config:

input {
  elasticsearch {
    hosts => ""
    index => "logstash-2018.01.04"
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:logdate} %{GREEDYDATA:vmname}  %{GREEDYDATA:message}" }
    overwrite => [ "message" ]
  }
  json {
    source => "scrmsg"
  }
}
output {
  elasticsearch {
    hosts => [""]
    manage_template => false
    index => "logstash-2018.01.04-1"
  }
}

Here is the error log:

[2018-01-11T15:15:32,010][WARN ][logstash.filters.json    ] Error parsing json {:source=>"scrmsg", :raw=>"Trident/5.0)\",\"geoip_country\":\"US\",\"allowed\":\"1\",\"threat_score\":\"268435456\",\"legacy_unique_id\":\"\",\"cache_status\":\"-\",\"informed_id\":\"\",\"primitive_id\":\"2BC2D8AD-7AD0-3CAD-9453-B0335F409701\",\"valid_ajax\":\"0\",\"orgin_response_time\":\"0.081\",\"request_id\":\"cd2ae0a8-0921-48b6-b03f-15c71a55100b\",\"bytes_returned_origin\":\"83\",\"server_ip\":\"\",\"origin_status_code\":\"\",\"calculated_pages_per_min\":\"1\",\"calculated_pages_per_session\":\"1\",\"calculated_session_length\":\"0\",\"k_s\":\"\",\"origin_address\":\"\",\"request_protocol\":\"https\",\"server_serial\":\"5c3eb4ad-3799-4bd8-abb2-42edecd54b99\",\"nginx_worker_process\":\"19474\",\"origin_content_type\":\"application/json;charset=UTF-8\",\"lb_request_time\":\"\",\"SID\":\"\",\"geoip_org\":\"Drake Holdings LLC\",\"accept\":\"*/*\",\"accept_encoding\":\"gzip, deflate\",\"accept_language\":\"\",\"connection\":\"Keep-Alive\",\"http_request_length\":\"418\",\"real_ip_header_value\":\"\",\"http_host\":\"www.honeyworkshop.com\",\"machine_learning_score\":\"\",\"HSIG\":\"ALE_UHCF\",\"ZID\":\"\",\"ZUID\":\"\",\"datacenter_id\":\"363\",\"new_platform_domain_id\":\"3063fc0b-5b48-4413-9bc7-600039caf64c\",\"whitelist_score\":\"0\",\"billable\":\"1\",\"distil_action\":\"@proxy\",\"js_additional_threats\":\"\",\"js_kv_additional_threats\":\"\",\"re_field_1\":\"\",\"re_field_2\":\"\",\"re_field_3\":\"\",\"http_accept_charset\":\"\",\"sdk_token_id\":\"\",\"sdk_application_instance_id\":\"\",\"per_path_calculated_pages_per_minute\":\"1\",\"per_path_calculated_pages_per_session\":\"1\",\"path_security_type\":\"api\",\"identification_provider\":\"web\",\"identifier_record_pointer\":\"\",\"identifier_record_value\":\"\",\"path_rule_scope_id\":\"\",\"experiment_id\":\"0\",\"experiment_score\":\"\",\"experiment_group_id\":\"\",\"experiment_auxiliary_string\":\"\",\"type\":\"distil\"}\n", 
:exception=>#<LogStash::Json::ParserError: Unrecognized token 'Trident': was expecting ('true', 'false' or 'null') at [Source: (byte[])"Trident/5.0)","geoip_country":"US","allowed":"1","threat_score":"268435456","legacy_unique_id":"","cache_status":"-","informed_id":"","primitive_id":"2BC2D8AD-7AD0-3CAD-9453-B0335F409701","valid_ajax":"0","orgin_response_time":"0.081","request_id":"cd2ae0a8-0921-48b6-b03f-15c71a55100b","bytes_returned_origin":"83","server_ip":"","origin_status_code":"","calculated_pages_per_min":"1","calculated_pages_per_session":"1","calculated_session_length":"0","k_s":"","origin_address":"10.0.10"[truncated 1180 bytes]; line: 1, column: 9]>}

(David Pilato) #2

Please format your code using </> icon as explained in this guide. It will make your post more readable.

Or use markdown style, with a line of three backticks before and after the code.

I moved your question to #logstash

(Layla) #3

Thanks for the reminder!!

(Guy Boertje) #4

It looks like the value of scrmsg is not valid JSON.
It starts without a leading { brace.
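
For what it's worth, a parse failure in the json filter does not drop the event: the filter logs the warning and tags the event (by default with `_jsonparsefailure`), which is likely why the documents still reach the output. A minimal sketch, assuming you only want to parse values that look like a JSON object, is to guard the filter with a conditional. The regular expression and the tag name below are my own illustration, not something from the original config:

```
filter {
  # Only attempt JSON parsing when scrmsg starts with an opening brace.
  if [scrmsg] =~ /^\s*\{/ {
    json {
      source => "scrmsg"
    }
  } else {
    # Hypothetical tag name, handy for finding the truncated events later.
    mutate { add_tag => [ "scrmsg_not_json" ] }
  }
}
```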

(Layla) #5

Thank you for the reminder. I found that some logs fail JSON parsing while others do not, and I have corrected the Logstash indexer. But could that be what makes the elasticsearch input loop over the index?

(Guy Boertje) #6

By looping, I think you mean that the ES input re-reads the docs that the ES output adds to ES. Yes?

If so, I thought that as the input and output are using different indexes it should not loop.

@Christian_Dahlqvist - please comment on this ^.

(Layla) #7

Yes, it re-reads the docs. I already use a different index name, and the host is different too.

(Layla) #8

Can anyone help? :tired_face:

(Ovidiu Balaban) #9

I have the same issue with Logstash 6.1.1. Regardless of whether I use the same host or different hosts, Logstash loops until stopped.

I didn't use Kibana, I used curl directly on the Elasticsearch indices.

(Guy Boertje) #10


Please try with very different index names (no possible overlap) e.g. ES input logstash-2018.01.04 and ES output logstash-1-2018.01.04. Does it still loop?
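
Separately from the index names, one way to stop re-read documents from piling up as duplicates is to carry the original document ID through to the output, so that a second pass overwrites instead of adding. A sketch, assuming the input's `docinfo` metadata ends up under `[@metadata]` (the plugin default as far as I know for the 6.x line; newer versions may place it elsewhere), with `hosts` left empty as in the first post:

```
input {
  elasticsearch {
    hosts => ""
    index => "logstash-2018.01.04"
    # Expose the source document's _index, _type and _id as event metadata.
    docinfo => true
  }
}
output {
  elasticsearch {
    hosts => [""]
    manage_template => false
    index => "logstash-2018.01.04-1"
    # Reuse the source _id so a re-read document replaces its earlier copy.
    document_id => "%{[@metadata][_id]}"
  }
}
```

This does not explain why the input loops, but it should keep re-read documents from being indexed as new copies.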

(Ovidiu Balaban) #11

@guyboertje my indexes are as different as notag_anxl00-18 and anxl-0018 and I still have the looping issue.

(Guy Boertje) #12

I asked one of our Consultants and they said...

I used that approach a while ago and did not have issues. One thing that comes to mind is to increase the scroll time to a higher value. Perhaps logstash can't process the events fast enough and starts over?

scroll defaults to "1m".
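
In case it helps, `scroll` is an option on the elasticsearch input itself. A sketch based on the input from the first post, where the `5m` value is only an illustrative choice and `hosts` is left empty as in the original:

```
input {
  elasticsearch {
    hosts => ""
    index => "logstash-2018.01.04"
    # Keep the scroll context alive for 5 minutes between batches (default is "1m").
    scroll => "5m"
  }
}
```

The value is the keepalive between one scroll request and the next, so it only needs to cover the time taken to process one batch, not the whole index.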

(Ovidiu Balaban) #13

Thank you, @guyboertje.

I have increased the scroll value to 5m for an ongoing operation. I'll let people know if this changes anything.

(Layla) #14

May I know how to set the scroll value? Thanks @guyboertje!

(Layla) #15

How was your result after tuning the scroll value? :slightly_smiling_face:

(system) #16

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.