I would like to reindex and filter my log again. What I get the information from Internet is using the logstash to filter the data again. I tried and it can really split my data into different fields, however, the data keeps looping. That is, I have 100,000 log but after filter and output to elasticsearch, I found that more than 100,000 log output into elasticsearch and the log are deplicated. Does anyone have idea on that?
Moreover, I receive below log when running logstash, although it said that error phasing JSON, I found that the log can still be filtered. Why would be like that? Thank you!
Here is my logstash config:
input {
elasticsearch {
hosts => "10.0.132.56"
index => "logstash-2018.01.04"
}
}
filter{
grok {
match => {"message" => "%{TIMESTAMP_ISO8601:logdate} %{GREEDYDATA:vmname} %{GREEDYDATA:message}"}
overwrite => [ "message" ]
}
}
filter {
json {
source => "scrmsg"
}
}
output {
elasticsearch {
hosts => ["10.0.132.64:9200"]
manage_template => false
index => "logstash-2018.01.04-1"
}
}
Here is the error log:
[2018-01-11T15:15:32,010][WARN ][logstash.filters.json ] Error parsing json {:source=>"scrmsg", :raw=>"Trident/5.0)\",\"geoip_country\":\"US\",\"allowed\":\"1\",\"threat_score\":\"268435456\",\"legacy_unique_id\":\"\",\"cache_status\":\"-\",\"informed_id\":\"\",\"primitive_id\":\"2BC2D8AD-7AD0-3CAD-9453-B0335F409701\",\"valid_ajax\":\"0\",\"orgin_response_time\":\"0.081\",\"request_id\":\"cd2ae0a8-0921-48b6-b03f-15c71a55100b\",\"bytes_returned_origin\":\"83\",\"server_ip\":\"10.0.10.16\",\"origin_status_code\":\"\",\"calculated_pages_per_min\":\"1\",\"calculated_pages_per_session\":\"1\",\"calculated_session_length\":\"0\",\"k_s\":\"\",\"origin_address\":\"10.0.10.16:443\",\"request_protocol\":\"https\",\"server_serial\":\"5c3eb4ad-3799-4bd8-abb2-42edecd54b99\",\"nginx_worker_process\":\"19474\",\"origin_content_type\":\"application/json;charset=UTF-8\",\"lb_request_time\":\"\",\"SID\":\"\",\"geoip_org\":\"Drake Holdings LLC\",\"accept\":\"*/*\",\"accept_encoding\":\"gzip, deflate\",\"accept_language\":\"\",\"connection\":\"Keep-Alive\",\"http_request_length\":\"418\",\"real_ip_header_value\":\"204.79.180.18\",\"http_host\":\"www.honeyworkshop.com\",\"machine_learning_score\":\"\",\"HSIG\":\"ALE_UHCF\",\"ZID\":\"\",\"ZUID\":\"\",\"datacenter_id\":\"363\",\"new_platform_domain_id\":\"3063fc0b-5b48-4413-9bc7-600039caf64c\",\"whitelist_score\":\"0\",\"billable\":\"1\",\"distil_action\":\"@proxy\",\"js_additional_threats\":\"\",\"js_kv_additional_threats\":\"\",\"re_field_1\":\"\",\"re_field_2\":\"\",\"re_field_3\":\"\",\"http_accept_charset\":\"\",\"sdk_token_id\":\"\",\"sdk_application_instance_id\":\"\",\"per_path_calculated_pages_per_minute\":\"1\",\"per_path_calculated_pages_per_session\":\"1\",\"path_security_type\":\"api\",\"identification_provider\":\"web\",\"identifier_record_pointer\":\"\",\"identifier_record_value\":\"\",\"path_rule_scope_id\":\"\",\"experiment_id\":\"0\",\"experiment_score\":\"\",\"experiment_group_id\":\"\",\"experiment_auxiliary_string\":\"\",\"type\":\"distil\"}\n", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'Trident': was expecting ('true', 'false' or 'null') at [Source: (byte[])"Trident/5.0)","geoip_country":"US","allowed":"1","threat_score":"268435456","legacy_unique_id":"","cache_status":"-","informed_id":"","primitive_id":"2BC2D8AD-7AD0-3CAD-9453-B0335F409701","valid_ajax":"0","orgin_response_time":"0.081","request_id":"cd2ae0a8-0921-48b6-b03f-15c71a55100b","bytes_returned_origin":"83","server_ip":"10.0.10.16","origin_status_code":"","calculated_pages_per_min":"1","calculated_pages_per_session":"1","calculated_session_length":"0","k_s":"","origin_address":"10.0.10"[truncated 1180 bytes]; line: 1, column: 9]>}