Pipeline Grok Expressions

I have a log line:
172.17.60.26 - - [14/Feb/2019:01:50:47 -0000] "GET http://live-paytv.mobitv.com/mm/dash/live/19033/LIVESERVICE_2403/V5000_W/258351504.m4s http/1.1" 200 3696282 200 3696282 0 0 591 579 674 566 1.695 1.598 DIRECT FIN FIN TCP_MISS "MOBI_EXO2Player;Dalvik/2.1.0 (Linux; U; Android 5.1.1; AFTT Build/LVY48F)" 133ba155-2f5a-4c64-8426-465286558c46

I am using the following grok expression:

{
      "grok": {
        "field": "message",
        "patterns": ["""%{IP:source_ip} %{GREEDYDATA} \[%{HTTPDATE:request_date}\] \"%{WORD:http_method} %{URIPROTO:http_proto}://%{URIHOST:uri_host}%{URIPATH:uri_path}%{GREEDYDATA:uri_query} http/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:bytes_sent:int} %{NUMBER:origin_response_code} %{NUMBER:origin_bytes_sent} %{NUMBER:client_req_content_length} %{NUMBER:proxy_req_length} %{NUMBER:client_req_header_length} %{NUMBER:proxy_resp_header_length} %{NUMBER:proxy_req_header_length} %{NUMBER:origin_header_resp_length} %{NUMBER:time_to_serve:} %{NUMBER:origin_time_to_serve:} %{WORD:proxy_hierarchy_route} %{WORD:finish_status_client} %{WORD:finish_status_origin} %{WORD:cache_result_code} \"%{GREEDYDATA:user_agent}\" %{GREEDYDATA:x_play_back_session_id}""",
        """%{IP:source_ip} %{GREEDYDATA} \[%{HTTPDATE:request_date}\] \"%{WORD:http_method} %{URIPROTO:http_proto}://%{URIHOST:uri_host}%{URIPATH:uri_path}%{GREEDYDATA:uri_query} http/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:bytes_sent:int} %{NUMBER:origin_response_code} %{NUMBER:origin_bytes_sent:int} %{NUMBER:client_req_content_length} %{NUMBER:proxy_req_length} %{NUMBER:client_req_header_length} %{NUMBER:proxy_resp_header_length} %{NUMBER:proxy_req_header_length} %{NUMBER:origin_header_resp_length} %{NUMBER:time_to_serve:} %{NUMBER:origin_time_to_serve:} %{WORD:proxy_hierarchy_route} %{WORD:finish_status_client} %{WORD:finish_status_origin} %{WORD:cache_result_code} %{GREEDYDATA:user_agent}"""]
      }
    } 

When I send this grok processor with the PUT pipeline API from the Kibana Dev Console, it works fine. But when I put the processor in a JSON file and send it from the command line with "curl -H 'Content-Type: application/json' -X PUT 'http://localhost:9200/_ingest/pipeline/trial-pipeline' -d@pipeline.json", I get this error:

{"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to parse content to map"}],"type":"parse_exception","reason":"Failed to parse content to map","caused_by":{"type":"json_parse_exception","reason":"Unrecognized character escape '[' (code 91)\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@3ae9f3ad; line: 1, column: 170]"}},"status":400}
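The error points at an invalid escape sequence. A JSON string only allows a backslash before ", \, /, b, f, n, r, t or u, so the grok pattern's \[ is rejected as soon as the file is parsed. The sketch below uses Python's standard json module purely as a stand-in for the Elasticsearch parser (the messages differ, the rule is the same) to reproduce the failure on a shortened fragment of the pattern and to show that doubling the backslash gives the intended \[:

```python
import json

# Shortened fragment of the grok pattern exactly as written in pipeline.json.
# The raw string keeps the single backslash before "[" and "]".
bad_json = r'{"pattern": "\[%{HTTPDATE:request_date}\]"}'

rejected_msg = None
try:
    json.loads(bad_json)
except json.JSONDecodeError as err:
    # Python says "Invalid \escape"; Elasticsearch's parser reports
    # "Unrecognized character escape '[' (code 91)" for the same input.
    rejected_msg = err.msg
    print("rejected:", rejected_msg)

# Doubling the backslash produces a valid JSON escape that decodes to one "\",
# which is what grok needs to match a literal "[".
good_json = r'{"pattern": "\\[%{HTTPDATE:request_date}\\]"}'
pattern = json.loads(good_json)["pattern"]
print(pattern)  # \[%{HTTPDATE:request_date}\]
```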

Could you try replacing """ with " and escaping every internal " as \"?

Hey David, sorry about the original post: I forgot to format the JSON as code. I have edited it, so you can now see the actual code in the topic.

Could you share the full pipeline.json file that reproduces this problem, so we can start from it?

This is the pipeline.json file.

{
  "description" : "Pipeline for ingest node",
  "processors" : [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{IP:source_ip} %{GREEDYDATA} \[%{HTTPDATE:request_date}\] \"%{WORD:http_method} %{URIPROTO:http_proto}://%{URIHOST:uri_host}%{URIPATH:uri_path}%{GREEDYDATA:uri_query} http/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:bytes_sent:int} %{NUMBER:origin_response_code} %{NUMBER:origin_bytes_sent} %{NUMBER:client_req_content_length} %{NUMBER:proxy_req_length} %{NUMBER:client_req_header_length} %{NUMBER:proxy_resp_header_length} %{NUMBER:proxy_req_header_length} %{NUMBER:origin_header_resp_length} %{NUMBER:time_to_serve:} %{NUMBER:origin_time_to_serve:} %{WORD:proxy_hierarchy_route} %{WORD:finish_status_client} %{WORD:finish_status_origin} %{WORD:cache_result_code} \"%{GREEDYDATA:user_agent}\" %{GREEDYDATA:x_play_back_session_id}",
        "%{IP:source_ip} %{GREEDYDATA} \[%{HTTPDATE:request_date}\] \"%{WORD:http_method} %{URIPROTO:http_proto}://%{URIHOST:uri_host}%{URIPATH:uri_path}%{GREEDYDATA:uri_query} http/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:bytes_sent:int} %{NUMBER:origin_response_code} %{NUMBER:origin_bytes_sent:int} %{NUMBER:client_req_content_length} %{NUMBER:proxy_req_length} %{NUMBER:client_req_header_length} %{NUMBER:proxy_resp_header_length} %{NUMBER:proxy_req_header_length} %{NUMBER:origin_header_resp_length} %{NUMBER:time_to_serve:} %{NUMBER:origin_time_to_serve:} %{WORD:proxy_hierarchy_route} %{WORD:finish_status_client} %{WORD:finish_status_origin} %{WORD:cache_result_code} %{GREEDYDATA:user_agent}"]
      }
    },
    {
      "convert" : {
        "field" : "bytes_sent",
        "type": "integer"
      }
    },
    {
      "dissect": {
        "field": "uri_path",
        "if":"(ctx.uri_path.contains('hls5') && ctx.uri_path.contains('live') && (ctx.uri_path.contains('m3u8') || ctx.uri_path.contains('ts'))) || (ctx.uri_path.contains('dash') && ctx.uri_path.contains('live') && ctx.uri_path.contains('m4s'))",
        "pattern": "/%{a}/%{protocol}/%{stream_type}/%{backend_channel_id}/%{e}/%{variant}/%{g}.%{h}"
      }
    },
    {
      "remove": {
        "field": ["a","e","g","h"]
      }
    }
  ]
}
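To make the intent of the dissect step concrete, here is a rough Python approximation of its "if" condition and its "/"-delimited pattern, applied to the uri_path of the sample log line. It is only a sketch: it assumes grok captures uri_path as "/mm/dash/live/19033/LIVESERVICE_2403/V5000_W/258351504.m4s", it imitates dissect with a regular expression (not how Elasticsearch implements dissect), and the helper names are hypothetical:

```python
import re

# Assumed value of uri_path for the sample log line (hypothetical, derived by hand).
URI_PATH = "/mm/dash/live/19033/LIVESERVICE_2403/V5000_W/258351504.m4s"

# Keys in the same order as the dissect pattern
# "/%{a}/%{protocol}/%{stream_type}/%{backend_channel_id}/%{e}/%{variant}/%{g}.%{h}"
KEYS = ["a", "protocol", "stream_type", "backend_channel_id", "e", "variant", "g", "h"]

# Regex stand-in for dissect: each key runs up to the next delimiter ("/" or ".").
DISSECT_LIKE = re.compile(r"^/([^/]*)/([^/]*)/([^/]*)/([^/]*)/([^/]*)/([^/]*)/([^.]*)\.(.*)$")


def should_dissect(path: str) -> bool:
    """Python transcription of the Painless `if` condition (substring checks)."""
    hls = "hls5" in path and "live" in path and ("m3u8" in path or "ts" in path)
    dash = "dash" in path and "live" in path and "m4s" in path
    return hls or dash


def dissect_like(path: str) -> dict:
    """Return the extracted keys, or an empty dict when the path does not fit."""
    match = DISSECT_LIKE.match(path)
    return dict(zip(KEYS, match.groups())) if match else {}


fields = {}
if should_dissect(URI_PATH):
    fields = dissect_like(URI_PATH)
    # Mirror the remove processor, which drops the placeholder keys a, e, g, h.
    for key in ("a", "e", "g", "h"):
        fields.pop(key, None)

print(fields)
# {'protocol': 'dash', 'stream_type': 'live', 'backend_channel_id': '19033', 'variant': 'V5000_W'}
```

When the condition is false, dissect does not run, so none of a, e, g and h are created in the document.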

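Try with the following. JSON does not accept \[ as an escape sequence, which is exactly what the parser complains about, so every backslash in the grok patterns has to be doubled (\\[), and every \" in a pattern becomes \\\":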

{
  "description": "Pipeline for ingest node",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{IP:source_ip} %{GREEDYDATA} \\[%{HTTPDATE:request_date}\\] \\\"%{WORD:http_method} %{URIPROTO:http_proto}://%{URIHOST:uri_host}%{URIPATH:uri_path}%{GREEDYDATA:uri_query} http/%{NUMBER:http_version}\\\" %{NUMBER:response_code} %{NUMBER:bytes_sent:int} %{NUMBER:origin_response_code} %{NUMBER:origin_bytes_sent} %{NUMBER:client_req_content_length} %{NUMBER:proxy_req_length} %{NUMBER:client_req_header_length} %{NUMBER:proxy_resp_header_length} %{NUMBER:proxy_req_header_length} %{NUMBER:origin_header_resp_length} %{NUMBER:time_to_serve:} %{NUMBER:origin_time_to_serve:} %{WORD:proxy_hierarchy_route} %{WORD:finish_status_client} %{WORD:finish_status_origin} %{WORD:cache_result_code} \\\"%{GREEDYDATA:user_agent}\\\" %{GREEDYDATA:x_play_back_session_id}",
          "%{IP:source_ip} %{GREEDYDATA} \\[%{HTTPDATE:request_date}\\] \\\"%{WORD:http_method} %{URIPROTO:http_proto}://%{URIHOST:uri_host}%{URIPATH:uri_path}%{GREEDYDATA:uri_query} http/%{NUMBER:http_version}\\\" %{NUMBER:response_code} %{NUMBER:bytes_sent:int} %{NUMBER:origin_response_code} %{NUMBER:origin_bytes_sent:int} %{NUMBER:client_req_content_length} %{NUMBER:proxy_req_length} %{NUMBER:client_req_header_length} %{NUMBER:proxy_resp_header_length} %{NUMBER:proxy_req_header_length} %{NUMBER:origin_header_resp_length} %{NUMBER:time_to_serve:} %{NUMBER:origin_time_to_serve:} %{WORD:proxy_hierarchy_route} %{WORD:finish_status_client} %{WORD:finish_status_origin} %{WORD:cache_result_code} %{GREEDYDATA:user_agent}"
        ]
      }
    },
    {
      "convert": {
        "field": "bytes_sent",
        "type": "integer"
      }
    },
    {
      "dissect": {
        "field": "uri_path",
        "if": "(ctx.uri_path.contains(\"hls5\") && ctx.uri_path.contains(\"live\") && (ctx.uri_path.contains(\"m3u8\") || ctx.uri_path.contains(\"ts\"))) || (ctx.uri_path.contains(\"dash\") && ctx.uri_path.contains(\"live\") && ctx.uri_path.contains(\"m4s\"))",
        "pattern": "/%{a}/%{protocol}/%{stream_type}/%{backend_channel_id}/%{e}/%{variant}/%{g}.%{h}"
      }
    },
    {
      "remove": {
        "field": [
          "a",
          "e",
          "g",
          "h"
        ]
      }
    }
  ]
}
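Escaping long grok patterns by hand is easy to get wrong. One alternative, which is an assumption of this sketch and not something suggested in the thread, is to keep each pattern as a raw string containing exactly the text grok should see and let a JSON serializer add the escapes when writing pipeline.json. The pattern below is shortened for illustration and only two processors are shown; the output file name matches the one passed to curl with -d@pipeline.json:

```python
import json

# Grok patterns as Python raw strings: single backslashes, exactly what grok
# should receive. Shortened here; the real patterns are much longer.
PATTERNS = [
    r'%{IP:source_ip} %{GREEDYDATA} \[%{HTTPDATE:request_date}\] \"%{WORD:http_method} %{GREEDYDATA:rest}',
]

pipeline = {
    "description": "Pipeline for ingest node",
    "processors": [
        {"grok": {"field": "message", "patterns": PATTERNS}},
        {"convert": {"field": "bytes_sent", "type": "integer"}},
    ],
}

# json.dumps turns each "\" into "\\" and each '"' into '\"',
# the same escaping the corrected file above applies by hand.
text = json.dumps(pipeline, indent=2)
with open("pipeline.json", "w", encoding="utf-8") as fh:
    fh.write(text)

print(r"\\[%{HTTPDATE" in text)  # True: the backslash is doubled in the file
print(json.loads(text)["processors"][0]["grok"]["patterns"] == PATTERNS)  # True
```

The resulting file can then be sent with the same curl command as before.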

The pipeline is now accepted, but I am seeing this error:

org.elasticsearch.ElasticsearchException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: field [a] not present as part of path [a]

I am also seeing this error:

ERROR pipeline/output.go:121 Failed to publish events: temporary bulk send failure

Those are separate questions. I'd open a new topic for them, as the original one is now solved.

Okay, thanks, David!
