Processors > convert not working when the string includes whitespace

makeajourney · April 5, 2021, 12:36am

Hello.

I am setting up Filebeat.
I want to convert a string to a double, but the string has whitespace.
I tried dissect, but it does not fit my situation because the string contains more than just numbers, so I thought it might work after trimming the string.
But it does not work.
I think the "convert" processor does not work when there is whitespace in front of the number.
Is there any solution using processors?

Thanks

stephenb · April 5, 2021, 1:26am

Can you please provide sample strings that you are trying to convert?

makeajourney · April 5, 2021, 1:46am

These are the sample strings:
12.972121ms
6.7µs
I want to get only the double values, without the unit text and whitespace.

stephenb · April 5, 2021, 1:50am

Where is the whitespace: leading, trailing, or both? Do you want the units in another field?

makeajourney · April 5, 2021, 1:53am

Leading.
I've edited my reply so you can see the leading whitespace.
I want to extract the floating-point number.

stephenb · April 5, 2021, 1:59am

One more question: how are you parsing the original data to get these fields? Are they already coming in as JSON, or did you parse a message field?

If you are parsing the message field, perhaps share a sample message and your parsing.

Oh and @makeajourney welcome to the community!

makeajourney · April 5, 2021, 2:03am

Thanks for the welcome!

Here are the log samples:

[tKMgPW4Y9Vxsi4aBdYxm] 2021/03/31 - 13:11:43 | 200 |         6.7µs |   168.63.129.16 | GET      "/health"
[Pq12tolrHWPmZ2uESadt] 2021/03/31 - 13:11:44 | 200 |   12.972121ms |      10.0.62.82 | POST     "/api/v1/trip/event"

And this is the tokenizer string for the dissect processor:

'[%{request-id}] %{year}/%{month}/%{day} - %{time} | %{status-code|integer} | %{response-time} | %{source-ip} | %{method} "%{uri}"'

With the dissect processor, I get these response times into the response-time field.

stephenb · April 5, 2021, 2:42am

Here is one way...
You will probably want to create a mapping ahead of time, not sure if you got there already.

PUT _ingest/pipeline/test-pipeline
{
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": "[%{request-id}] %{year}/%{month}/%{day} - %{time} | %{status-code} | %{response-time-field} | %{source-ip} | %{method} \"%{uri}\""
      }
    },
    {
      "grok": {
        "field": "response-time-field",
        "patterns": [ "%{SPACE}%{NUMBER:response-time:float}%{NOTSPACE:response-time-units}"]
      }
    }
  ]
}

POST /_ingest/pipeline/test-pipeline/_simulate
{
  "docs": [
    {
      "_index": "test",
      "_id": "ySWv2XcBYgpFxvFAgAvO",
      "_source": {
        "timeStamp": "2021-02-25T11:55:33.3922395Z",
        "message": "[tKMgPW4Y9Vxsi4aBdYxm] 2021/03/31 - 13:11:43 | 200 |         6.7µs |   168.63.129.16 | GET      \"/health\""
      }
    }
  ]
}

Results: note that response-time is already a float. The grok did that.
Also, if you create a mapping beforehand, you may not need to convert, as that can happen when the document is indexed.

{
  "docs" : [
    {
      "doc" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "ySWv2XcBYgpFxvFAgAvO",
        "_source" : {
          "method" : "GET     ",
          "response-time-units" : "µs",
          "year" : "2021",
          "response-time" : 6.7,
          "message" : "[tKMgPW4Y9Vxsi4aBdYxm] 2021/03/31 - 13:11:43 | 200 |         6.7µs |   168.63.129.16 | GET      \"/health\"",
          "request-id" : "tKMgPW4Y9Vxsi4aBdYxm",
          "uri" : "/health",
          "response-time-field" : "        6.7µs",
          "timeStamp" : "2021-02-25T11:55:33.3922395Z",
          "status-code" : "200",
          "month" : "03",
          "source-ip" : "  168.63.129.16",
          "time" : "13:11:43",
          "day" : "31"
        },
        "_ingest" : {
          "timestamp" : "2021-04-05T02:40:12.2048538Z"
        }
      }
    }
  ]
}

BTW, using dissect up front is good; it is efficient.
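
For readers who want to see what the grok step is doing without running an ingest pipeline, it can be approximated with a regular expression: skip leading whitespace, capture a number, and capture the remaining non-space characters as the units. The Python sketch below is only an illustration; the name `split_response_time` is hypothetical, and the number pattern is a simplification of grok's NUMBER, which accepts more forms.

```python
import re

# Simplified stand-in for the grok pattern
#   %{SPACE}%{NUMBER:response-time:float}%{NOTSPACE:response-time-units}
# Assumption: plain decimal numbers only; grok's NUMBER is more permissive.
RESPONSE_TIME = re.compile(r"\s*(?P<value>[+-]?\d+(?:\.\d+)?)(?P<units>\S+)")


def split_response_time(raw):
    """Return (value, units) for a string such as '        6.7µs'."""
    match = RESPONSE_TIME.match(raw)
    if match is None:
        raise ValueError("unrecognized response time: %r" % raw)
    return float(match.group("value")), match.group("units")


# The two values dissect captures from the sample log lines.
print(split_response_time("        6.7µs"))
print(split_response_time("  12.972121ms"))
```

The leading `\s*` plays the role of %{SPACE}, which is why the untrimmed dissect value can be fed in directly.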

makeajourney · April 5, 2021, 2:52am

Is grok possible with output: file?
Unfortunately, I am not using ELK as the output.
I'm looking for a way to do this without ELK.

Anyway, thanks for your help.

stephenb · April 5, 2021, 2:59am

So you are using Logstash?... Yes the same will basically work in Logstash. Dissect then grok... Or are you trying to do all this in Filebeat?

makeajourney · April 5, 2021, 3:20am

I am not using Logstash. I want to write the output directly to a file.

I've tried the following:

processors:
  - dissect:
      tokenizer: '[%{request-id}] %{year}/%{month}/%{day} - %{time} | %{status-code|integer} | %{response-time} | %{source-ip} | %{method->} "%{uri}"'
      field: "message"
      target_prefix: "output"
  - if:
      contains.output.response-time: "µs"
    then:
      - add_fields:
          target: output
          fields:
            response-time: "0"
    else:
      - truncate_fields:
          fields:
            - output.response-time
          max_characters: 11
      - convert:
          fields:
            - {from: "output.response-time", to: "output.response-time", type: "double"}

And it's not working, as I mentioned before.
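
To see why the truncate-then-convert approach stalls, it helps to trace one value by hand. Assuming the sample log line is reproduced with its original spacing, dissect captures "  12.972121ms" (two leading spaces). Keeping the first 11 characters drops the "ms" but leaves the spaces in place. The Python sketch below mimics a strict float parser with a regular expression; `strict_to_double` is a hypothetical stand-in for the behavior reported in this thread, not Filebeat's actual code.

```python
import re

# Strict float syntax with no surrounding whitespace. Assumption: this models
# the behavior described in the thread; it is not Filebeat's real parser.
STRICT_FLOAT = re.compile(r"[+-]?\d+(?:\.\d+)?")


def strict_to_double(text):
    """Convert text to float, rejecting any leading or trailing whitespace."""
    if STRICT_FLOAT.fullmatch(text) is None:
        raise ValueError("cannot convert %r to double" % text)
    return float(text)


raw = "  12.972121ms"   # value captured by dissect, leading spaces included
truncated = raw[:11]    # keep the first 11 characters, as truncate_fields would
print(repr(truncated))

try:
    strict_to_double(truncated)
except ValueError as err:
    print(err)          # the leading spaces make the conversion fail

print(strict_to_double(truncated.strip()))  # succeeds once the value is trimmed
```

In other words, truncation removes the unit suffix but not the padding, so the value still needs a trim before any strict conversion.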

stephenb · April 5, 2021, 3:36am

OK, I think I understand what you are trying to accomplish (Filebeat -> Output.txt). Yes, tell us that next time; it will help shorten the Q/A cycle.

Filebeat is not necessarily designed to be a full-fledged parser, as it is designed to be a lightweight shipper. Let me take a look and see.

And yes I am not surprised the above is not working.

stephenb · April 5, 2021, 4:46am

No, grok is not available in Filebeat.

If you used Logstash, you could get exactly what you want.

Right now, about the best I have for you with Filebeat is:

processors:
  - dissect:
      tokenizer: '[%{request-id}] %{year}/%{month}/%{day} - %{time} | %{status-code} | %{response-time} | %{source-ip} | %{method} "%{uri}"'
      field: "message"
      target_prefix: "output"
      trim_values: "all"
  - convert:
      fields:
        - {from: "output.status-code", to: "output.status-code", type: "integer"}
  - if:
      contains.output.response-time: "µs"
    then:
      - add_fields:
          target: output
          fields:
            response-time: "0ms"
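
With trim_values: "all", the response-time field arrives in the output file as a trimmed string such as "12.972121ms". If whatever reads that file can do a little post-processing, the number and unit can be separated there. The Python sketch below is one way this might look; `to_milliseconds` and the unit table are assumptions for illustration (only "µs" and "ms" appear in this thread), and it converts µs values rather than replacing them with 0.

```python
import re

# Assumed unit table: milliseconds per unit. Only "µs" and "ms" appear in the
# thread; "ns" and "s" are included as guesses at other possible suffixes.
MS_PER_UNIT = {"ns": 1e-6, "µs": 1e-3, "ms": 1.0, "s": 1e3}

# A decimal number followed directly by a unit suffix, e.g. "12.972121ms".
DURATION = re.compile(r"(?P<value>\d+(?:\.\d+)?)(?P<units>[^\d\s]+)")


def to_milliseconds(text):
    """Convert a duration string to a float number of milliseconds."""
    match = DURATION.fullmatch(text.strip())
    if match is None or match.group("units") not in MS_PER_UNIT:
        raise ValueError("unrecognized duration: %r" % text)
    return float(match.group("value")) * MS_PER_UNIT[match.group("units")]


print(to_milliseconds("12.972121ms"))
print(to_milliseconds("6.7µs"))
```

The call to `strip()` means the function also tolerates values that were not trimmed by dissect.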

makeajourney · April 5, 2021, 4:48am

Well, I thought that Filebeat could be a lightweight shipper and a reformatter at the same time. And I think Filebeat is already good at reformatting in some ways.
I have an additional question: when converting a field to another type, could trimming the whitespace in the string be supported in the near future?
I think that would be helpful in this case.

stephenb · April 5, 2021, 4:51am

Perhaps... but I do not have that level of insight into the roadmap.
Please feel free to open a feature request.
Today you could do this with
input -> logstash -> output file
or
input -> filebeat (many) -> logstash -> output file(s)

makeajourney · April 5, 2021, 4:54am

Could you tell me where I can open the feature request?

stephenb · April 5, 2021, 4:58am

You can open a feature request here

makeajourney · April 5, 2021, 4:59am

Thank you

system · May 3, 2021, 7:00am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.