Processors > convert not working when the string includes whitespace

Hello.

I am setting up Filebeat.
I want to convert a string to a double, but the string contains whitespace.
I tried it with dissect, but that does not fit my situation because the string contains more than just numbers. So I thought it might work after trimming the string.
But it does not work.
I think the "convert" processor does not work when there is whitespace in front of the number.
Is there any solution using processors?

Thanks

Can you please provide sample strings that you are trying to convert?

These are the sample strings.
  12.972121ms
                 6.7µs
I want to get only the double value, without the unit string and the whitespace.

Where is the whitespace: leading, trailing, or both? Do you want the units in another field?

Leading.
I've edited my reply so you can see the leading whitespace.
I want to extract the floating-point number.
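
For reference, the extraction being asked for can be sketched outside Filebeat. The Python snippet below only illustrates the target behavior (padded string in, number and unit out); it is not a Filebeat feature, and the function name `split_value` and the regular expression are assumptions made for this example.

```python
import re

# Optional leading whitespace, a decimal number, then a non-space unit suffix.
VALUE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)(\S*)\s*$")


def split_value(raw):
    """Return (number, unit) for strings such as '  12.972121ms'."""
    match = VALUE_RE.match(raw)
    if match is None:
        raise ValueError("no leading number in %r" % raw)
    return float(match.group(1)), match.group(2)


print(split_value("  12.972121ms"))
print(split_value("                 6.7µs"))
```

The unit is returned separately so it can be kept in its own field or dropped.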

One more question: how are you parsing the original data to get these fields? Are they already coming in as JSON, or did you parse a message field?

If you are parsing the message field, perhaps share a sample message and your parsing.

Oh and @makeajourney welcome to the community!

Thanks for the welcome!

Log samples are here:

[tKMgPW4Y9Vxsi4aBdYxm] 2021/03/31 - 13:11:43 | 200 |         6.7µs |   168.63.129.16 | GET      "/health"
[Pq12tolrHWPmZ2uESadt] 2021/03/31 - 13:11:44 | 200 |   12.972121ms |      10.0.62.82 | POST     "/api/v1/trip/event"

and this is the string for the dissect processor:

'[%{request-id}] %{year}/%{month}/%{day} - %{time} | %{status-code|integer} | %{response-time} | %{source-ip} | %{method} "%{uri}"'

I got the response times into the response-time field with the dissect processor.

Here is one way...
You will probably want to create a mapping ahead of time; not sure if you got there already.

PUT _ingest/pipeline/test-pipeline
{
  "processors": [
    {
      "dissect": {
        "field": "message",
        "pattern": "[%{request-id}] %{year}/%{month}/%{day} - %{time} | %{status-code} | %{response-time-field} | %{source-ip} | %{method} \"%{uri}\""
      }
    },
    {
      "grok": {
        "field": "response-time-field",
        "patterns": [ "%{SPACE}%{NUMBER:response-time:float}%{NOTSPACE:response-time-units}"]
      }
    }
  ]
}

POST /_ingest/pipeline/test-pipeline/_simulate
{
  "docs": [
    {
      "_index": "test",
      "_id": "ySWv2XcBYgpFxvFAgAvO",
      "_source": {
        "timeStamp": "2021-02-25T11:55:33.3922395Z",
        "message": "[tKMgPW4Y9Vxsi4aBdYxm] 2021/03/31 - 13:11:43 | 200 |         6.7µs |   168.63.129.16 | GET      \"/health\""
      }
    }
  ]
}

Results: note that response-time is already a float. :slight_smile: The grok did that.
Also, if you create a mapping beforehand you may not need to convert, as that will / can happen when the

{
  "docs" : [
    {
      "doc" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "ySWv2XcBYgpFxvFAgAvO",
        "_source" : {
          "method" : "GET     ",
          "response-time-units" : "µs",
          "year" : "2021",
          "response-time" : 6.7,
          "message" : "[tKMgPW4Y9Vxsi4aBdYxm] 2021/03/31 - 13:11:43 | 200 |         6.7µs |   168.63.129.16 | GET      \"/health\"",
          "request-id" : "tKMgPW4Y9Vxsi4aBdYxm",
          "uri" : "/health",
          "response-time-field" : "        6.7µs",
          "timeStamp" : "2021-02-25T11:55:33.3922395Z",
          "status-code" : "200",
          "month" : "03",
          "source-ip" : "  168.63.129.16",
          "time" : "13:11:43",
          "day" : "31"
        },
        "_ingest" : {
          "timestamp" : "2021-04-05T02:40:12.2048538Z"
        }
      }
    }
  ]
}

BTW, using dissect up front is good; it is efficient.

Is grok possible with the file output?
Unfortunately I am not using ELK as the output.
I'm looking for a way to do this without ELK.

Anyway, thanks for your help.

So you are using Logstash? Yes, the same will basically work in Logstash: dissect, then grok. Or are you trying to do all this in Filebeat?

I am not using Logstash. I want to write the output directly to a file.

I've tried the following:

processors:
  - dissect:
      tokenizer: '[%{request-id}] %{year}/%{month}/%{day} - %{time} | %{status-code|integer} | %{response-time} | %{source-ip} | %{method->} "%{uri}"'
      field: "message"
      target_prefix: "output"
  - if:
      contains.output.response-time: "µs"
    then:
      - add_fields:
          target: output
          fields:
            response-time: "0"
    else:
      - truncate_fields:
          fields:
            - output.response-time
          max_characters: 11
      - convert:
          fields:
            - {from: "output.response-time", to: "output.response-time", type: "double"}

and it's not working, as I mentioned before.
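
A likely reason the attempt above fails, assuming dissect leaves the column padding in the captured value (as the padded samples earlier in the thread suggest): the response-time column is 13 characters wide, so cutting it to 11 characters removes the two-character unit but keeps the leading spaces. The Python lines below only illustrate that string arithmetic; they do not run inside Filebeat.

```python
# Value as captured between the " | " delimiters (padding included).
raw = "  12.972121ms"

truncated = raw[:11]         # what a max_characters: 11 cut would keep
print(repr(truncated))       # the "ms" is gone, the two leading spaces remain

cleaned = truncated.strip()  # the trimming step needed before converting
print(float(cleaned))
```

So the truncation handles the unit, and the remaining leading whitespace matches what the original question reports as the cause of the convert failure.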

OK, I think I understand what you are trying to accomplish (Filebeat -> Output.txt). Yes, tell us that next time; it will help shorten the Q/A cycle.

Filebeat is not necessarily designed to be a full-fledged parser, as it is designed to be a lightweight shipper. Let me take a look and see.

And yes, I am not surprised the above is not working.

No, grok is not available in Filebeat.

If you used Logstash you could get exactly what you want.

Right now, about the best I have for you with Filebeat is:

processors:
  - dissect:
      tokenizer: '[%{request-id}] %{year}/%{month}/%{day} - %{time} | %{status-code} | %{response-time} | %{source-ip} | %{method} "%{uri}"'
      field: "message"
      target_prefix: "output"
      trim_values: "all"
  - convert:
      fields:
        - {from: "output.status-code", to: "output.status-code", type: "integer"}
  - if:
      contains.output.response-time: "µs"
    then:
      - add_fields:
          target: output
          fields:
            response-time: "0ms"

Well, I just thought that Filebeat could be a lightweight shipper and a reformatter at the same time. And I think Filebeat is already good at reformatting in some ways.
I have an additional question: when converting a field to another type, could trimming the whitespace in the string be supported in the near future?
I think that would be helpful in this case.

Perhaps... but I do not have that level of insight into the roadmap.
Please feel free to open a feature request.
Today you could do this with
input -> logstash -> output file
or
input -> filebeat (many) -> logstash -> output file(s)

Could you tell me where I can open the feature request?

You can open a feature request here

Thank you :slight_smile:
