Import CSV with date fields not usable as timestamp field

Hello,

I'm currently trying to use a date field imported from a CSV via Filebeat into Elasticsearch (v7.17.10).
The field has this format: "2023-03-07 15:19:11", and I would like to use it as the timestamp field.

I tried many different ways in filebeat.yml, but without success.

e.g.:

setup.template.append_fields:
- name: date2
  type: date
  format: "yyyy-MM-dd' 'HH:mm"

or


processors:
  - timestamp:
      field: "date2"
      target_filed: date2
      layouts:
      - 'yyyy-dd-MM HH:mm'

When the index is created, the date field is never created as type "date" but stays "string".

Curiously, this CSV has another field with a date like "2023-03-07", and that field is correctly set to the date type without any configuration in filebeat.yml.

Could you give me some advice, please?

Best regards,

V.

Could you try with this syntax?

processors:
  - timestamp:
      field: date
      formats:
        - "yyyy-MM-dd HH:mm:ss"
      ignore_failure: true
      tag_on_failure: ["_dateparsefailure"]

I am assuming you want to use it as the @timestamp field...
Make sure you check if a timezone is needed. Then you could try it like this:

processors:
  - timestamp:
      field: date
      formats:
        - "yyyy-MM-dd HH:mm:ss"
      timezone: "UTC"
      ignore_failure: true
      tag_on_failure: ["_dateparsefailure"]

If you do not want to use it as the @timestamp field, then you could try the example below, with or without the timezone notation.

processors:
  - date:
      field: date
      target_field: date
      formats:
        - "yyyy-MM-dd HH:mm:ss"
      ignore_failure: true
      tag_on_failure: ["_dateparsefailure"]
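
Note that Filebeat itself has no date processor; a processor like the one above would normally live in an Elasticsearch ingest pipeline, with Filebeat pointing its output at that pipeline. A minimal sketch, assuming a hypothetical ingest pipeline named parse-csv-date:

output.elasticsearch:
  hosts: ["localhost:9200"]
  pipeline: parse-csv-date   # hypothetical ingest pipeline containing the date processor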

With the first configuration, I get an error message on startup:
failed to unpack the timestamp configuration: missing required field accessing 'processors.1.timestamp.layouts'

So I added the layouts config:

 - timestamp:
      field: date
      formats:
        - "yyyy-MM-dd HH:mm:s
      layouts:
        - "yyyy-MM-dd HH:mm:s
      ignore_failure: true
      tag_on_failure: ["_dateparsefailure"]

And I get this error message:

(status=400): 
{"type":"mapper_parsing_exception","reason":"failed to parse field [date] of type [date] in document with id '5-YqGogB-x02cX94I8XU'. Preview of field's value: '2023-03-07 15:19:11'","caused_by":{"type":"illegal_argument_exception","reason":"failed to parse date field [2023-03-07 15:19:11] with format [strict_date_optional_time||epoch_millis]","caused_by":{"type":"date_time_parse_exception","reason":"date_time_parse_exception: Failed to parse with all enclosed parsers"}}}, dropping event!

Hi @Vortex_SLT, welcome to the community.

Please share a sample of your data (anonymized if needed) and your entire filebeat.yml.

Please find my filebeat.yml below:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /exports/*_file.csv

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
# ======================= Elasticsearch template setting =======================
setup.ilm.enabled: false
setup.template.name: "index_name"
setup.template.pattern: "index_name-*"
setup.template.settings:
  index.number_of_shards: 6

  #index.codec: best_compression
  #_source.enabled: false
setup.template.overwrite: true
setup.template.append_fields:
#- name: my_date
#  type: date
#- name: my_date2
#  type: date
# - name: effective_time
#   type: date
# - name: date
#   type: date
#   formats: "yyyy-MM-dd HH:mm:ss"


# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  #host: "localhost:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

# ================================== Outputs ===================================
# Configure what output to use when sending the data collected by the beat.
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["node01.domain.me:9200","node02.domain.me:9200","node03.domain.me:9200","node04.domain.me:9200","node05.domain.me:9200","node06.domain.me:9200"]
  protocol: "https"
  username: "beats"
  password: "pass"
  ssl.certificate_authorities: ["/home/ca.pem"]
  index: "index_name-%{+yyyy.MM.dd}"


# ================================= Processors =================================
processors:
  - decode_csv_fields:
      fields:
        message: decoded.csv
      separator: ","
      #ignore_missing: false
      #overwrite_keys: true
      #trim_leading_space: false
      #fail_on_error: true
  - timestamp:
      field: date
      formats:
        - "yyyy-MM-dd HH:mm:ss"
      layouts:
        - "yyyy-MM-dd HH:mm:ss"
      ignore_failure: true
      tag_on_failure: ["_dateparsefailure"]
  - extract_array:
      field: decoded.csv
      mappings:
       date3: 0
       date2: 1
       date: 2
       number1: 3
       action: 4
       message1: 5
       ssid: 6
       OS_client: 7
       IP_client: 8
       filename: 9
       full_path_name: 10
       empty: 11
       nas_full_path_file: 12
       extention: 13
       type: 14
       number2: 15
       nas_hostname: 16
       number3: 17
       vnas_hostname: 18
       number4: 19
       vnas_shared: 20
       number5: 21
       vnas_shared2: 22
       protocol: 23
       vers: 24
       OS_NAS: 25
       log_event: 26
       number6: 27
       number7: 28
       effective_time: 29
       SSID_host: 30
       ip1: 31
       ip2: 32
       empty2: 33
       username: 34
  - drop_fields:
      fields: ["message","decoded.csv"]

Please find a sample input line below:

1683763230.1440005,"2023-05-11","2023-05-11 00:00:30",144000583,"READ","{""additionalInformation"":{},""blocked"":false,""reason"":null,""accessControlType"":null,""userIgnored"":false,""audited"":true}","S-1-5-21-xxxxx-yyyyyy-1714775081-2916883","WINDOWS","ip1.ip2.ip3.ip4","get_template.ps1","/SBM_scripts/PowershellScripts/vmware/get_template.ps1","","\\hostname01\script$\PowerShellScripts\vmware\get_template.ps1","ps1","FILE",1,"nasnode",402,"hostname01",456,"Arg_SBM_scripts",535,"script$","SMB","3.0","ONTAP","local.cs.fct.file-events.0",5089,2,1683763200,"S-1-5-21-xxxx-yyyyy-1714775081-2916883_ip1.ip2.ip3.ip4","172.16.0.1","10.1.1.1","","username"

Hi @Vortex_SLT

A couple of things going on... First and most important: processors run in order, so this could never work; you were trying to apply the timestamp processor before the array was decoded, so the date field did not exist yet.

Second, and equally important: the timestamp processor takes a very specific layout format that you are not using... please look at it closely.
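
For reference, the layouts of the timestamp processor use Go's reference time (Mon Jan 2 15:04:05 MST 2006): you write the pattern as that literal example date, not with yyyy/MM/dd-style tokens. A minimal sketch for your format, with an optional test value that Filebeat validates against the layouts at startup:

processors:
  - timestamp:
      field: date
      layouts:
        - '2006-01-02 15:04:05'   # Go reference time written as a literal example date
      test:
        - '2023-05-11 00:00:30'   # optional: checked against the layouts when Filebeat starts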

A minor point: I would use the filestream input rather than the log input going forward, as log will be deprecated at some point.

Here is my super-simplified version, run against a file containing your sample data:

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
    - /Users/sbrown/workspace/sample-data/discuss/discuss-timestamp.csv

setup.kibana:

output.elasticsearch:
  hosts: ["localhost:9200"]

processors:
  - decode_csv_fields:
      fields:
        message: decoded.csv
      separator: ","
      #ignore_missing: false
      #overwrite_keys: true
      #trim_leading_space: false
      #fail_on_error: true
  - extract_array:
      field: decoded.csv
      mappings:
       date3: 0
       date2: 1
       date: 2
       number1: 3
       action: 4
       message1: 5
       ssid: 6
       OS_client: 7
       IP_client: 8
       filename: 9
       full_path_name: 10
       empty: 11
       nas_full_path_file: 12
       extention: 13
       type: 14
       number2: 15
       nas_hostname: 16
       number3: 17
       vnas_hostname: 18
       number4: 19
       vnas_shared: 20
       number5: 21
       vnas_shared2: 22
       protocol: 23
       vers: 24
       OS_NAS: 25
       log_event: 26
       number6: 27
       number7: 28
       effective_time: 29
       SSID_host: 30
       ip1: 31
       ip2: 32
       empty2: 33
       username: 34
  - timestamp:
      field: date
      layouts:
        - '2006-01-02 15:04:05' # <--- IMPORTANT: this is your time format (Go reference time)
      ignore_failure: false
      tag_on_failure: ["_dateparsefailure"]       
  - drop_fields:
      fields: ["message","decoded.csv"]  

Resulting Document

{
  "_index": "filebeat-7.17.3-2023.05.14-000001",
  "_id": "lVWcG4gB_tT2Dm8CeSBz",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@timestamp": "2023-05-11T00:00:32.000Z",
    "date": "2023-05-11 00:00:32",
    "message1": "{\"additionalInformation\":{},\"blocked\":false,\"reason\":null,\"accessControlType\":null,\"userIgnored\":false,\"audited\":true}",
    "nas_hostname": "nasnode",
    "effective_time": "1683763200",
    "ip1": "172.16.0.1",
    "vnas_hostname": "hostname01",
    "vnas_shared2": "script$",
    "SSID_host": "S-1-5-21-xxxx-yyyyy-1714775081-2916883_ip1.ip2.ip3.ip4",
    "empty2": "",
    "username": "username",
    "date3": "1683763230.1440005",
    "number1": "144000583",
    "action": "READ",
    "number5": "535",
    "ip2": "10.1.1.1",
    "input": {
      "type": "filestream"
    },
    "decoded": {},
    "IP_client": "ip1.ip2.ip3.ip4",
    "number2": "1",
    "number3": "402",
    "log_event": "local.cs.fct.file-events.0",
    "ecs": {
      "version": "1.12.0"
    },
    "filename": "get_template.ps1",
    "full_path_name": "/SBM_scripts/PowershellScripts/vmware/get_template.ps1",
    "extention": "ps1",
    "protocol": "SMB",
    "number7": "2",
    "ssid": "S-1-5-21-xxxxx-yyyyyy-1714775081-2916883",
    "empty": "",
    "number4": "456",
    "number6": "5089",
    "type": "FILE",
    "vnas_shared": "Arg_SBM_scripts",
    "vers": "3.0",
    "OS_NAS": "ONTAP",
    "log": {
      "file": {
        "path": "/Users/sbrown/workspace/sample-data/discuss/discuss-timestamp.csv"
      },
      "offset": 1318
    },
    "host": {
      "name": "hyperion"
    },
    "agent": {
      "hostname": "hyperion",
      "ephemeral_id": "72968b1e-4a1f-4937-b81a-4d8695786fab",
      "id": "55a076f2-0ff2-4274-991f-e352276d5789",
      "name": "hyperion",
      "type": "filebeat",
      "version": "7.17.3"
    },
    "date2": "2023-05-11",
    "OS_client": "WINDOWS",
    "nas_full_path_file": "\\\\hostname01\\script$\\PowerShellScripts\\vmware\\get_template.ps1"
  }
}

BTW, depending on how many times you ran this, your mapping could be in need of cleaning up; an earlier run or template experiment has likely already mapped date as type date with the default strict_date_optional_time||epoch_millis format, which is why Elasticsearch rejected the raw "2023-03-07 15:19:11" value earlier.

Thanks a lot for your advice.

Now I understand my mistakes better.

And I found the main problem with my date field:
in the middle of the "date" field there is not a space but a tab.
So I updated the layout with the correct character.
Now it works perfectly.
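
For anyone else hitting this, a minimal sketch of the corrected processor, assuming the tab is written as a YAML \t escape inside a double-quoted layout string:

processors:
  - timestamp:
      field: date
      layouts:
        - "2006-01-02\t15:04:05"   # \t marks the tab separating date and time
      ignore_failure: false
      tag_on_failure: ["_dateparsefailure"]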

Thanks a lot for your quick help.

V.

