Problem with date format during a CSV import

Hi all,
Sorry in advance for my bad English and my limited knowledge of ELK.

I need to regularly import a CSV file like this:

data_id iso event_id_cnty event_id_no_cnty event_date year time_precision event_type sub_event_type actor1
9722610 887 YEM78146 78146 6 January 2023 2023 1 Explosions/Remote violence Remote explosive/landmine/IED AQAP: Al Qaeda in the Arabian Peninsula
9722612 887 YEM78148 78148 6 January 2023 2023 1 Explosions/Remote violence Remote explosive/landmine/IED National Resistance Forces
9722613 887 YEM78149 78149 6 January 2023 2023 1 Battles Armed clash Military Forces of Yemen (2016-) Supreme Political Council
9722615 887 YEM78151 78151 6 January 2023 2023 1 Battles Armed clash Military Forces of Yemen (2016-) Supreme Political Council
....

My first problem is the format of the "event_date" field. As you can see, the date looks like "6 January 2023", but when I try to import it as a date field with Kibana (CSV upload), I get an error during the upload process: Kibana cannot parse this field when I set the type to date instead of keyword.

My second problem is that I would like to find a way to apply (once I have solved the date field issue) a template to this file each time I import it with Kibana. I don't know if (and how) it is possible to apply a template (with the right types) to the CSV upload process without manually changing the type of the date field.

UPDATE: I have solved the problem with the date format, so I now have an ingest pipeline in Kibana that transforms the event_date field correctly. However, one field (actor1) is recognized as a text field, and I would like to store it in my index as a keyword field. Is it possible to do that in the ingest pipeline?

Thx for your help.

In the File Upload interface you can set up both the ingest pipeline AND the mappings for your new index, so you can define the data types there.

I took your table and added the following mappings and pipeline:

{
  "properties": {
    "actor1": {
      "type": "keyword"
    },
    "data_id": {
      "type": "long"
    },
    "event_date": {
      "type": "date"
    },
    "event_id_cnty": {
      "type": "keyword"
    },
    "event_id_no_cnty": {
      "type": "long"
    },
    "event_type": {
      "type": "keyword"
    },
    "iso": {
      "type": "long"
    },
    "sub_event_type": {
      "type": "keyword"
    },
    "time_precision": {
      "type": "long"
    },
    "year": {
      "type": "long"
    }
  }
}


{
  "description": "Ingest pipeline created by text structure finder",
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "data_id",
          "iso",
          "event_id_cnty",
          "event_id_no_cnty",
          "event_date",
          "year",
          "time_precision",
          "event_type",
          "sub_event_type",
          "actor1"
        ],
        "ignore_missing": false
      }
    },
    {
      "convert": {
        "field": "data_id",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "event_id_no_cnty",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "iso",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "time_precision",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "convert": {
        "field": "year",
        "type": "long",
        "ignore_missing": true
      }
    },
    {
      "date": {
        "field": "event_date",
        "formats": ["d MMMM yyyy"],
        "timezone" : "Europe/Amsterdam",
        "target_field": "event_date"
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}

Note the types for actor1 and event_date, and the date processor.

And the Discover screen after import:

Some notes about this (for you and whoever comes here in the future):

  • You can test the date processor before importing by using the simulate API:

POST /_ingest/pipeline/_simulate
{
  "pipeline" :
  {
    "description": "_description",
    "processors": [
      {
        "date": {
          "field": "event_date",
          "formats": ["d MMMM yyyy"],
          "timezone": "Europe/Amsterdam",
          "target_field": "event_date"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "event_date": "6 January 2023"
      }
    }
  ]
}
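
For reference, the simulate response should show the parsed field in ISO 8601 form, roughly like this (the exact offset and millisecond formatting may vary by Elasticsearch version; the `_ingest` contents are abbreviated here):

```json
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_id": "id",
        "_source": {
          "event_date": "2023-01-06T00:00:00.000+01:00"
        },
        "_ingest": { "timestamp": "..." }
      }
    }
  ]
}
```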
  • Finally, if you want to run this repeatedly, it may be more convenient to store the pipeline, put the mapping inside an index template, and configure Filebeat to read your CSVs.
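
As a concrete sketch of that last point (the names acled-csv-pipeline and acled-events are placeholders, not anything from your setup): store the pipeline once with PUT _ingest/pipeline/acled-csv-pipeline using the processors above, then put the mapping into an index template whose settings point at the stored pipeline, so every matching index gets both automatically:

```json
PUT _index_template/acled-events
{
  "index_patterns": ["acled-events*"],
  "template": {
    "settings": {
      "index.default_pipeline": "acled-csv-pipeline"
    },
    "mappings": {
      "properties": {
        "event_date": { "type": "date" },
        "actor1": { "type": "keyword" }
      }
    }
  }
}
```

With index.default_pipeline set, any document indexed into a matching index runs through the pipeline without the client having to specify it.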

Hi Jorge,

Thx for your answer.
I have created all the files (pipeline, template) and, as you mentioned, I need to find a way to apply them automatically and repeatedly to my next uploads. If I understand correctly, it is not possible to automatically apply a pipeline and a template from the Kibana upload menu, so I am going to try your solution with Filebeat.
Thx so much.
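
In case it helps someone later, here is a minimal Filebeat sketch for this setup. It assumes the stored pipeline and index template suggested above exist; the paths, index name, and pipeline name are placeholders to adapt, and some setup options vary between Filebeat versions:

```yaml
filebeat.inputs:
  - type: filestream
    id: acled-csv
    paths:
      - /path/to/exports/*.csv
    # skip the CSV header line so it is not indexed as data
    exclude_lines: ['^data_id']

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  index: "acled-events"
  # run each line through the stored ingest pipeline
  pipeline: "acled-csv-pipeline"

# the mapping comes from the index template, not from Filebeat's own template
setup.template.enabled: false
setup.ilm.enabled: false
```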