How to map/index csv files with fscrawler?

Dear forum,
I am new here and have played around with the Elasticsearch basics.

I'm using the great FSCrawler to simplify ingesting files into Elasticsearch.

Issue: using FSCrawler, how can I get the data from the CSV columns into the corresponding fields of the type I created in Elasticsearch?
The next phase would be to read it using Kibana, which seems straightforward.

My mapping in Elasticsearch now exists with 14 fields of type "text" and "integer", but all the data from the CSV ends up in FSCrawler's default "content" field, so I realize I'm missing one last touch to get it working.
Below is part of the mappings for the docs index (some of my 14 fields are left out; the content field is at the end).

Is there a simple solution to this by:

  • customizing the fscrawler related configuration/defaults?
  • or is Logstash needed? If so, could someone give simple guidelines for using it together with FSCrawler?

Thx for any assistance,
Johan

{
  "docs": {
    "mappings": {
      "my-type": {
        "properties": {
          "ACTUAL staff": {
            "type": "integer"
          },
          "AUTHORIZED staff": {
            "type": "integer"
          },
          "On leave": {
            "type": "integer"
          },
          "Others": {
            "type": "integer"
          },
          "Sick": {
            "type": "integer"
          },
          "Total on duty": {
            "type": "integer"
          },
          "Total unavailable": {
            "type": "integer"
          },
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256

[...]

Hi

Thanks!

Well, FSCrawler was not designed to process CSV files. While Tika can extract the raw text from them, it will be flattened, which means no structured data here.
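
To illustrate the difference, here is a minimal Python sketch (not FSCrawler or Tika code). The sample data and both function names are made up for illustration, and the flattened string is only an approximation of what a text extractor would produce. It contrasts one document with everything in "content" against one document per row with a key per column.

```python
import csv
import io

# Hypothetical sample data; the column names are borrowed from this thread.
SAMPLE_CSV = "Unit,Total on duty,On leave\nNorth,12,3\nSouth,8,1\n"


def as_flattened_document(raw_csv):
    """Approximate a text extractor: every cell ends up in one 'content' string."""
    cells = [cell for row in csv.reader(io.StringIO(raw_csv)) for cell in row]
    return {"content": " ".join(cells)}


def as_structured_documents(raw_csv):
    """Parse each data row into its own document with one key per column."""
    return [dict(row) for row in csv.DictReader(io.StringIO(raw_csv))]


flat = as_flattened_document(SAMPLE_CSV)
structured = as_structured_documents(SAMPLE_CSV)

print(flat)        # a single document, no per-column fields
print(structured)  # one document per row, values still strings
```

Only the second shape can land in fields such as "Total on duty"; the first one can only ever fill "content".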

I'd really recommend using Logstash instead. I wrote a tutorial a while ago about this:

http://david.pilato.fr/blog/2015/04/28/exploring-capitaine-train-dataset/

HTH

Hi,
yes, I started to suspect that, so I already made a Logstash solution for it, which works in principle.
I can still use FSCrawler for easy import of the bulk data that I have.
I have some newbie issues with Logstash, so I will post another topic in case I can't solve them myself ;)

Will have a look at your tutorial as well – thx for your feedback.

Here is my simple Logstash config:

input {
  file {
    path => "c:/tmp/es/*.csv"
    start_position => "beginning"
    type => "staff"
  }
}
filter {
  csv {
    separator => ","
    columns => [ "Unit", "Male staff", "Female staff", "Total on duty", "Total unavailable", "On leave", "Others" ]
  }
}

output {
  elasticsearch {
    action => "index"
    hosts => ["localhost:9200"]
    index => "docs"
  }
  stdout {}
}
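
For readers unfamiliar with the csv filter, the positional mapping that the columns list describes can be sketched in Python roughly as follows. This is not Logstash code: the function name and the sample row are hypothetical, and the integer conversion is an added assumption (every column except "Unit" holds whole numbers) meant to match the "integer" fields in the mapping from the first post.

```python
import csv

# Column names copied from the Logstash config above.
COLUMNS = ["Unit", "Male staff", "Female staff", "Total on duty",
           "Total unavailable", "On leave", "Others"]


def line_to_document(line, columns=COLUMNS):
    """Map one CSV line onto named fields by position."""
    values = next(csv.reader([line]))
    doc = {}
    for name, value in zip(columns, values):
        value = value.strip()
        # Assumption: all columns except "Unit" contain whole numbers.
        if name != "Unit" and value.isdigit():
            doc[name] = int(value)
        else:
            doc[name] = value
    return doc


doc = line_to_document("North,7,5,12,4,3,1")  # made-up row
print(doc)
```

Each input line becomes one document whose keys are the configured column names, which is the structure the mapping expects.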
