How to map/index csv files with fscrawler?

ponten · June 25, 2017, 12:20pm

Dear forum,
I am new here and have played around with the Elasticsearch basics.

I'm using the great FSCrawler to simplify ingesting files into Elasticsearch.

Issue: using FSCrawler, how do I get the data from the CSV columns into the corresponding typed fields that I created in Elasticsearch?
The next phase would be to read it with Kibana, which seems straightforward.

My mapping in Elasticsearch now has 14 fields of type "text" and "integer", but all data from the CSV ends up in FSCrawler's default "content" field, so I realize I'm missing one last touch to get it working.
Below is part of the mapping for the docs index (some of my 14 fields are left out; the content field is at the end):

Is there a simple solution, either by:

customizing the FSCrawler configuration/defaults?
or is there a need for Logstash? If so, could someone give simple guidelines for using it together with FSCrawler?

Thx for any assistance,
Johan

{
  "docs": {
    "mappings": {
      "my-type": {
        "properties": {
          "ACTUAL staff": {
            "type": "integer"
          },
          "AUTHORIZED staff": {
            "type": "integer"
          },
          "On leave": {
            "type": "integer"
          },
          "Others": {
            "type": "integer"
          },
          "Sick": {
            "type": "integer"
          },
          "Total on duty": {
            "type": "integer"
          },
          "Total unavailable": {
            "type": "integer"
          },
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256

[...]
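For reference, the structured result being asked for can be sketched in Python: each CSV row becomes its own document whose integer columns hold real numbers, serialized as an Elasticsearch `_bulk` request body (newline-delimited JSON, an action line followed by the document). This is a minimal sketch, assuming the CSV header matches the mapped field names; the sample rows, `INTEGER_FIELDS`, `row_to_doc` and `csv_to_bulk` are hypothetical. The `_type` entry matches the pre-7.x mapping shown above (with its `my-type` mapping type); mapping types are deprecated in 7.x and removed in 8.x, so pass `doc_type=None` there.

```python
import csv
import io
import json

# Hypothetical subset of the integer fields from the "my-type" mapping above.
INTEGER_FIELDS = {"ACTUAL staff", "AUTHORIZED staff", "Sick", "On leave"}

# Hypothetical sample data; the first line is the CSV header.
SAMPLE_CSV = """Unit,ACTUAL staff,AUTHORIZED staff,Sick,On leave
North,40,45,2,3
South,38,40,1,0
"""


def row_to_doc(row):
    """Convert one CSV row (a dict of strings) into a typed document."""
    doc = {}
    for field, value in row.items():
        if field in INTEGER_FIELDS:
            text = (value or "").strip()
            if text:  # empty integer cells are left out of the document
                doc[field] = int(text)
        else:
            doc[field] = value
    return doc


def csv_to_bulk(csv_text, index="docs", doc_type="my-type"):
    """Build a newline-delimited _bulk body: one action line per document."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        action = {"index": {"_index": index}}
        if doc_type is not None:
            # Only for clusters that still use mapping types (5.x/6.x).
            action["index"]["_type"] = doc_type
        lines.append(json.dumps(action))
        lines.append(json.dumps(row_to_doc(row)))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(csv_to_bulk(SAMPLE_CSV), end="")
```

Such a body would be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`; Logstash's `elasticsearch` output also sends its events through the bulk API.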

dadoonet · June 26, 2017, 3:13pm

Hi

Thanks!

Well, FSCrawler has not been designed to process CSV files. Tika can extract the raw text from them, but that text is flattened, which means you get no structured data.
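To make the difference concrete, here is a small Python illustration (not FSCrawler or Tika code) contrasting the whole file stored as one block of text, roughly what ends up in `content`, with the file parsed into one document per row. The exact text Tika produces may differ; `as_single_text_doc`, `as_row_docs` and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical sample file contents.
SAMPLE_CSV = "Unit,Sick,On leave\nNorth,2,3\nSouth,1,0\n"


def as_single_text_doc(csv_text):
    """Approximates the flattened result: the whole file as one text field."""
    return {"content": csv_text}


def as_row_docs(csv_text, integer_fields=("Sick", "On leave")):
    """Structured result: one document per row, integer columns typed."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in integer_fields:
            row[field] = int(row[field])
        docs.append(row)
    return docs


if __name__ == "__main__":
    flat = as_single_text_doc(SAMPLE_CSV)
    rows = as_row_docs(SAMPLE_CSV)
    # One document with a single text field: columns and rows are gone,
    # so questions about numbers can only be answered by text matching.
    print(list(flat))
    print(len(rows), "documents")
    # With typed fields, "more than one person sick" is a numeric comparison
    # (in Elasticsearch, a range query on the integer field).
    print([d["Unit"] for d in rows if d["Sick"] > 1])
```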

I'd really recommend using Logstash instead. I wrote a tutorial a while ago about this:

HTH

ponten · June 27, 2017, 1:10pm

Hi,
yes, I had started to suspect that, so I already built a Logstash solution for it, which works in principle.
I can still use FSCrawler for easy import of the bulk data I have.
I have some newbie issues with Logstash, so I will post another topic if I can't solve them myself.

I'll have a look at your tutorial as well – thanks for your feedback.

Here is my simple Logstash config:

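input {
  file {
    path => "c:/tmp/es/*.csv"
    # Read files from the start; only applies to files Logstash has not seen before.
    start_position => "beginning"
    # Sets the "type" field on every event from this input.
    type => "staff"
  }
}
filter {
  csv {
    separator => ","
    # Names are assigned to the values by position, so this order must match the file.
    # The header line is parsed like any other line.
    columns => [ "Unit", "Male staff", "Female staff", "Total on duty", "Total unavailable", "On leave", "Others" ]
  }
}

output {
  elasticsearch {
    action => "index"
    hosts => ["localhost:9200"]
    index => "docs"
  }
  # Also print each event to the console for debugging.
  stdout {}
}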
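The `csv` filter above assigns the configured `columns` names to the values of each line by position. The Python sketch below imitates that on a single line to show two things worth checking. First, the header line of the file is parsed like any other line, so it produces an event that either gets indexed or, when it hits an integer-mapped field such as `Total on duty`, is rejected by Elasticsearch with a mapping error (the filter's `skip_header` option, available in newer plugin versions, is one way to drop it). Second, every value arrives as a string, leaving conversion to the index mapping or to the filter's `convert` option. This is an approximation, not the plugin's code; `parse_line` is hypothetical, and naming extra values `columnN` mirrors the plugin's documented auto-numbering.

```python
import csv

# Column names copied from the csv filter's "columns" setting above.
COLUMNS = ["Unit", "Male staff", "Female staff", "Total on duty",
           "Total unavailable", "On leave", "Others"]


def parse_line(line, columns=COLUMNS, separator=","):
    """Rough imitation of the csv filter: map values to column names by position."""
    # The file input hands each line to the filter in the "message" field.
    event = {"message": line}
    values = next(csv.reader([line], delimiter=separator))
    for i, value in enumerate(values):
        # Values beyond the configured columns get auto-numbered names.
        name = columns[i] if i < len(columns) else f"column{i + 1}"
        event[name] = value  # values stay strings unless converted
    return event


if __name__ == "__main__":
    header = ",".join(COLUMNS)
    print(parse_line(header)["Total on duty"])  # prints: Total on duty
    print(repr(parse_line("North,20,18,35,3,2,1")["Total on duty"]))  # prints: '35'
    print(parse_line("North,20,18,35,3,2,1,extra")["column8"])  # prints: extra
```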

system · July 25, 2017, 1:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
