Importing CSV with geo locations using Logstash 5 / Kibana 5


(Marc Weber) #1

Hello,

Can someone help me with the configuration of a Logstash 5 conf file and/or Kibana 5?
I'm running Logstash, Elasticsearch and Kibana locally on Mac OS X 10.10.x.
The topic is about loading CSV files that contain geo locations
(latitude, longitude).

Use case

  • import of a given CSV file using Logstash
  • the CSV is structured in lines/columns and includes latitude/longitude geo locations
  • running the conf file pasted below
  • exploring the geo data in a Kibana 5 tile map

Results

  • the data are flowing into Elasticsearch and showing up in Kibana
  • lat/lon are shown as Geohash in the tile map configuration section;
    in the subsequent dropdown a geoip.longitude field appears
  • once I pick the geoip field, Kibana reports "no results found"

Questions

  • could you please double-check my conf file example?
  • do I need to reconfigure my Logstash load file?
  • is there anything I need to configure in Kibana to get my "location" field working?

Appreciate your help

############################

# my_geo_config

input {
  stdin {
    type => "stdin-type"
  }
  file {
    path => ["/Users/.../Test_020.csv"]
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => ["field_A","field_B","field_C_date","field_D","latitude","longitude"]
    separator => ","
  }
  date {
    match => [ "field_C_date", "YYYY-MM-DD" ]
  }

  mutate {
    convert => ["field_A", "string"]
    convert => ["field_B", "integer"]
    convert => ["field_D", "integer"]
  }

  mutate {
    convert => {"latitude" => "float"}
    convert => {"longitude" => "float"}
    add_field => ["location", "%{latitude},%{longitude}"]
    convert => {"location" => "float"}
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
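For reference, the usual stumbling block in a config like this is the location field: converting the combined "lat,lon" string to float will mangle it, and by default only geoip.location is mapped as geo_point in the logstash-* template, so a top-level location field won't be usable in a tile map. Below is a minimal sketch of a commonly used alternative that builds location as a nested lat/lon object (field names match the config above; the index still needs a mapping or custom template declaring location as geo_point, which the elasticsearch output's template option can install):

mutate {
  convert => {
    "latitude"  => "float"
    "longitude" => "float"
  }
  # add_field runs after convert; this builds a nested object location.lat / location.lon
  add_field => {
    "[location][lat]" => "%{latitude}"
    "[location][lon]" => "%{longitude}"
  }
}
mutate {
  # add_field always adds strings, so convert the nested values to float as well
  convert => {
    "[location][lat]" => "float"
    "[location][lon]" => "float"
  }
}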


(Mark Walkom) #2

It looks OK. Do you have the right time range set in Kibana?


(Marc Weber) #3

Dear Mark,

Thanks for your reply. Could you please give me a hint regarding the time range?

Regards Marc


(Mark Walkom) #4

Top right - https://www.elastic.co/guide/en/kibana/current/set-time-filter.html


(Marc Weber) #5

Dear Mark (with "k"),

Thanks a million.

I'm afraid my data conversion is not quite right. I'm trying to figure out that particular item.

Nevertheless, I came across another issue. While configuring the *.conf file I was working with
an input CSV of about 100 rows.

Now, when scaling the file to its full size of about 500,000 rows, the following happens:

  • Logstash processes until "Successfully started Logstash API endpoint {:port=>9600}",
    then nothing else happens
  • a few lines above, Logstash prints "Starting pipeline {"id"=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>1000}"
  • if I exclude the latitude/longitude columns, the file is processed (or sometimes not);
    I never had that kind of experience with the version 4 stack

Matching dates also seems to be tricky. I just want to index the "voucher_date" field (YYYY-MM-DD in the source). So far it doesn't work with

date {
  match => [ "voucher_date", "YYYY MM dd" ]
}
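For what it's worth, the date filter uses Joda-Time patterns, in which case matters (uppercase YYYY is week-year and uppercase DD is day-of-year) and literal separators have to match the source exactly. For a value like 2016-12-31, the match would normally be:

date {
  # lowercase yyyy and dd, with hyphens exactly as they appear in the source
  match => [ "voucher_date", "yyyy-MM-dd" ]
}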

Do you or anyone else have an idea how to design a *.conf using csv, geo (latitude, longitude), a voucher_date string-to-date conversion, and more than 100 rows in the source file?

Greatly appreciate your contributions

Regards, Marc


(Mark Walkom) #6

Sounds like a sincedb issue more than anything.
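A quick way to test that theory is to point the file input's sincedb at /dev/null, so Logstash forgets the stored read position and re-reads the file from the beginning on every run (path shortened as in the config above):

file {
  path => ["/Users/.../Test_020.csv"]
  start_position => "beginning"
  sincedb_path => "/dev/null"   # testing only: don't persist how far the file was read
}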


(Marc Weber) #7

Dear Mark,

Thanks for sticking with the above. It took a little time to figure out.
Once I converted the *.csv line endings from UTF-8 "Legacy Mac OS (CR)" to "Unix (LF)", it started working.
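For the record, an alternative to converting the file would be to tell the file input to split lines on CR instead of the default LF, via the plugin's delimiter option:

file {
  path => ["/Users/.../Test_020.csv"]
  start_position => "beginning"
  delimiter => "\r"   # legacy Mac OS line endings are CR rather than LF
}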

Thanks again and have a great one

Marc


(Mark Walkom) #8

Ohhh, that is odd.
Can you raise something against the GH repo for the plugin? We should really handle this for the user :slight_smile:


(Marc Weber) #9

Dear Mark,

Yep, GH = GitHub, isn't it? It may take me a little time, as I'm a bit busy at the moment.

Now I'm struggling with the geo data again: it doesn't appear as geo_point/geohash in Kibana.
Regarding the import issue, I'm afraid I'm somewhat lost. I want to review and adjust my complete
workflow end-to-end.
Is there a paper/blog/website available that provides an "ES 5 stack csv-geo" breakdown, in principle
or from a bird's-eye perspective, based on an example or something like that? I'm looking for a kind
of recipe or a handy checklist.

Appreciate your help

Marc


(Mark Walkom) #10

Yep! :slight_smile:

What does it show up as?
Check indexname/_mapping to see how the field is defined in ES.
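Concretely, something along these lines (the index name pattern is illustrative; adjust it to your own):

curl 'localhost:9200/logstash-*/_mapping?pretty'

For the tile map to offer the field, the mapping has to contain something like "location": { "type": "geo_point" }. If the field comes back as a plain string or number, the mapping (or the template used when the index was created) is what needs fixing, and the index has to be re-created afterwards, since the type of an existing field cannot be changed in place.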


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.