Hi,
I've only just done a course on Elasticsearch, Kibana and Logstash. I have 64-bit Ubuntu running in VirtualBox with Elasticsearch 6.2.3 and Logstash, and I thought it would be a good idea to do a little project to see what I can do with the ELK stack now. So I decided to load the newly released second dataset of the Gaia telescope (https://gea.esac.esa.int/archive/).
To start off, I used just one 40 MB CSV file with 14,000+ stars (http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source/csv/GaiaSource_1000172165251650944_1000424567594791808.csv.gz) instead of the 1.6 billion available, just to see how things go. I created this conf file:
input {
  file {
    path => ["/home/marcel-jan/gaia/GaiaSource_1000172165251650944_1000424567594791808.csv"]
    start_position => "beginning"
    sincedb_path => "/null"
    type => "data"
  }
}
filter {
  csv {
    separator => ","
    columns => [
      "solution_id",
      "designation",
      "source_id",
      "random_index",
      <etc...>
    ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => ["127.0.0.1:9200"]
    index => "gaiadr2"
  }
}
And this works, so hurray. Unfortunately, all the columns end up as the string type, which is not so useful. So I added a mutate block with a convert clause to the filter:
filter {
  csv {
    separator => ","
    columns => [
      "solution_id",
      "designation",
      "source_id",
      "random_index",
      <etc...>
    ]
  }
  mutate {
    convert => {
      "solution_id" => "integer"
      "designation" => "integer"
      "source_id" => "integer"
      "ref_epoch" => "float"
      "ra" => "float"
      "ra_error" => "float"
      <etc..>
    }
  }
}
Now this works when I convert only the integer columns. According to the documentation of the dataset (https://gea.esac.esa.int/archive/documentation/GDR2/Gaia_archive/chap_datamodel/sec_dm_main_tables/ssec_dm_gaia_source.html), some of the columns are of the double type. I have already found out that Logstash doesn't support a double data type in convert, so I tried float instead. (Here are some examples of the double data: 103.4475289523685, 0.04109941963375859, 56.02202543042615. It seems these should fit in a float.)
And when I do that, Logstash simply hangs silently, not importing any data.
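A reduced pipeline along these lines might help isolate whether the float conversion alone triggers the hang. This is only a sketch: the stdin input, the two-column layout, and the pairing of the sample values with ra and ra_error are assumptions for illustration, not part of the setup above.

```
# Minimal sketch for testing the float conversion in isolation.
# Assumption: lines are typed or piped on stdin instead of read from the CSV file.
input {
  stdin { }
}
filter {
  csv {
    separator => ","
    # Hypothetical two-column subset of the real column list
    columns => ["ra", "ra_error"]
  }
  mutate {
    convert => {
      "ra" => "float"
      "ra_error" => "float"
    }
  }
}
output {
  # Print each event so the converted field types are visible
  stdout { codec => rubydebug }
}
```

Feeding it a single line such as 103.4475289523685,0.04109941963375859 would show whether an event comes out at all, and whether the two fields are printed as numbers or as quoted strings.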
Where am I going wrong? Is it the data type? Is there a way around this?