I'm trying to import a CSV that was exported from Oracle. The export aggregates similar items into a "count" column. I want to de-aggregate the data so that I can do time-based analysis on the counts in Elasticsearch.
I would like the "Count" column to represent a document count in ES, so that when I use arithmetic functions (SUM/AVG) over a time frame they use the count field. I will also be graphing this information in Kibana.
Having had a similar requirement in the past, I couldn't come up with a solution in Logstash or Elasticsearch alone: you need to 'flatten' your data before you move it into Elasticsearch via Logstash.
So what I did was write a simple Python script that reads the 'count' value and duplicates each line 'count' times into a new file (a sketch of that script is below). Of course you end up with massive input files, but you then have e.g. 993 documents matching action C with ID 2006221, all sharing the same timestamp, for date histograms in Kibana.
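Here's a minimal sketch of that flattening step. It assumes the CSV has a header row and a column literally named "count"; the file names and column name are placeholders you'd adapt to your export.

```python
import csv

# Read the Oracle export and write each row 'count' times, dropping the count column,
# so every occurrence becomes its own line (and later its own document).
with open('oracle_export.csv', 'r') as infile, open('flattened.csv', 'w') as outfile:
    reader = csv.DictReader(infile)
    fieldnames = [f for f in reader.fieldnames if f != 'count']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        count = int(row.pop('count'))
        for _ in range(count):
            writer.writerow(row)
```

The flattened file can then go through Logstash's csv filter as usual.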
edit: Now that I think about it, you can just use the Python API for Elasticsearch: read the count value and index that line 'count' times directly into Elasticsearch, negating the need for huge input files. I don't know why I didn't think of that at the time.
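Something along these lines, using the elasticsearch Python client; the index name, host, and "count" column are assumptions to adapt (and older client versions may also want a doc_type argument):

```python
import csv
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])

with open('oracle_export.csv', 'r') as infile:
    for row in csv.DictReader(infile):
        count = int(row.pop('count'))
        # Index the same row 'count' times so each occurrence is its own document
        for _ in range(count):
            es.index(index='oracle-data', body=row)
```

For large counts, the client's bulk helper would be considerably faster than one request per document, but the idea is the same.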
edit: You don't even need the Elasticsearch Python API. You can import csv, json and urllib2, serialize each CSV row to JSON, and use urllib2 to POST it to Elasticsearch on port 9200 'count' times.
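A Python 2 sketch of that standard-library-only version; the URL (index/type) and the "count" column name are assumptions:

```python
import csv
import json
import urllib2

url = 'http://localhost:9200/oracle-data/event'

with open('oracle_export.csv', 'r') as infile:
    for row in csv.DictReader(infile):
        count = int(row.pop('count'))
        payload = json.dumps(row)
        # POST the serialized row 'count' times so each occurrence becomes a document
        for _ in range(count):
            req = urllib2.Request(url, payload, {'Content-Type': 'application/json'})
            urllib2.urlopen(req)
```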