Parsing an HTTP response in CSV format


#1

Hi,

I have a project that requires me to download data from Yahoo Flurry in CSV format.
Yahoo provides the API and I have the token.

I can get the data using a REST client, but I am having a hard time with the logstash configuration.

I have two options:

  1. Download as JSON. When I choose this option I get an array like this:
    "rows" => [
      [0] { data },
      [1] { data }
    ]

  2. Download as CSV. In the API I have the option to specify csv format.
    This should be simpler, since all I need to do is write the data to a file.

I am looking for help with option #2.
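(For completeness, I think option #1's "rows" array could probably be flattened into one event per element with the split filter; just a sketch, assuming the response is decoded as JSON into a "rows" field:)

filter {
  split {
    # Emit one event per element of the "rows" array
    field => "rows"
  }
}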

Here is my logstash conf file.

input {
  http_poller {
    urls => {
       test => {
        method => get
        url => "https://api-metrics.flurry.com/public/v1/data/....&format=csv"
        headers => {
          # Accept => "application/json"
          "Authorization" => "Bearer api_key"
        }
        
      }
    }
    request_timeout => 60
    schedule => { every => "1h"}
    codec => "plain"
    metadata_target => "data"
  }
}

output {
  stdout { codec => rubydebug }
}

The message field looks like this:
"message" => "dateTime,app|name,language|name,activeDevices,averageTimePerSession\n\"2017-05-01 00:00:00.000-07:00\",\"xyz\",Unknown,24,0.0"

Thanks,
Frank


(Magnus Bäck) #2

Have you looked into the csv filter?
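Something like this might get you started (a sketch only; the column names are guessed from the message field in your post):

filter {
  csv {
    separator => ","
    columns => ["dateTime","app|name","language|name","activeDevices","averageTimePerSession"]
  }
}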


#3

Hi Magnus,

Thanks for the reply and suggestions. I was struggling with the csv filter yesterday.

I have made some progress, but the output is not yet in a "true" CSV format that can be imported into Excel.

Each row from the API is terminated with "\n", so I used the line codec to emit one event per "\n"-terminated row.

Can you please take a look and let me know of any recommendations? Thank you!

Logstash config:

input {
  http_poller {
    urls => {
       test => {
        method => get
        url => "https://api-metrics.flurry.com/public/v1/data/...&format=csv"
        headers => {
          # Accept => "application/json"
          "Authorization" => "Bearer api_key"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "1h"}
    codec => "line"
  }
}

filter {
  csv {
    columns => ["dateTime","app|name","language|name","activeDevices","averageTimePerSession"]
    remove_field => ["message","@timestamp","@version"]
  }
}
output {
  file { path => "c:/some_path/mobile.csv" }
}

The csv file looks like this:

header row: {"dateTime":"dateTime","app|name":"app|name","activeDevices":"activeDevices","averageTimePerSession":"averageTimePerSession","language|name":"language|name"}

Data:
{"dateTime":"2017-05-01 00:00:00.000-07:00","app|name":"name","activeDevices":"24","averageTimePerSession":"0.0","language|name":"Unknown"}

Regards,
Frank


#4

Hi,

I think I may have found a solution. While googling how to download the data as CSV to a file, I found a Stack Overflow post with a workaround suggestion. Link shared below.

Is there a way to get this to work without the workaround? Or, perhaps this is the correct way?

Just looking to learn, so any advice would be very welcome.

Link: http://stackoverflow.com/questions/37007206/export-data-from-elasticsearch-to-csv-using-logstash

Updated output section of the Logstash config:

output {
  file {
    codec => line { format => "%{dateTime}, %{app|name}, %{language|name}, %{activeDevices}, %{averageTimePerSession}" }
    path => "C:/some_path/mobile.csv"
  }
}

Output:

dateTime, app|name, language|name, activeDevices, averageTimePerSession
2017-05-01 00:00:00.000-07:00, company1, Unknown, 24, 0.0
2017-05-01 00:00:00.000-07:00, company2, English, 56, 184.63806970509384

Regards,
Frank


(Magnus Bäck) #5

Is there a way to get this to work without the workaround?

As the GitHub issue says, the fix is available in v3.0.3 of the csv output plugin.
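With that plugin version, the line-codec workaround should be unnecessary; a sketch of the csv output directly (path and field names taken from the configs above):

output {
  csv {
    path => "C:/some_path/mobile.csv"
    # Fields are written in this order, comma-separated
    fields => ["dateTime", "app|name", "language|name", "activeDevices", "averageTimePerSession"]
  }
}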


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.