Parsing an HTTP response in CSV format

Hi,

I have a project that requires me to download data from Yahoo Flurry in CSV format.
Yahoo provides the API and I have the token.

I can get the data using a REST client, but I am having a hard time with the Logstash configuration.

I have two options:

  1. Download as JSON. When I choose this option I get an array like this (rough handling sketch below):
    "rows" => [
      [0] { data },
      [1] { data }
    ]

  2. Download as CSV. In the API I have the option to specify CSV format.
    This should be simpler since all I need to do is write it to a file.

I am looking for help with option #2.
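
For reference, if I were to go with option #1 instead, I believe I would need a json filter to parse the body plus a split filter to break up the rows array. A rough, untested sketch (assuming the response body lands in the message field):

filter {
  # Parse the JSON response body (assumed to be in the "message" field)
  json {
    source => "message"
  }
  # Emit one event per element of the "rows" array
  split {
    field => "rows"
  }
}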

Here is my Logstash conf file.

input {
  http_poller {
    urls => {
      test => {
        method => get
        url => "https://api-metrics.flurry.com/public/v1/data/....&format=csv"
        headers => {
          # Accept => "application/json"
          "Authorization" => "Bearer api_key"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "1h" }
    codec => "plain"
    metadata_target => "data"
  }
}

output {
  stdout { codec => rubydebug }
}

The message field looks like this:
"message" => "dateTime,app|name,language|name,activeDevices,averageTimePerSession\n\"2017-05-01 00:00:00.000-07:00\",\"xyz\",Unknown,24,0.0"

Thanks,
Frank

Have you looked into the csv filter?
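
Something along these lines should parse your message field (untested; separator and column names taken from your sample):

filter {
  csv {
    separator => ","
    columns => ["dateTime","app|name","language|name","activeDevices","averageTimePerSession"]
  }
}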

Hi Magnus,

Thanks for the reply and suggestions. I was struggling with the csv filter yesterday.

I have made some progress, but the output is not yet in a "true" CSV format that can be imported into Excel.

The output from the API is terminated with "\n", so I used the line codec to emit one event per "\n"-terminated row.

Can you please take a look and let me know of any recommendations? Thank you!

Logstash config:

input {
  http_poller {
    urls => {
      test => {
        method => get
        url => "https://api-metrics.flurry.com/public/v1/data/...&format=csv"
        headers => {
          # Accept => "application/json"
          "Authorization" => "Bearer api_key"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "1h" }
    codec => "line"
  }
}

filter {
  csv {
    columns => ["dateTime","app|name","language|name","activeDevices","averageTimePerSession"]
    remove_field => ["message","@timestamp","@version"]
  }
}

output {
  file { path => "c:/some_path/mobile.csv" }
}

The CSV file looks like this:

Header row: {"dateTime":"dateTime","app|name":"app|name","activeDevices":"activeDevices","averageTimePerSession":"averageTimePerSession","language|name":"language|name"}

Data:
{"dateTime":"2017-05-01 00:00:00.000-07:00","app|name":"name","activeDevices":"24","averageTimePerSession":"0.0","language|name":"Unknown"}

Regards,
Frank

Hi,

I think I may have found a solution. While googling how to download data as CSV to a file, I found a Stack Overflow link with a workaround suggestion. Link shared below.

Is there a way to get this to work without the workaround? Or perhaps this is the correct way?

Just looking to learn, so any advice would be very welcome.

Link: http://stackoverflow.com/questions/37007206/export-data-from-elasticsearch-to-csv-using-logstash

Updated output section of the Logstash config:

output {
  file {
    codec => line { format => "%{dateTime}, %{app|name}, %{language|name}, %{activeDevices}, %{averageTimePerSession}" }
    path => "C:/some_path/mobile.csv"
  }
}

Output:

dateTime, app|name, language|name, activeDevices, averageTimePerSession
2017-05-01 00:00:00.000-07:00, company1, Unknown, 24, 0.0
2017-05-01 00:00:00.000-07:00, company2, English, 56, 184.63806970509384

Regards,
Frank

Is there a way to get this to work without the workaround?

As the GitHub issue says, the fix is available in v3.0.3 of the csv output plugin.
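
With that version you should be able to use the csv output directly instead of the file output workaround. Something like this (untested; field names copied from your filter):

output {
  csv {
    # Write one CSV row per event, in this field order
    path => "C:/some_path/mobile.csv"
    fields => ["dateTime","app|name","language|name","activeDevices","averageTimePerSession"]
  }
}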
