Using the avro codec to convert from JSON to Avro


(Guido) #1

Hey all,

I am trying to convert a file from JSON to Avro with Logstash.
I was able to produce an output file using the following configuration.

JSON sample:

{"id":1,"first_name":"Dunn","last_name":"Fleote","email":"dfleote0@apache.org","gender":"Male","ip_address":"39.171.224.159"}
{"id":2,"first_name":"Toddie","last_name":"Axton","email":"taxton1@yellowbook.com","gender":"Male","ip_address":"144.240.119.8"}
{"id":3,"first_name":"Giselbert","last_name":"McArd","email":"gmcard2@nifty.com","gender":"Male","ip_address":"241.126.219.105"}
{"id":4,"first_name":"Virge","last_name":"Rewan","email":"vrewan3@economist.com","gender":"Male","ip_address":"250.58.141.113"}
{"id":5,"first_name":"Zebedee","last_name":"Orta","email":"zorta4@uiuc.edu","gender":"Male","ip_address":"237.104.18.42"}
{"id":6,"first_name":"Cecil","last_name":"Scherer","email":"cscherer5@networkadvertising.org","gender":"Male","ip_address":"233.233.72.205"}
{"id":7,"first_name":"Gregory","last_name":"Wittrington","email":"gwittrington6@barnesandnoble.com","gender":"Male","ip_address":"141.94.150.153"}
{"id":8,"first_name":"Friedrick","last_name":"Mogford","email":"fmogford7@gizmodo.com","gender":"Male","ip_address":"160.169.182.3"}
{"id":9,"first_name":"Israel","last_name":"Gidney","email":"igidney8@walmart.com","gender":"Male","ip_address":"206.118.233.35"}
{"id":10,"first_name":"Elladine","last_name":"Treffry","email":"etreffry9@ucla.edu","gender":"Female","ip_address":"133.32.179.105"}

test.avsc file:

{
  "namespace": "sample",
  "type": "record",
  "name": "event",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "gender", "type": "string"},
    {"name": "ip_address", "type": "string"}
  ]
}
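
To rule out a mismatch between the sample and the schema, here is a minimal sketch that checks a parsed JSON record against the field list above. It is not a real Avro validator: the helper name `mismatches` and the `PY_TYPES` mapping are made up for illustration, and it only handles the two primitive types (`int`, `string`) that this schema uses.

```python
import json

# Schema copied from test.avsc above.
SCHEMA = json.loads("""
{
  "namespace": "sample",
  "type": "record",
  "name": "event",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "gender", "type": "string"},
    {"name": "ip_address", "type": "string"}
  ]
}
""")

# Assumption: only the two Avro primitive types used in this schema are mapped.
PY_TYPES = {"int": int, "string": str}


def mismatches(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field in schema["fields"]:
        name, avro_type = field["name"], field["type"]
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PY_TYPES[avro_type]):
            problems.append(f"{name}: expected {avro_type}, got {type(value).__name__}")
    return problems


if __name__ == "__main__":
    line = ('{"id":1,"first_name":"Dunn","last_name":"Fleote",'
            '"email":"dfleote0@apache.org","gender":"Male",'
            '"ip_address":"39.171.224.159"}')
    print(mismatches(json.loads(line)))
```

The first sample record passes this check, so the sample data and the schema appear to agree.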

Logstash configuration:

input {
  file {
    path => "/somewhere/test.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => "json"
  }
}
output {
  file {
    path => "/somewhere/test.avro"
    codec => avro {
      schema_uri => "/somewhere/test.avsc"
    }
  }
}

All of this seemed to work, because test.avro is created. The problem is that I cannot read the schema back from it: avro-tools fails with an IOException saying "Not a data file".

java -jar ~/Downloads/avro/jar_files/avro-tools-1.8.2.jar getschema out/test.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
    at org.apache.avro.tool.Main.run(Main.java:87)
    at org.apache.avro.tool.Main.main(Main.java:76)
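
For context on the error: per the Avro specification, an object container file (the format `getschema` reads) starts with the four bytes `O`, `b`, `j`, `0x01`. The sketch below checks only that header. The function name and the path in the usage line are illustrative, not part of any tool. A possible explanation, which I have not confirmed against this file, is that the avro codec serializes each event as an individual Avro datum rather than writing a container file with an embedded schema, in which case this check would return False.

```python
# First four bytes of an Avro object container file, per the Avro specification.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(path):
    """Return True if the file at `path` starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC


if __name__ == "__main__":
    import sys

    # Usage (path is whatever file you want to inspect):
    #   python check_avro.py out/test.avro
    print(looks_like_avro_container(sys.argv[1]))
```

This only inspects the header; it does not verify that the rest of the file is a valid container.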

Am I missing something?
Btw, is it possible to manage the size of the output files?

Best,

Guido.

