Hello,
I'm new to Elasticsearch and I am trying to load and index DBpedia datasets (RDF triples) into ES. The datasets are available in TTL (Turtle) format from https://wiki.dbpedia.org/downloads-2016-04.
My question is: how do I load and index this data into ES? Should I first convert the data to JSON format?
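On the JSON question: Elasticsearch indexes JSON documents, so each triple does need to become a JSON object at some point. A minimal sketch of that conversion follows, assuming the file has one triple per line in the simple `<subject> <predicate> object .` form. The field names (subject, predicate, object), the index name "dbpedia", and both helper functions are made up for illustration. The output follows the newline-delimited action/source layout of the bulk API; older Elasticsearch versions may also expect a `_type` in the action line.

```python
import json
import re

# Matches "<subject> <predicate> object ." where the object is either a
# <URI> or a literal such as "text"@en. Assumption: one triple per line.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def parse_triple(line):
    """Return a dict for one triple line, or None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    match = TRIPLE_RE.match(line)
    if match is None:
        return None
    subject, predicate, obj = match.groups()
    # Hypothetical field names; choose whatever suits your mapping.
    return {"subject": subject, "predicate": predicate, "object": obj}


def to_bulk_lines(lines, index_name="dbpedia"):
    """Yield bulk-API lines: one action line, then one document line."""
    for line in lines:
        doc = parse_triple(line)
        if doc is None:
            continue
        yield json.dumps({"index": {"_index": index_name}})
        yield json.dumps(doc)
```

Joining the yielded lines with newlines (plus a trailing newline) gives a body that could be sent to the `_bulk` endpoint in batches.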
Hi, thanks for the reply.
I've already seen that tutorial, but it doesn't cover the step of loading the dataset into ES. When I run the Python script against Elasticsearch running in the cloud, I get a connection error. When Elasticsearch is running locally instead, the Python script completes successfully, but in the command prompt (cmd) I get a java.io.IOException and I see this error: "[o.e.h.n.Netty4HttpServerTransport] [my_node] caught exception while handling client http traffic, closing connection"
Thanks again.
I solved it: the Python script now runs successfully, and I have verified that the index is created in ES.
However, I have one last problem: the index is empty.
This happens because the script never enters the final loop, which should iterate over all the triples in the dataset (for line in file:, where file is obtained by with os.popen('bzip2 -cd ' + filename) as file:).
I also noticed that the bz2 import is unused. Could that be the cause of the problem?
To be honest the code was probably originally written by searching stackoverflow for “how to read a bz2 file using python”. This probably isn’t the forum to discuss your problems with reading the raw data but I’d start by checking the file name is right.
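On the unused import: an unused import is harmless by itself, so it is unlikely to be the cause. One possible explanation for the loop never running is that the external bzip2 command fails (it is not installed, or the file is not found), in which case os.popen gives you an empty stream and the loop body is simply skipped. A minimal sketch that avoids the external command by using the standard-library bz2 module, and checks the file name first, might look like this. The function name iter_lines is made up, and UTF-8 encoding is an assumption about the dataset.

```python
import bz2
import os


def iter_lines(filename):
    """Yield decoded lines from a .bz2 file without calling external tools."""
    # Fail loudly if the path is wrong instead of silently reading nothing.
    if not os.path.isfile(filename):
        raise FileNotFoundError(f"dataset not found: {filename}")
    # mode="rt" decompresses and decodes to text; UTF-8 is assumed here.
    with bz2.open(filename, mode="rt", encoding="utf-8") as file:
        for line in file:
            yield line.rstrip("\n")
```

Replacing the os.popen loop with for line in iter_lines(filename): should either iterate over the triples or raise an error that points at the real problem.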