Loading and indexing DBpedia datasets in ES

Hello,
I'm new to Elasticsearch and I am trying to load and index DBpedia datasets (RDF triples) into ES. The datasets are available in Turtle (.ttl) format from https://wiki.dbpedia.org/downloads-2016-04.

My question is: how do I load and index this data into ES? Should I first convert the data to JSON format?

Thanks for your help.

See https://www.youtube.com/watch?v=ZzWT-2xdaek
The comments section includes a link to some code

Hi, thanks for the reply.
I've already seen that tutorial, but I'm missing the step where the dataset is loaded into ES. When I run the Python script against Elasticsearch running in the cloud, I get a connection error. When Elasticsearch is running locally instead, the Python script runs successfully, but in the cmd window I get a java.io.IOException and see this error: "[o.e.h.n.Netty4HttpServerTransport] [my_node] caught exception while handling client http traffic, closing connection"

You need to set up the connection details correctly.
This is an example of a Python client connecting to an Elastic Cloud cluster:

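import certifi
from elasticsearch import Elasticsearch

# Note: these keyword arguments match older releases of the Python client;
# newer releases may expect different parameters.
remoteEs = Elasticsearch(
    ["xxxxxMY_CLOUD_ENDPOINT xxxxxx.found.io"],  # placeholder: your cloud endpoint host
    port=9243,                                   # HTTPS port used by the cloud endpoint
    http_auth="MY_USERNAME:MY_PASSWORD",         # placeholder credentials
    use_ssl=True,
    verify_certs=True,
    ca_certs=certifi.where()                     # CA bundle shipped with certifi
)

# Placeholder query (not part of the original example): match every document.
myQuery = {"query": {"match_all": {}}}

response = remoteEs.search(index="MY_INDEX", body=myQuery)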
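
On the earlier question about converting to JSON: ES indexes JSON documents, so each triple does need to become a JSON object at some point. Below is a minimal sketch of one way to do that, not necessarily what the tutorial's script does. It assumes the dump has one triple per line in N-Triples style; general Turtle syntax would need a real RDF parser. The function names, the field names (subject, predicate, object) and the index name dbpedia_triples are all made up for illustration.

```python
import json
import re

# Matches "<subject> <predicate> object ." with one triple per line.
# Assumption: N-Triples-style lines; prefixes and multi-line Turtle are not handled.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def triple_to_doc(line):
    """Turn one triple line into a dict, or return None if it is not a triple."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # skip blank lines and comments
    match = TRIPLE_RE.match(line)
    if match is None:
        return None  # skip anything this simple pattern cannot parse
    subject, predicate, obj = match.groups()
    if obj.startswith("<") and obj.endswith(">"):
        obj = obj[1:-1]  # strip angle brackets from URI objects
    return {"subject": subject, "predicate": predicate, "object": obj}


def to_bulk_actions(lines, index_name):
    """Yield one action dict per parsed triple, shaped for the bulk helper."""
    for line in lines:
        doc = triple_to_doc(line)
        if doc is not None:
            yield {"_index": index_name, "_source": doc}


sample = [
    "# started 2016-04",
    '<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    '<http://dbpedia.org/resource/Berlin> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Germany> .',
]
actions = list(to_bulk_actions(sample, "dbpedia_triples"))
print(json.dumps(actions[0], indent=2))
```

The resulting dicts could then be passed to elasticsearch.helpers.bulk together with a client object. Depending on the Elasticsearch version, each action may also need a _type field.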

Thanks again.
I solved it: the Python script now runs successfully, and I have verified that the index is created in ES.
However, I have one last problem: the index is empty.
I noticed that this is because the script never enters the final loop, which should iterate over all the triples in the dataset (for line in file:, where file comes from with os.popen('bzip2 -cd ' + filename) as file:).

Also, I noticed that the bz2 import is unused. Could the problem above be related to this?

To be honest, the code was probably originally written by searching stackoverflow for “how to read a bz2 file using python”. This probably isn’t the forum to discuss your problems with reading the raw data, but I’d start by checking that the file name is right.
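
One thing that may help narrow it down: os.popen only captures the command's standard output, so if bzip2 is not on the PATH or the file name is wrong, the stream is simply empty and the loop body never runs. An unused import on its own is harmless. As a rough sketch (not the tutorial's code), the bz2 module from the standard library can read the file directly and fail loudly when the file is missing. The function name iter_bz2_lines and the sample file are made up for illustration.

```python
import bz2
import os
import tempfile


def iter_bz2_lines(filename):
    """Yield decoded lines from a .bz2 file, failing loudly if it is missing."""
    if not os.path.isfile(filename):
        # Report the working directory too, since relative paths are a common culprit.
        raise FileNotFoundError(
            f"No such file: {filename!r} (current directory: {os.getcwd()})"
        )
    # mode="rt" decompresses and decodes to text in one step.
    with bz2.open(filename, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Small self-contained demo: write a tiny compressed file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.ttl.bz2")
    with bz2.open(sample_path, mode="wt", encoding="utf-8") as out:
        out.write("<s1> <p1> <o1> .\n<s2> <p2> <o2> .\n")
    lines = list(iter_bz2_lines(sample_path))
    print(len(lines), "lines read")
```

If this raises FileNotFoundError for your dataset path, the file name (or the directory the script is run from) is the thing to fix first.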
