Indexing RDF datasets

AZammit · May 2, 2017, 2:07pm

Hi all, I am new to Elasticsearch and am currently researching and implementing a concept for a small project. I would like to index the DBpedia RDF datasets using Elasticsearch. The RDF datasets will be stored in Apache Fuseki, and I would like to stream them into Elasticsearch for indexing. I found the following possibilities:

1. https://github.com/elastic/elasticsearch-river-wikipedia
   Rivers are deprecated.
2. https://github.com/eea/eea.elasticsearch.river.rdf
   Rivers are deprecated.
3. https://github.com/elastic/stream2es
   The project suggests using Logstash instead, although it already seems to have functionality to stream Wikipedia datasets into Elasticsearch.
4. Logstash
   I am a bit lost here, since from my understanding Logstash is a facility for streaming logs into Elasticsearch.

On which option should I concentrate my efforts? Are there any alternatives? It seems that there is no ready-made solution for indexing RDF datasets.

nik9000 · May 2, 2017, 3:41pm

Logstash probably.

I dunno about Elasticsearch for RDF in general, because it can't do arbitrary joins and RDF is all joins. You can use Elasticsearch for the full-text querying, though.

Nilabhsagar · May 2, 2017, 5:07pm

Elasticsearch might be the wrong choice here. I would suggest looking into MarkLogic. It should meet your requirements.

Mark_Harwood · May 2, 2017, 5:23pm

I indexed the DBpedia link structure in Elasticsearch and explored it using the Graph UI, which can give priority to significant links in the data (significant != popular). There's a video demo here [1]; if it looks interesting, I can share how the demo was put together.

[1] See 32 minutes in to https://www.elastic.co/elasticon/conf/2016/sf/graph-capabilities-in-the-elastic-stack
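The "significant != popular" idea corresponds to the `use_significance` control in a Graph explore request. Below is a minimal sketch of such a request body, built as a plain Python dict. The field name `linked_subjects` is taken from a later post in this thread; the helper name `explore_body`, the seed term, and the numeric settings are illustrative assumptions. The endpoint path for the Graph explore API differs between Elasticsearch versions, so check the documentation for your release before sending it.

```python
import json


def explore_body(seed, field="linked_subjects", size=5):
    """Build a Graph explore request body (hypothetical helper, not from the demo).

    seed  -- subject to start exploring from (illustrative)
    field -- field holding linked article names (assumed from this thread)
    size  -- maximum number of vertices returned per hop (arbitrary choice)
    """
    return {
        # Seed query: documents that link to the given subject.
        "query": {"term": {field: seed}},
        "controls": {
            # Rank terms by significance rather than raw popularity.
            "use_significance": True,
            # Number of top-matching docs sampled per shard (arbitrary value).
            "sample_size": 2000,
        },
        # First hop: subjects significantly associated with the seed.
        "vertices": [{"field": field, "size": size, "min_doc_count": 3}],
        # Second hop: subjects connected to the first-hop vertices.
        "connections": {"vertices": [{"field": field, "size": size}]},
    }


if __name__ == "__main__":
    print(json.dumps(explore_body("Malta"), indent=2))
```

Setting `use_significance` to `False` would instead favour the most frequently linked subjects, which is the "popular" behaviour the post contrasts against.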

AZammit · May 2, 2017, 5:55pm

That is great, Mark! This is exactly what I need.
Yes please, I would like to know how the demo was put together.

The Graph UI is incredible; I tested it using the Shakespeare dataset and the experience was just awesome. For sure it will be awesome using the Graph UI on the DBpedia datasets.

Mark_Harwood · May 3, 2017, 9:08am

Check this gist [1] for a Python script to load DBpedia data [2] into Elasticsearch 5.3+.

Each Elasticsearch doc is a single Wikipedia article with an array of the other articles it links to.
Using the Graph API/UI in X-Pack [3], you can explore strongly associated subjects (those subjects that are commonly paired together in the articles' linked_subjects field).

Cheers,
Mark

[1] https://gist.github.com/markharwood/21c723039425b4b3e4277b2bffa5c54c
[2] http://downloads.dbpedia.org/3.6/en/page_links_en.nt.bz2
[3] https://www.elastic.co/downloads/x-pack
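The document shape described above (one doc per article, with a `linked_subjects` array) can be sketched roughly as follows. This is not the code from the gist, only an illustration under stated assumptions: the input lines are N-Triples whose subject and object are both resource IRIs, the names `build_docs`, `bulk_actions`, `subject_name`, the `subject` field, and the `dbpedia` index name are all hypothetical, and grouping happens in memory, which suits a sample rather than the full dump.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
TRIPLE_RE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$')


def subject_name(iri):
    """Return the last path segment of a resource IRI, percent-decoded."""
    return unquote(iri.rsplit("/", 1)[-1])


def build_docs(lines):
    """Group link triples by subject into one document per article."""
    links = defaultdict(list)
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines and triples with literals
        subj, _pred, obj = match.groups()
        links[subject_name(subj)].append(subject_name(obj))
    return [
        {"subject": name, "linked_subjects": targets}
        for name, targets in links.items()
    ]


def bulk_actions(docs, index="dbpedia"):
    """Yield action dicts in the shape used by elasticsearch.helpers.bulk.

    The index name is an assumption. Elasticsearch 5.x additionally
    expects a "_type" key in each action; it is omitted in this sketch.
    """
    for doc in docs:
        yield {"_index": index, "_id": doc["subject"], "_source": doc}
```

With the official Python client installed, the generator could be passed to `elasticsearch.helpers.bulk(client, bulk_actions(docs))`; for the full dump, a streaming approach that flushes each subject as soon as its triples end would be more appropriate, provided the file is grouped by subject.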

Mark_Harwood · May 8, 2017, 11:47am

Demo using 5.4: https://youtu.be/ZzWT-2xdaek
