Indexing RDF datasets

Hi all, I am new to Elasticsearch and am currently researching and implementing a concept for a small project. I would like to index the DBpedia RDF datasets using Elasticsearch. The RDF datasets will be stored in Apache Fuseki, and I would like to stream them into Elasticsearch for indexing. I found the following possibilities:

  1. https://github.com/elastic/elasticsearch-river-wikipedia
    Based on rivers, which are deprecated.

  2. https://github.com/eea/eea.elasticsearch.river.rdf
    Based on rivers, which are deprecated.

  3. https://github.com/elastic/stream2es
    The project suggests using Logstash instead, although stream2es already seems to support streaming Wikipedia datasets into Elasticsearch.

  4. Logstash
    I am a bit lost regarding Logstash, since my understanding is that it is meant for streaming logs into Elasticsearch.

Which option should I concentrate my efforts on? Are there any alternatives? It seems that there is no ready-made solution for indexing RDF datasets.
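One approach that needs neither rivers nor Logstash is a small script that pulls triples out of Fuseki over the standard SPARQL 1.1 protocol and turns them into Elasticsearch bulk requests. This is a minimal sketch, not taken from any of the projects listed above: the endpoint URL `http://localhost:3030/dbpedia/query` and all helper names are hypothetical. It groups triples into one document per subject and shortens predicate URIs to their local names, because Elasticsearch treats dots in field names as object paths (predicates with the same local name from different vocabularies end up merged).

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Hypothetical endpoint: Fuseki typically serves SPARQL at /<dataset>/query.
FUSEKI_ENDPOINT = "http://localhost:3030/dbpedia/query"

# Small demo limit; see the note on paging below.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1000"


def fetch_results(endpoint, query):
    """Run a SELECT query via the SPARQL 1.1 protocol and return the JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def local_name(uri):
    """Shorten a predicate URI to a dot-free field name, e.g. ...rdf-schema#label -> label."""
    return uri.rstrip("/#").replace("#", "/").rsplit("/", 1)[-1]


def bindings_to_docs(results):
    """Group ?s ?p ?o bindings into one document per subject: {field: [values]}."""
    docs = defaultdict(lambda: defaultdict(list))
    for row in results["results"]["bindings"]:
        subject = row["s"]["value"]
        docs[subject][local_name(row["p"]["value"])].append(row["o"]["value"])
    return {s: dict(fields) for s, fields in docs.items()}


def to_bulk_ndjson(docs, index, doc_type=None):
    """Build a _bulk body: an action line followed by a source line per document."""
    lines = []
    for subject, fields in docs.items():
        action = {"_index": index, "_id": subject}
        if doc_type:  # Elasticsearch 5.x still expects a mapping type
            action["_type"] = doc_type
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(dict(fields, uri=subject)))
    return "\n".join(lines) + "\n"
```

For the full dataset you would page through the results (LIMIT/OFFSET with an ORDER BY for stable pages) and make sure a page boundary never splits one subject's triples; otherwise a later partial document with the same `_id` overwrites the earlier one. The resulting body can be POSTed to `_bulk` with `Content-Type: application/x-ndjson`.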

Logstash probably.

I'm not sure about Elasticsearch for RDF in general, because it can't do arbitrary joins, and RDF is all joins. You can still use Elasticsearch for the full-text querying, though.
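The usual workaround for the missing joins is to denormalize at index time: copy the properties you want to query across a link into the document itself. A minimal sketch with toy, hypothetical resource names (`Film_A`, `Person_X`, `City_Y`) and a hypothetical `director_birthPlace` field naming scheme:

```python
from collections import defaultdict

# Toy triples with hypothetical resource names, for illustration only.
TRIPLES = [
    ("Film_A", "director", "Person_X"),
    ("Person_X", "name", "Person X"),
    ("Person_X", "birthPlace", "City_Y"),
]


def properties_by_subject(triples):
    """Collect {subject: {predicate: [objects]}} from (s, p, o) tuples."""
    props = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        props[s][p].append(o)
    return props


def denormalize(props, subject, follow):
    """Build one document for `subject`, copying selected properties of linked
    resources into it. `follow` maps a link predicate to the predicates to copy,
    e.g. {"director": ["birthPlace"]} adds a "director_birthPlace" field."""
    doc = {p: list(objs) for p, objs in props.get(subject, {}).items()}
    for link, copied in follow.items():
        for target in doc.get(link, []):
            for p in copied:
                values = props.get(target, {}).get(p, [])
                if values:
                    doc.setdefault(f"{link}_{p}", []).extend(values)
    return doc
```

With `director_birthPlace` mapped as a keyword field, a plain term query answers "films whose director was born in City_Y" without any join. The price is that every join path has to be chosen up front, and the film documents must be reindexed whenever the linked person data changes.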

Elasticsearch might be the wrong choice here. I would suggest looking into MarkLogic; it should meet your requirements.

I indexed the DBpedia link structure in Elasticsearch and explored it using the Graph UI, which can prioritize significant links in the data (significant != popular). There's a video demo here [1]; if it looks interesting, I can share how the demo was put together.

[1] See 32 minutes into https://www.elastic.co/elasticon/conf/2016/sf/graph-capabilities-in-the-elastic-stack
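To illustrate "significant != popular": Elasticsearch's significant_terms aggregation scores terms by default with the JLH heuristic, which rewards terms that are over-represented in a foreground set (e.g. articles matching a query) relative to the background (the whole index). I'm assuming Graph's significance scoring works in the same spirit, so read this as an illustration of the idea rather than of Graph internals. The counts in the usage note are invented for illustration.

```python
def jlh(subset_freq, subset_size, superset_freq, superset_size):
    """JLH score: (fg% - bg%) * (fg% / bg%), or 0 if the term is not over-represented.

    Assumes the foreground is a subset of the background, so
    superset_freq >= subset_freq; bg == 0 then implies fg == 0 and returns 0.
    """
    fg = subset_freq / subset_size
    bg = superset_freq / superset_size
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)
```

With hypothetical numbers: a very common link found in 40 of 100 foreground articles but also in 200,000 of 1,000,000 articles overall scores `jlh(40, 100, 200_000, 1_000_000)`, about 0.4, while a rarer link found in only 10 foreground articles but just 500 articles overall scores `jlh(10, 100, 500, 1_000_000)`, about 19.9. The less popular link wins because it is far more specific to the foreground.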

That is great, Mark! This is exactly what I need.
Yes please, I would like to know how the demo was put together.

The Graph UI is incredible; I tested it using the Shakespeare dataset and the experience was just awesome. I'm sure it will be just as good on the DBpedia datasets.

Check this gist [1] for a Python script that loads DBpedia data [2] into Elasticsearch 5.3+.
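For readers who can't reach the gist, here is a rough sketch of a loader in the same spirit; it is not the gist's code. It reads the N-Triples page-links dump, strips the `http://dbpedia.org/resource/` prefix, and emits one bulk document per article holding the articles it links to. It assumes all links of one article are contiguous in the dump (verify this for your file), and the `subject` field and `article` type names are hypothetical. `_type` is included because this thread targets 5.x; drop it on 7.x and later.

```python
import bz2
import itertools
import json
import re

# Matches "<subject> <predicate> <object> ." lines whose object is an IRI.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')
RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def parse_links(lines):
    """Yield (article, linked_article) pairs; comments and literal objects are skipped."""
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m:
            s, _p, o = m.groups()
            yield s.removeprefix(RESOURCE_PREFIX), o.removeprefix(RESOURCE_PREFIX)


def link_docs(pairs):
    """Group contiguous pairs into one doc per article (assumes the dump is grouped by subject)."""
    for article, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        yield {"subject": article, "linked_subjects": sorted({o for _s, o in group})}


def bulk_lines(docs, index, doc_type="article"):
    """Yield NDJSON _bulk lines: an action line, then the document source."""
    for doc in docs:
        yield json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc["subject"]}})
        yield json.dumps(doc)


def write_bulk(path, index, out):
    """Stream a .nt.bz2 dump into bulk lines written to the file object `out`."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in bulk_lines(link_docs(parse_links(f)), index):
            out.write(line + "\n")
```

For the full dump you would split the output into chunks of a few thousand documents and send each chunk to `_bulk` with `Content-Type: application/x-ndjson`, rather than posting one huge request. `str.removeprefix` needs Python 3.9 or later.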

Each Elasticsearch document is a single Wikipedia article with an array of the other articles it links to.
Using the Graph API/UI in X-Pack [3], you can explore strongly associated subjects (subjects that are commonly found paired together in articles' linked_subjects field).
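A sketch of the kind of Graph explore request that walks linked_subjects from a seed article, with significance scoring switched on. The body fields (`query`, `controls.use_significance`, `sample_size`, `vertices`, `connections`) follow the Graph explore API; the seed value, the sizes, and the assumption that linked_subjects is mapped as a keyword field are mine. The endpoint path differs between versions (for example `_graph/explore` on 7.x and later), so check the Graph docs for the version you run.

```python
import json


def graph_explore_body(seed_subject, vertex_size=5, sample_size=2000):
    """Build a Graph explore request body: start from articles linking to
    `seed_subject`, then find linked_subjects significantly connected to them."""
    vertex = {"field": "linked_subjects", "size": vertex_size, "min_doc_count": 3}
    return {
        # Foreground: articles whose links include the seed (needs a keyword field).
        "query": {"term": {"linked_subjects": seed_subject}},
        # use_significance ranks terms by significance rather than raw popularity.
        "controls": {"use_significance": True, "sample_size": sample_size, "timeout": 2000},
        "vertices": [dict(vertex)],
        # One further hop over the same field.
        "connections": {"vertices": [dict(vertex)]},
    }


print(json.dumps(graph_explore_body("Albert_Einstein"), indent=2))
```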

Cheers,
Mark

[1] https://gist.github.com/markharwood/21c723039425b4b3e4277b2bffa5c54c
[2] http://downloads.dbpedia.org/3.6/en/page_links_en.nt.bz2
[3] https://www.elastic.co/downloads/x-pack

Demo using Elasticsearch 5.4: https://youtu.be/ZzWT-2xdaek

