A Logstash input plugin to ingest RDF/graph data into Elasticsearch?

Hello guys,

Recently I integrated Postgres data into Elasticsearch through Logstash via the JDBC input plugin. I also have a graph DB (Apache Jena) which stores RDF data in a triplestore. Is there a plugin which can connect to this graph DB and export its data to Elasticsearch?

So far I came across:
elasticsearch-plugin-rdf-jena by jparte [No longer supported for ES > v1.4.0]

eea.elasticsearch.river.rdf
Based on rivers, which are deprecated. The workaround is to host it in a separate app.

Since I already have Logstash set up in our environment, I'm looking into Logstash plugins first. If no plugin exists, can you point me to how you solved this problem?

@vnktsh, have you considered using the Logstash JDBC input with the Jena JDBC driver?
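For reference, a minimal pipeline sketch of what that might look like (the driver path, connection string, endpoint URL, schedule, and index name here are placeholders I'm assuming, not something I've tested against Jena):

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/jena-jdbc-driver-remote.jar"
    jdbc_driver_class => "org.apache.jena.jdbc.remote.RemoteEndpointDriver"
    jdbc_connection_string => "jdbc:jena:remote:query=http://localhost:3030/ds/query"
    jdbc_user => ""
    schedule => "*/10 * * * *"
    statement => "select ?x ?y where { ?x a <http://abc.com/ontologies/Test/1#X> . ?y a <http://abc.com/ontologies/Test/1#Y> . }"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "rdf-sync"
  }
}
```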

Yes @danhermann, I tried that, and Logstash fails because it sends an invalid SPARQL query over JDBC.

[2019-01-31T02:17:14,112][ERROR][logstash.inputs.jdbc     ] Java::JavaSql::SQLException: Not a valid SPARQL query: 
        SELECT count(*) AS "COUNT" FROM (
			select ?x ?y where {
                           ?x a <http://abc.com/ontologies/Test/1#X> .
                           ?y a <http://abc.com/ontologies/Test/1#Y> .
			}
		) AS "T1" LIMIT 1
[2019-01-31T02:17:14,119][WARN ][logstash.inputs.jdbc     ] Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::JavaSql::SQLException: Not a valid SPARQL query>}
[2019-01-31T02:17:14,119][WARN ][logstash.inputs.jdbc     ] Attempt reconnection.
[2019-01-31T02:17:14,176][DEBUG][logstash.inputs.jdbc     ] closing {:plugin=>"LogStash::Inputs::Jdbc"}
[2019-01-31T02:17:14,177][DEBUG][logstash.pipeline        ] Input plugins stopped! Will shutdown filter/output workers.

As you can see, Logstash tries to get the result-set count by wrapping the valid SPARQL query in `SELECT count(*) AS "COUNT" FROM ( ... )`, which makes the final query invalid.

This looks like a limitation of the Logstash JDBC input plugin, since in other SQL clients like SQuirreL I can use the Jena JDBC driver to fetch SPARQL results without issue. Is there a way to get around it?

PS: This is what the correct and valid SPARQL query to get the result count would have looked like:

select (count(*) as ?count) {
    select ?x ?y where {
        ?x a <http://abc.com/ontologies/Test/1#X> .
        ?y a <http://abc.com/ontologies/Test/1#Y> .
    }
}

Unfortunately, there is currently no way to disable the attempt to obtain a record count in the JDBC input, though there is an open issue about that:

Thanks for referencing the issue. I'll keep an eye on it.

If not through Logstash, is there any other way to import the data into Elasticsearch? The only way I can think of is exporting it as JSON or to an RDBMS and then ingesting that, but it seems less than ideal, especially in my case where syncing needs to happen within a 10-minute window.

@vnktsh, if you can export to a file in JSON or to a DB, then it does open a range of possibilities for you. Logstash can definitely pull from either files or DBs. Depending on how the JSON export from Jena works, you might also be able to use Filebeat to send directly to Elasticsearch.
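To illustrate the export-to-file route, here is a rough sketch of reshaping a standard SPARQL 1.1 JSON results document (the format Jena's Fuseki endpoint returns) into Elasticsearch bulk NDJSON that Filebeat or a Logstash file input could pick up. The function name, index name, and the sample data are all my own illustrations, not from this thread:

```python
import json


def sparql_results_to_bulk_ndjson(results, index):
    """Convert a SPARQL 1.1 JSON results document into ES bulk NDJSON.

    `results` follows the standard layout:
    {"head": {"vars": [...]}, "results": {"bindings": [...]}}.
    Each binding becomes an action line plus a document line.
    """
    lines = []
    for binding in results["results"]["bindings"]:
        # Flatten each variable binding to its plain value.
        doc = {var: cell["value"] for var, cell in binding.items()}
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hand-written results document; the shape follows the SPARQL 1.1
# Query Results JSON Format, the data itself is made up.
results = {
    "head": {"vars": ["x", "y"]},
    "results": {"bindings": [
        {"x": {"type": "uri", "value": "http://abc.com/ontologies/Test/1#x1"},
         "y": {"type": "uri", "value": "http://abc.com/ontologies/Test/1#y1"}},
    ]},
}

print(sparql_results_to_bulk_ndjson(results, "rdf-sync"))
```

A small script like this could run on a 10-minute cron, write the NDJSON to a file, and let Filebeat or Logstash ship it, sidestepping the JDBC count-query problem entirely.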
