Recently I integrated Postgres data into Elasticsearch through Logstash via the JDBC input plugin. I also have a graph DB (Apache Jena) which stores RDF data in a triplestore. Is there a plugin that can connect to this graph DB and export its data to Elasticsearch?
Since I already have Logstash set up in our environment, I'm looking into Logstash plugins first. If there are no plugins, can you describe how you solved this problem?
Yes @danhermann, I tried it, and Logstash fails by sending an invalid SPARQL query over JDBC.
[2019-01-31T02:17:14,112][ERROR][logstash.inputs.jdbc ] Java::JavaSql::SQLException: Not a valid SPARQL query:
SELECT count(*) AS "COUNT" FROM (
select ?x ?y where {
?x a <http://abc.com/ontologies/Test/1#X> .
?y a <http://abc.com/ontologies/Test/1#Y> .
}
) AS "T1" LIMIT 1
[2019-01-31T02:17:14,119][WARN ][logstash.inputs.jdbc ] Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::JavaSql::SQLException: Not a valid SPARQL query>}
[2019-01-31T02:17:14,119][WARN ][logstash.inputs.jdbc ] Attempt reconnection.
[2019-01-31T02:17:14,176][DEBUG][logstash.inputs.jdbc ] closing {:plugin=>"LogStash::Inputs::Jdbc"}
[2019-01-31T02:17:14,177][DEBUG][logstash.pipeline ] Input plugins stopped! Will shutdown filter/output workers.
As you can see, Logstash tries to get the count of the result set by wrapping SELECT count(*) AS "COUNT" FROM ( ... ) AS "T1" LIMIT 1 around a valid SPARQL query, which makes the final statement invalid.
This looks like a limitation of the Logstash JDBC input plugin rather than the driver: in other SQL clients like SQuirreL I can use the Jena JDBC driver to fetch SPARQL results without any problem. Is there a way I can get around it?
PS: This would have been the correct, valid SPARQL query to get the result count:
select (count(*) as ?count) {
select ?x ?y where {
?x a <http://abc.com/ontologies/Test/1#X> .
?y a <http://abc.com/ontologies/Test/1#Y> .
}
}
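One thing that may be worth trying: the count wrapper is typically issued by the plugin's paging support, so if paging is what triggers it in your version (worth verifying), explicitly disabling it may let the raw SPARQL statement pass through unchanged. A minimal sketch, assuming Jena's remote-endpoint JDBC driver and a hypothetical Fuseki endpoint at localhost:3030:

```
input {
  jdbc {
    # Assumption: Jena's remote JDBC driver; adjust to the driver you use.
    jdbc_driver_class => "org.apache.jena.jdbc.remote.RemoteEndpointDriver"
    # Hypothetical endpoint URL for illustration only.
    jdbc_connection_string => "jdbc:jena:remote:query=http://localhost:3030/ds/query"
    jdbc_user => ""
    # Paging is what wraps the statement in SELECT count(*) ...; keep it off.
    jdbc_paging_enabled => false
    statement => "select ?x ?y where { ?x a <http://abc.com/ontologies/Test/1#X> . ?y a <http://abc.com/ontologies/Test/1#Y> . }"
    # Matches the 10-minute sync window mentioned below.
    schedule => "*/10 * * * *"
  }
}
```

This is a sketch, not a verified fix; if the plugin still issues a count query with paging disabled, the JSON-export route is the fallback.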
Thanks for referencing the issue. I'll keep an eye on it.
If not through Logstash, is there any other way to import the data into Elasticsearch? The only way I can think of is exporting it as JSON or to an RDBMS and then ingesting that, but it seems not ideal, especially in my case where syncing needs to happen every 10 minutes.
@vnktsh, if you can export to a file in JSON or to a DB, then it does open a range of possibilities for you. Logstash can definitely pull from either files or DBs. Depending on how the JSON export from Jena works, you might also be able to use Filebeat to send directly to Elasticsearch.
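If you go the JSON route, query results in the standard W3C SPARQL JSON results format (which Fuseki, for example, can return) flatten naturally into one JSON document per row, ready for Filebeat or the Elasticsearch bulk API. A minimal sketch; `sparql_json_to_ndjson` is a hypothetical helper name, not part of Jena or Elasticsearch:

```python
import json

def sparql_json_to_ndjson(sparql_result):
    """Flatten SPARQL JSON results into newline-delimited JSON:
    one object per solution row, variable name -> bound value."""
    lines = []
    for binding in sparql_result["results"]["bindings"]:
        doc = {var: cell["value"] for var, cell in binding.items()}
        lines.append(json.dumps(doc, sort_keys=True))
    return "\n".join(lines)

# Example input shaped like the W3C SPARQL JSON results format.
sample = {
    "head": {"vars": ["x", "y"]},
    "results": {"bindings": [
        {"x": {"type": "uri", "value": "http://abc.com/ontologies/Test/1#x1"},
         "y": {"type": "uri", "value": "http://abc.com/ontologies/Test/1#y1"}},
    ]},
}
print(sparql_json_to_ndjson(sample))
```

Writing that output to a file on a schedule would let Filebeat tail it and ship each line as a document, which fits the 10-minute sync window mentioned above.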