I am looking for a best-practice approach to automate the indexing of RDF files (ontologies) into Elasticsearch.
Here is the scenario:
- A team is responsible for generating and maintaining the RDF (Turtle) files, using TopBraid EDG Composer.
- They deliver the Turtle (.ttl) files to us.
- Using the Apache Jena libraries, we consolidate all the Turtle files (RDF triples) into one. The Apache Jena distribution ships with a set of command-line tools (`turtle` is one of them), which we use for the consolidation.
- Afterwards, we use the Apache Jena libraries (a Turtle importer) to run SPARQL queries against the consolidated Turtle file. The importer uses the Jena Java libraries to parse the TTL and run the SPARQL query that identifies the documents to be imported.
- Finally, the selected documents are prepared as in-memory JSON and handed to the Elasticsearch Java libraries for indexing.
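To make the automation question concrete, the three manual steps above can be wired into a single triggered run. This is only a structural sketch: the step functions below are trivial stand-ins for the real Jena consolidation, SPARQL selection, and Elasticsearch bulk-indexing code, and all names are illustrative.

```python
# Sketch: chain the existing manual steps so that one trigger runs them all.
# Each step consumes the previous step's output.

def run_pipeline(steps, payload):
    """Run each step in order, feeding each one the previous result."""
    for step in steps:
        payload = step(payload)
    return payload

# Stand-ins for the real implementations (replace with Jena/Elasticsearch code):
def consolidate(ttl_files):
    # Real version: merge all delivered .ttl files into one model/file.
    return "\n".join(ttl_files)

def select_documents(model):
    # Real version: run the SPARQL query and build the in-memory JSON docs.
    return [{"id": i, "text": line} for i, line in enumerate(model.splitlines())]

def index_documents(docs):
    # Real version: bulk-index the JSON documents into Elasticsearch.
    return len(docs)

indexed = run_pipeline(
    [consolidate, select_documents, index_documents],
    ["a :p :b .", "c :p :d ."],
)
```

With this shape, "automating" the process reduces to deciding what fires `run_pipeline` (a scheduler, a file watcher, or a notification endpoint).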
All of the above steps currently happen manually, and the requirement is to automate them.
I have a couple of approaches in mind, but I wanted to check whether there is a standard/best practice for this kind of task:
- Are there any out-of-the-box Elasticsearch/Jena APIs for this kind of automation?
- Is there a best practice for TopBraid EDG Composer to not generate physical .ttl files, but instead offer something out of the box for publishing RDF triples?
- Or should we offer an endpoint that the RDF team can notify when there is a new file to consume?
- What are the best practices for handling updates, etc.?
- Any other ideas/approaches?
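For the notification-endpoint idea, a minimal sketch of what the receiving side could look like is below. Everything here is hypothetical (the `/notify` path, the queue hand-off, the 202 response); it only illustrates the pattern of accepting a push from the RDF team and triggering the indexing pipeline asynchronously.

```python
# Hypothetical notification endpoint: the RDF team POSTs the new Turtle
# payload (or a pointer to it) and a background worker runs the reindex.
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer

REINDEX_QUEUE = queue.Queue()  # consumed by a worker that runs the pipeline

class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/notify":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)   # e.g. the new .ttl content
        REINDEX_QUEUE.put(body)          # hand off to the indexing worker
        self.send_response(202)          # accepted for asynchronous processing
        self.end_headers()

    def log_message(self, *args):        # silence per-request logging
        pass

def make_server(port=0):
    """Bind the endpoint; port=0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), NotifyHandler)
```

Returning 202 (Accepted) rather than 200 makes it explicit that indexing happens asynchronously; the worker draining `REINDEX_QUEUE` would call the consolidation/query/index steps.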