I have a document that includes a foreign key to an external data source. When a client retrieves the document, I'd like ES to transparently query the external source and materialize the data into the retrieved document, just as if it were stored in ES.
I can't find any off-the-shelf functionality that does this except, possibly, rivers, which are deprecated. I thought I'd get started by extending the FetchPhase with a new FetchSubPhase. Am I on the right track? Is there a more appropriate extension point for this type of activity?
Rivers were not active when retrieving documents; they were just SPOF helpers to connect to external sources and index documents.
Can you elaborate? What kind of "transparent queries" are you interested in? Is there a protocol or a language you have in mind?
Because ES has a DSL you can't pass to other sources, I guess you are talking about another query language for the federation, and a common result format that can wrap all documents in the result set. Probably JSON-based, but most solutions are built on common fields. Plus, you have to "re-rank" result lists if you want to deliver relevant hits.
I am working on federating sources like SRU/SearchRetrieve/CQL and ES together (and sources based on the ancient Z39.50 protocol).
In this case, I have a proprietary data source that is designed to store large volumes of time series data. The protocol to access the data is proprietary, but there is a SQL/JDBC bridge that I can use if it helps.
Yes, that's right. I can easily represent the result sets from the proprietary data source as JSON.
I'm not very familiar with ES result semantics but I don't think the concept of hit relevancy would apply in my case. In my situation, I want to treat ES purely as an index or table-of-contents into the data from my proprietary source.
This sounds interesting. Do you have any public repos of your work? As I've indicated, I don't think I'm familiar enough with ES to participate much in the development but if you're looking for someone to test, I'd be interested.
I can easily represent the result sets from the proprietary data source as JSON.
Then it would be most effective to harvest your source and index all the data. Is this feasible? How much volume is it? Can you access the data incrementally?
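If incremental access is possible, the harvesting step could look roughly like the sketch below. It is only an illustration under assumptions: the row fields (`series_id`, `ts`, `value`), the index name `timeseries` and the helper `build_bulk_body` are all hypothetical, and how the rows are read from the source (for example through the SQL/JDBC bridge) is left out. The only ES-specific part is the newline-delimited body that the `_bulk` endpoint expects: one action line followed by one document line per row.

```python
import json


def build_bulk_body(rows, index_name, last_ts):
    """Build an Elasticsearch _bulk request body from rows newer than last_ts.

    rows: iterable of dicts with the hypothetical keys series_id, ts, value.
    Returns (body, new_checkpoint); the checkpoint is the highest ts seen.
    """
    lines = []
    checkpoint = last_ts
    for row in rows:
        if row["ts"] <= last_ts:
            continue  # already harvested in an earlier run
        # A deterministic id makes re-indexing the same row idempotent.
        doc_id = "{}-{}".format(row["series_id"], row["ts"])
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(row))
        checkpoint = max(checkpoint, row["ts"])
    # The bulk format requires every line, including the last, to end in "\n".
    body = "\n".join(lines) + "\n" if lines else ""
    return body, checkpoint


# Stand-in rows; in practice these would come from the external source.
rows = [
    {"series_id": "s1", "ts": 100, "value": 1.5},
    {"series_id": "s1", "ts": 200, "value": 1.7},
    {"series_id": "s2", "ts": 300, "value": 9.0},
]
body, checkpoint = build_bulk_body(rows, "timeseries", last_ts=100)
```

Storing the returned checkpoint between runs lets each harvest pick up only what was added since the previous one.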
I want to treat ES purely as an index or table-of-contents into the data from my proprietary source
That is not very effective, because the delivery of results will be so slow that ES is of no advantage at all.
In this project https://github.com/xbib/elasticsearch-webapp-libraryservice/blob/master/src/main/groovy/org/xbib/elasticsearch/webapp/extension/sru/SRUService.groovy you can see how I use ES with an XML layer for SRU/CQL. This allows federating sources that understand CQL and return SRU XML results. I also inject documents into the hit list with information from other ES indices by executing get actions on-the-fly. The consolidation of result fields from heterogeneous sources is not implemented.
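The same kind of on-the-fly injection can also be approximated outside ES, in the client or a thin service in front of it. The following is a minimal sketch of that idea, not the code from the project above and not an ES plugin: `enrich_hits`, the `series_id` foreign key, the `external` target field and the in-memory `FAKE_SOURCE` are all hypothetical. It only relies on the usual search response layout, where the hits sit under `hits.hits` and each hit carries its document in `_source`.

```python
def enrich_hits(response, key_field, fetch_many, target_field="external"):
    """Attach external data to each hit of an ES search response, in place.

    fetch_many: callable that takes a list of keys and returns {key: data}.
    It stands in for the proprietary source or its SQL/JDBC bridge.
    """
    hits = response["hits"]["hits"]
    # Collect the distinct foreign keys so the source is queried once per
    # result page instead of once per hit.
    keys = sorted({h["_source"][key_field] for h in hits
                   if key_field in h["_source"]})
    external = fetch_many(keys) if keys else {}
    for hit in hits:
        key = hit["_source"].get(key_field)
        if key in external:
            hit["_source"][target_field] = external[key]
    return response


# Stand-in for the external time series store (assumption for illustration).
FAKE_SOURCE = {"s1": [[100, 1.5], [200, 1.7]]}


def fetch_many(keys):
    return {k: FAKE_SOURCE[k] for k in keys if k in FAKE_SOURCE}


response = {"hits": {"hits": [
    {"_id": "1", "_source": {"name": "pump A", "series_id": "s1"}},
    {"_id": "2", "_source": {"name": "pump B", "series_id": "s9"}},
    {"_id": "3", "_source": {"name": "no series"}},
]}}
enriched = enrich_hits(response, "series_id", fetch_many)
```

Hits whose key is missing or unknown to the external source are returned unchanged, so a slow or incomplete lookup degrades the result rather than failing it.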