How we built Elasticsearch plugins for indexing ontologies

Charlie_Hull · January 27, 2016, 10:59am

Hello! My colleague Matt has just written about a recent project where he's been building ES plugins for ontology indexing. Hope you find it interesting and the plugins useful - and we welcome feedback, especially hints as to where we've gone wrong and/or better ways to do it.
http://www.flax.co.uk/blog/2016/01/27/fun-frustration-writing-plugin-elasticsearch-ontology-indexing/

This is part of a publicly funded project here in the UK to develop better open source search software for bioinformaticians - it's called BioSolr for historical reasons but we're also using ES. We'll be talking about this at a workshop event near Cambridge next week.

jprante · January 27, 2016, 12:51pm

Nice job!

Implementing custom field mappers is definitely one of the most advanced things a plugin can do (aside from custom aggregation methods).

Why not just set up a Lucene analyzer or a token filter for ontology matching? I did something similar when porting the lucene-skos project to ES.

I think the comparison to Solr's UpdateRequestProcessor is quite unfair. It's a different approach.

If it's all about lack of documentation, why not just describe the implementation process in a dedicated blog post? Or open pull requests to add developer notes to the ES doc site? I bet the ES team is always happy about such contributions. Or submit a new user story. I wish I could document more, too, but I'm very lousy at this job, and can't find enough time.

Personally, I am convinced that the (well thought-out and elaborate) open source code of Elasticsearch is the best documentation possible, but that is of little comfort to most beginners.

dadoonet · January 27, 2016, 1:20pm

++ You won your bet Jörg! We are more than happy when users contribute code, doc, tests, stories, whatever...

Ivan · January 28, 2016, 12:12am

One of the reasons why I have never documented the Elasticsearch internals
(for example the TransportAction families) is that it is a moving target
for which you have no insight about the direction it is traveling in. Every
time I dig deep in the code and want to change something via a plugin, I
learn that such code might go away.

Cheers,

Ivan

Matt_Pearce · January 28, 2016, 9:48am

Thanks for the suggestions.

I'm not sure that writing an analyzer or token filter would help for our purposes. The object of the plugin is to retrieve ontology data and add it to the document, so it's not really a question of tokenising, unless I'm missing your point (entirely possible!).
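To illustrate the distinction, here is a minimal sketch of index-time enrichment as opposed to tokenising. It is not the plugin's actual code: the in-memory `ONTOLOGY` dictionary, the `enrich` function, the example IRIs and the `_label` / `_synonyms` / `_parent_labels` field suffixes are all hypothetical, chosen only to show the idea of looking up a term and adding its data to the document.

```python
# Hypothetical in-memory ontology: IRI -> term data. A real plugin would
# load this from an OWL file or an ontology service instead.
ONTOLOGY = {
    "http://example.org/onto/B": {
        "label": "heart disease",
        "synonyms": ["cardiac disease"],
        "parents": ["http://example.org/onto/A"],
    },
    "http://example.org/onto/A": {
        "label": "disease",
        "synonyms": [],
        "parents": [],
    },
}


def enrich(doc, field, ontology=ONTOLOGY):
    """Return a copy of doc with ontology data added for the IRI in `field`."""
    enriched = dict(doc)
    term = ontology.get(doc.get(field))
    if term is None:
        return enriched  # unknown or missing IRI: leave the document unchanged
    # Add new fields alongside the original annotation (suffixes are made up).
    enriched[field + "_label"] = term["label"]
    enriched[field + "_synonyms"] = list(term["synonyms"])
    enriched[field + "_parent_labels"] = [
        ontology[p]["label"] for p in term["parents"] if p in ontology
    ]
    return enriched


doc = {"id": "1", "annotation": "http://example.org/onto/B"}
enriched = enrich(doc, "annotation")
```

The point is that the document gains whole new fields derived from an external lookup, which is a different job from splitting or rewriting the tokens of an existing field.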

It is a completely different approach to the UpdateRequestProcessor, which I did make clear in the article. There's no real equivalent in Elasticsearch though, so creating a new field mapping seemed like a good approach.

Regarding documentation, as mentioned below, it seems like I'd be documenting a moving target, which would potentially end up with something more misleading than digging through other people's plugins to see how they approached the problem. I might look at a more detailed blog post in the future, though.
