Customize IndexWriter/Reader/etc within ElasticSearch


#1

Hello,

Is there a way for me to customize the Lucene index when using ElasticSearch? For example, I might want to replace text-based search with numerical or image-based search.


(Ryan Ernst) #2

Lucene is tightly coupled with Elasticsearch, and there is no support for plugging in an alternate index writer or reader. However, you can plug in additional query types. See the SearchPlugin interface. With a query, you can build any kind of matching against the underlying data that you desire.


#3

Thanks for your quick answer, Ryan!

I am new to ElasticSearch. Can you point me to how a SearchPlugin can be used to allow a custom search algorithm? Does it mean that with this plugin, Lucene is not necessary for indexing and searching?


(Ryan Ernst) #4

A SearchPlugin allows adding query implementations. For example, when you run the following search, a term query is executed:

/_search
{
  "query": {
    "term": {
      "myfield": "someterm"
    }
  }
}

The name "term" is attached to a QueryParser (and also a Writeable.Reader, but that is just an implementation detail of how the query object is passed across nodes). The QueryParser takes the JSON content, in this case { "myfield" : "someterm" }, and parses it into a QueryBuilder. The QueryBuilder is then used to construct the actual Query object. Everything up to this point is boilerplate for plugging in a custom Lucene Query under a name that Elasticsearch will know how to parse in a search.
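To make the name -> parser -> builder -> query chain concrete, here is a minimal toy model in plain Java. It is only a sketch of the flow described above: ToyQueryRegistry and ToyQueryBuilder are hypothetical names invented for illustration, they are not Elasticsearch or Lucene classes, and the "query" here is just a predicate over an in-memory document rather than a real Lucene Query.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy model only: none of these types are Elasticsearch or Lucene classes.
final class ToyQueryRegistry {

    /** Stand-in for a QueryBuilder: holds parsed parameters and builds the matcher. */
    interface ToyQueryBuilder {
        Predicate<Map<String, String>> toQuery();
    }

    // Query name -> parser that turns the query body into a builder.
    private final Map<String, Function<Map<String, String>, ToyQueryBuilder>> parsers =
            new HashMap<>();

    /** Attach a parser to a query name, as a plugin would for its custom query. */
    void register(String name, Function<Map<String, String>, ToyQueryBuilder> parser) {
        parsers.put(name, parser);
    }

    /** Look up the parser by name, parse the body into a builder, build the query. */
    Predicate<Map<String, String>> parse(String name, Map<String, String> body) {
        Function<Map<String, String>, ToyQueryBuilder> parser = parsers.get(name);
        if (parser == null) {
            throw new IllegalArgumentException("unknown query [" + name + "]");
        }
        return parser.apply(body).toQuery();
    }

    /** A registry with a toy "term" query: the body is { field : value }. */
    static ToyQueryRegistry withTermQuery() {
        ToyQueryRegistry registry = new ToyQueryRegistry();
        registry.register("term", body -> {
            Map.Entry<String, String> entry = body.entrySet().iterator().next();
            String field = entry.getKey();
            String value = entry.getValue();
            // The builder produces a matcher that compares the document's field value.
            return () -> doc -> value.equals(doc.get(field));
        });
        return registry;
    }
}
```

Calling ToyQueryRegistry.withTermQuery().parse("term", Map.of("myfield", "someterm")) returns a matcher that accepts documents whose myfield equals someterm. In a real plugin the registration goes through the SearchPlugin interface and the final object is a Lucene Query, so treat this only as a picture of the moving parts.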

Implementing a Query is beyond the scope of a simple discuss response. There are numerous examples online, and you can ask questions on the Lucene users mailing list if you need help. At a high level, a Query produces a tree that is used to return matching documents and score them. The implementation can do whatever it likes.

A word of caution, though, before embarking on this very advanced exercise: numeric queries are already well supported in Elasticsearch and Lucene, and image-based search is also possible. I have seen users break the image into features and then index those features as text (unanalyzed tokens). You would do this translation into features at index time and again at search time, looking to match as many features as possible to find the best match. This is where it normally gets complicated: you need to calculate a score that measures how well the query's features matched a given document's features.
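As a rough illustration of the feature-matching idea, the sketch below keeps each document as a set of feature tokens and ranks documents by the fraction of query features they contain. Everything here is an assumption made for illustration: FeatureMatchSketch is a hypothetical class, the feature tokens are made-up strings, and the overlap fraction is just one simple scoring choice, not how Elasticsearch or Lucene score matches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for "index features as unanalyzed tokens".
final class FeatureMatchSketch {

    // Document id -> feature tokens extracted at index time.
    private final Map<String, Set<String>> index = new LinkedHashMap<>();

    /** Index time: store the feature tokens extracted from one image. */
    void indexDocument(String id, Set<String> features) {
        index.put(id, new HashSet<>(features));
    }

    /** Fraction of the query's features that also appear in the document. */
    static double score(Set<String> queryFeatures, Set<String> docFeatures) {
        if (queryFeatures.isEmpty()) {
            return 0.0;
        }
        int matched = 0;
        for (String feature : queryFeatures) {
            if (docFeatures.contains(feature)) {
                matched++;
            }
        }
        return (double) matched / queryFeatures.size();
    }

    /** Search time: ids of documents sharing at least one feature, best match first. */
    List<String> search(Set<String> queryFeatures) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : index.entrySet()) {
            if (score(queryFeatures, entry.getValue()) > 0.0) {
                hits.add(entry.getKey());
            }
        }
        // Sort by descending score; ties keep insertion order because the sort is stable.
        hits.sort((a, b) -> Double.compare(
                score(queryFeatures, index.get(b)),
                score(queryFeatures, index.get(a))));
        return hits;
    }
}
```

The same feature extraction has to run on the query image before calling search, mirroring the index-time and search-time translation described above. A real setup would more likely index the tokens in a non-analyzed field and let the engine do the matching and scoring; this sketch only shows the shape of the problem.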


#5

Thank you, I understand that this is no small undertaking for new users.

