Evaluating Elasticsearch for document classification with keywords


I’m new to Elasticsearch and am evaluating it for a project where I need to compare a large set of keywords (currently around 3500) against documents (around 500 words long on average) and highlight any matches that occur.

The keywords I have are mostly specialised technical terms and the documents are unstructured natural language text.

In short, I want to end up with the document text with the keyword matches highlighted, plus a separate array of the matched keywords that I can display as a simple list. The closest description of what I’m trying to build is probably a document classification system.

Things I’d be interested to know are:

  • Is Elasticsearch a suitable tool for this kind of use case?
  • If it is, are there any specific features or configurations I’ll need to look into?
  • What kind of performance can I expect? For example, can I do this on demand, or will it need to run as a background task?

I realise these questions are quite broad but I’d really appreciate any pointers.


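You can use the percolator for this! Store each keyword as a query in an index that has a field of type percolator, then run a percolate query with each document to get back the stored keyword queries that match it. Percolate queries also support highlighting, so you get the matched keywords and the highlighted text from the same request.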
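A minimal sketch of the request bodies involved, written in Python. It assumes Elasticsearch 7 or later and uses a hypothetical index named "keywords" with hypothetical fields "keyword" (the raw term, kept so it can be listed) and "content" (the field the documents are matched against). It only builds and parses JSON; sending the requests (with curl or an Elasticsearch client) is left out, and the size value is an arbitrary assumption.

```python
import json

INDEX = "keywords"  # hypothetical index name

# Mapping for PUT /keywords: "query" stores the keyword queries,
# "content" is the field that percolated documents are matched against.
MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "keyword": {"type": "keyword"},
            "content": {"type": "text"},
        }
    }
}


def bulk_register_keywords(keywords):
    """Build an NDJSON body for POST /_bulk storing one match_phrase query per keyword."""
    lines = []
    for i, kw in enumerate(keywords):
        lines.append(json.dumps({"index": {"_index": INDEX, "_id": str(i)}}))
        lines.append(json.dumps({
            "keyword": kw,  # stored alongside the query so matches can be listed directly
            "query": {"match_phrase": {"content": kw}},
        }))
    return "\n".join(lines) + "\n"  # the _bulk body must end with a newline


def percolate_request(text):
    """Body for GET /keywords/_search: which stored keyword queries match this text?"""
    return {
        "size": 100,  # assumption: the default of 10 hits is too few for 3500 keywords
        "query": {
            "percolate": {"field": "query", "document": {"content": text}}
        },
        # number_of_fragments = 0 returns the whole field, highlighted
        "highlight": {"fields": {"content": {"number_of_fragments": 0}}},
    }


def summarise(response):
    """Return (matched keywords, highlighted text per keyword) from a search response."""
    hits = response["hits"]["hits"]
    matched = [h["_source"]["keyword"] for h in hits]
    highlights = {
        h["_source"]["keyword"]: h.get("highlight", {}).get("content", [])
        for h in hits
    }
    return matched, highlights


if __name__ == "__main__":
    print(json.dumps(percolate_request("Gradient descent trains the neural network."), indent=2))
```

Each hit corresponds to one matched keyword, so the list of hits is already the array of matches. Note that each hit’s highlight only marks the terms of its own query, so producing a single copy of the text with every match highlighted means merging those per-keyword results.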