I am interested in contributing to Elasticsearch, and GSoC provides a nice platform to start. I am somewhat late in pursuing this, as I have been fiddling with a variety of products, and I hope it is not already too late. I have some experience with Solr and Lucene, along with a primarily Java/Scala background, so onboarding hopefully shouldn't be too hard. My current experience with Elasticsearch is still fairly elementary: I have used it as an inverted index but have never really gone into the depth of its features.
I have a few questions about the project:
- What exactly does the document mean by "more efficient when in a different order"? An example could perhaps make it clearer why this is faster. For instance, I was looking through the PlainHighlighter and, from my understanding, it initializes a Lucene Highlighter and then calls getBestTextFragments on it. I did not understand why the order would matter here. Maybe this is true for a different highlighter, in which case it would be awesome if you could point me to the code.
- I saw only three classes for highlighters: Plain, Fast Vector and Unified. The document mentioned four. It would be great to have a reference to the fourth one.
- I will continue reading the highlighter code to understand it and will also look for low-hanging fruit to fix, but I'd be glad for any expert advice that could help jumpstart my understanding.