Will Lucene 6.2 be part of the Elasticsearch 5.0 release?
I'm mentioning this because I just noticed that https://issues.apache.org/jira/browse/LUCENE-2605 was resolved a couple of days ago. Would really love to see this coming to Elasticsearch asap!
It's been 2 months since my last post on this topic and I was wondering if you maybe have an update on this?
I need the fix for a project I'm working on.
If it's not going to be part of 5.0.0 / 2.7, do you have any suggestions on how to get this fix in, at least locally (if possible)?
I can almost guarantee 2.x won't be upgraded to Lucene 6. We only upgrade Lucene major versions when the ES major version also bumps.
Lucene guarantees one major version of backwards compatibility. So that means 2.x clusters (which are on Lucene 5) can read Lucene 4 segments. If we introduced Lucene 6 to 2.x, there may be old segments that suddenly become unreadable despite ES not introducing a major version bump.
So we only bump Lucene majors when ES majors bump, to coalesce the bwc breaks into just the major versions.
Since we'd like to move to production with our application soon, waiting for 5.0.0 and getting it through the entire ecosystem once it's out is gonna take too long for us.
In order to get this fix in, do you think we should wait? Or fork the project and bump Lucene ourselves? Any thoughts??
Yeah, it's a tricky spot to be in. Beta1 should be out soonish (I forget the exact date, but soon). Then RC after that, or potentially a second beta.
TBH, I would be suuuuper cautious about forking and bumping Lucene yourself. Even ignoring the work involved in upgrading from 5.5 to 6.2, you could easily find yourself in a situation where there's an incompatibility between your fork and official ES. That would put you in a bad spot if/when you want to upgrade: you may have to reindex everything, etc., because the incompatibility is not upgradeable.
At that point, you'd be "safer" running off alpha or beta.
What kind of problems are you running into? I think it'd be much easier to work around the query_string quirks. E.g. identify the problematic fields and exclude them from querying via query_string, and instead add extra, specific queries against those fields in a big bool. Or use a multi_match, etc.
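To make that concrete, here is a rough sketch of the "big bool" idea, written as a small Python helper that only builds the request body. The function name and the field names (title, body, city) are made up for illustration, and whether a match query with operator "and" is the right replacement depends on how the problematic fields are analyzed:

```python
import json


def build_workaround_query(user_input, safe_fields, problem_fields):
    """Build a bool query that keeps query_string away from problem fields.

    safe_fields and problem_fields are hypothetical names; substitute the
    fields from your own mapping.
    """
    # query_string only sees the fields that behave well with its parser.
    should = [
        {"query_string": {"query": user_input, "fields": list(safe_fields)}}
    ]
    # Each problematic field gets its own match query, which hands the
    # whole input to the field's analyzer instead of the query parser.
    for field in problem_fields:
        should.append(
            {"match": {field: {"query": user_input, "operator": "and"}}}
        )
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


workaround_body = build_workaround_query("new york", ["title", "body"], ["city"])
print(json.dumps(workaround_body, indent=2))
```

The resulting body can be sent to the _search endpoint as usual. Keep in mind that the match clauses treat query_string operators in the user input as plain text, so this fits best when users type plain terms.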
@byronvoorbach I don't think you need the whole machinery of Lucene 6+ just to fix a relatively small issue with whitespace tokenization in query_string. You should know that Lucene's query string syntax has been inherently broken for a long time. Just look at Elasticsearch's simple_query_string, which is a sane query parser that gets rid of the quirks of query_string. If simple_query_string does not fit your purpose, you can write an ES plugin with a modified query_string parser to achieve your goal.
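For reference, a minimal sketch of what a simple_query_string request body could look like, again built in Python. The helper name and the field names are hypothetical, and default_operator is set to "and" only as an example choice:

```python
import json


def build_simple_query(user_input, fields):
    """Build a simple_query_string request body (field names are hypothetical)."""
    return {
        "query": {
            "simple_query_string": {
                "query": user_input,
                "fields": list(fields),
                # Require all terms to match; drop this to fall back to "or".
                "default_operator": "and",
            }
        }
    }


simple_body = build_simple_query("new york", ["title", "city"])
print(json.dumps(simple_body, indent=2))
```

Whether this behaves the way you need for your whitespace case is something you would have to verify against your own analyzers and data.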