Lucene 4.0

Heya fellows,

Lucene 4.0 is out the door, and I wanted to update you all on how we plan to upgrade to it in elasticsearch. The plan is to first release version 0.20, still using Lucene 3.6.x; this should happen in the next week or so. The reason is that we want to get the 0.20 features into the hands of users as fast as possible, without waiting for the upgrade to 4.0.

We will also start the process of upgrading to 4.0, which will land in the next major elasticsearch version. This will include upgrading to 4.0, making use of the new features, and exposing them to users. This shouldn't take too long, though we do want to see Lucene 4.0.0 GA "out in the wild" for a bit before making a formal elasticsearch release (0.21) with it.

-shay.banon

--

Awesome, I can't wait!

--

Shay, thanks for the update.
Just wondering about backward compatibility.
Will switching from 0.19 to 0.20 require reindexing, or can we just
update the libs?
And from 0.20 to 0.21, I guess it will require reindexing, but I'd like to
hear confirmation from you.

Thank you

--

Lucene is always backwards compatible across one major version, so Lucene 4.0
code should be able to read a 3.x index. Writing indexes with a newer
version and reading them in an older one is more problematic. That said, a
new ES version is more than just a Lucene upgrade, so it might require a
reindex.

ES version incompatibility is always due to the internal communication API
changing.
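
The rule of thumb above can be written down as a small check. This is only a sketch of the stated rule, assuming "one version" means one major version; `IndexCompat` and `canRead` are hypothetical names for illustration, not part of Lucene or elasticsearch.

```java
final class IndexCompat {

    // Hypothetical helper modelling the rule of thumb: a reader can open an
    // index written by the same major version or by the one just before it,
    // and never an index written by a newer major version.
    static boolean canRead(int readerMajor, int indexMajor) {
        return indexMajor <= readerMajor && readerMajor - indexMajor <= 1;
    }

    public static void main(String[] args) {
        System.out.println(canRead(4, 3)); // 4.0 code reading a 3.x index: true
        System.out.println(canRead(3, 4)); // 3.x code reading a 4.0 index: false
        System.out.println(canRead(4, 2)); // two major versions back: false
    }
}
```

Note that this only models index readability; as said above, whether an ES upgrade needs a reindex or a restart depends on more than the Lucene version.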

--
Ivan

--

On 10/11/2012 1:40 PM, Ivan Brusic wrote:

Lucene is always backwards compatible within one version, so Lucene
4.0 code should be able to read a 3.x index. Writing indexes with a
newer version and reading them in an older one is more problematic.
That said, a new ES version is more than just a Lucene upgrade, so it
might require a reindex.

Here is the statement from the release notes

http://lucene.apache.org/core/4_0_0-BETA/changes/Changes.html#4.0.0-alpha.changes_in_backwards_compatibility_policy

"On upgrading to 4.0, if you do not fully reindex your documents, Lucene
will emulate the new flex API on top of the old index, incurring some
performance cost (up to ~10% slowdown, typically). To prevent this
slowdown, use oal.index.IndexUpgrader to upgrade your indexes to latest
file format (LUCENE-3082
http://issues.apache.org/jira/browse/LUCENE-3082)."

Thus, as app developers we don't have to reindex ourselves; we can trigger
an upgrade of each Lucene index, whenever convenient, using a conversion tool.

http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/index/IndexUpgrader.html

I'm sure this feature will be useful in ES.

-Paul

--

Thank you for the replies. It would be a very nice feature if ES could
upgrade the index to v.4, either automatically or via some command.
I'll be waiting for the new versions of ES.
Thanks again.

--

Heya, few points:

Indexes in elasticsearch have always been "backward" compatible, and after the upgrade to Lucene 4.0 they will remain backward compatible (both at the Lucene level and at the ES level). You might need to do an "upgrade" to make use of newer versions, or to make sure there is no "emulation" layer at the Lucene level.

So, upgrading to 0.20 and 0.21 will be compatible index-wise (no need to reindex), but you will need to do a full cluster restart (though we are working on eventually not needing that either; the first infrastructure for that is already going into the upcoming 0.20).

--

On Thursday, October 11, 2012 10:40:53 PM UTC+2, Ivan Brusic wrote:

Lucene is always backwards compatible within one version, so Lucene 4.0
code should be able to read a 3.x index. Writing indexes with a newer
version and reading them in an older one is more problematic. That said, a
new ES version is more than just a Lucene upgrade, so it might require a
reindex.

This question has been asked a couple of times, so let me elaborate a
little for those interested in the main differences between 4.0 and 3.x
at the index level. It is correct in general that Lucene 4.0 can read 3.x
indexes, but this time it comes with a cost. In Lucene 4.0 we changed the
sort order from UTF-16 to UTF-8 at the lowest level, so 3.x indices are
sorted "differently". To make this still work with the 4.0 API, we added a
"re-mapping" layer for surrogate characters to maintain the correct sort
order. This has some performance cost, but it should not be dramatic. Still,
it is a good idea to upgrade the index. The good news is that Lucene can do
that by itself in the background: if you merge a 3.x segment with a 4.0
segment, the result is a 4.0 segment, so you can "upgrade over time".
Anyhow, in practice ES users should not need this knowledge, and we will
provide flexible ways to upgrade to an ES version that runs Lucene 4.0.
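
To illustrate why the two sort orders disagree, here is a minimal Java sketch. It is not Lucene code: `SortOrderDemo` and its methods are names made up for this example, and the two sample characters (U+FF61 and U+1D11E) are arbitrary picks. It compares a BMP character that sits above the surrogate range with a supplementary character, once by UTF-16 code units and once by unsigned UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

final class SortOrderDemo {

    // Order by UTF-16 code units, which is what String.compareTo does.
    static int compareUtf16(String a, String b) {
        return a.compareTo(b);
    }

    // Order by UTF-8 bytes, treating every byte as an unsigned value.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF61";                 // U+FF61, a BMP character above the surrogate range
        String supplementary = "\uD834\uDD1E"; // U+1D11E, stored as a surrogate pair in UTF-16

        // UTF-16 order: the surrogate 0xD834 is smaller than 0xFF61,
        // so the supplementary character sorts first.
        System.out.println(compareUtf16(bmp, supplementary) > 0); // prints true

        // UTF-8 order: lead byte 0xEF (U+FF61) is smaller than 0xF0 (U+1D11E),
        // so the BMP character sorts first.
        System.out.println(compareUtf8(bmp, supplementary) < 0);  // prints true
    }
}
```

Only strings containing supplementary characters alongside BMP characters at or above U+E000 are affected, which fits the description of the re-mapping layer as being about surrogate characters.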

simon

--

Thank you Shay & Simon for giving us those details. Customers are already
wondering when Lucene 4 will be available in ES.

-- Tanguy
@tlrx

--

So, upgrading to 0.20 and 0.21 will be compatible index wise (no need
to reindex), but you will need to do a full cluster restart (though we
are working on eventually not needing that as well, first
infrastructure for that is already going to be in upcoming 0.20).

This is excellent news!

--

Lucene 4.0 was officially released today.

--

Hi Shay,

Lucene 4 just got released :)

I would love to get some pointers on what kind of help from the community
would be most welcome.

For example, with Lucene 4.0, will there be an
elasticsearch-index-spellcheck module? How about introducing modules for
codecs? What will support for Lucene payloads look like in ES? Just to
mention a few exciting things...

Best regards,

Jörg

--

The plan is to first get Lucene 4.0 integrated with elasticsearch, and then expose all the new features. We will take it feature by feature, but to your points: there will be a built-in spellcheck using the new "direct" spellcheck feature, you will be able to configure codecs in the mapping and write a plugin that introduces new codecs, and so on...

--

Lucid Imagination has an annotated version of the “Release Highlights” for
Lucene/Solr 4.0

http://searchhub.org/dev/2012/10/12/apache-solr-and-lucene-4-0-0-released/

Andrzej Białecki, Robert Muir, and Grant Ingersoll will be presenting a
paper on the Lucene 4 architecture:
http://opensearchlab.otago.ac.nz/paper_10.pdf

Good reading if you like to understand the guts.

--
Ivan

--

Shout-out on TechCrunch!

--

Can I expect to see the grouping feature in 0.20? I currently work with a
0.19.4 version of ES, with Martijn's code for grouping and some custom patches
to use more facet types than just string terms. And I really want to stop
maintaining a custom version of ES :)

Or do I need to wait for the next release, 0.21, based on the new Lucene 4.0?

Thanks in advance,

--
Nicolas BLANC.

--

We would like to plan for 0.21. What's a realistic timetable? Thanks, great work!

--

Still waiting for 0.20 to be released! :)

--

0.20.0.RC1 was already released, working on the blog post now.

--

I'm really looking forward to this. Good job, guys, keep moving
forward!

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

--