Inconsistent responses from aggregations (ES 1.0.0.RC1)

On Wed, Feb 5, 2014 at 6:42 PM, Nils Dijk me@thanod.nl wrote:

Ok, I was preparing to do a long bisecting session, but I started with the
commit you highlighted below (4271d573d60f39564c458e2d3fb7c14afb82d4d8)
and the commit before that one (6481a2fde858520988f2ce28c02a15be3fe108e4).
And as it turns out, it is the breaking commit.

If I build the commit of yours from December 3 it fails my test suite.
If I build the commit of Nik from January 6 it still passes my test.

I also tried reverting your commit on the v1.0.0.RC1 tag, but it gave me
all kinds of conflicts so I could not test RC1 without your commit.

If you would like I can still do a full bisect, but I suspect I would end
up at your commit, since I tested that one and the one before.

Would it be possible for you to send a .patch without the unsafe stuff, so
I can apply that to a commit and make a build?

Thanks Nils for your work, this is much appreciated.

Here is a simple patch attached that short-circuits the use of Unsafe to do
string comparisons.

Maybe you could also try to set the cache.recycler.page.type setting to
none to see if that changes anything.

--
Adrien Grand

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j4BHXO%2B45m%2B7T0q83rhWd34RPhwWf8fWdu%2BzgL_Yu96eQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

OK, I finally managed to reproduce it on both Mac and Linux by increasing
the number of shards to 20. I will keep you posted.

--
Adrien Grand

Good,

It is always easier to fix when it's on your own machine.

I tried your .patch, but it did not fix the problem. I also tried your
config setting; I did not really get where to put it, so I ended up
putting it on the index. This also did not fix the problem.

I also tried with a bigger shard_size in the agg. Yet again no difference.

To test some more around aggs I loaded a complete production set into both
my local ES RC2 (OS X) and one on a Linux server with ES RC2. I have a hunch
it could be in the sorting of the terms. When I do a sub agg and sort on it
I see all kinds of weird results that are even lower than the ones I see
when I do not sort on the sub agg.

If you need me to test some more I am keeping a close watch on this thread.

-- Nils
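
For context on why a bigger shard_size was worth trying: a terms aggregation has each shard return only its own top shard_size terms, and the coordinating node sums those partial lists, so counts can come out too low when a term misses the cut on some shard. The sketch below is a simplified model of that merge, not Elasticsearch code; the class name, method names, and sample counts are all made up for illustration. In this thread a larger shard_size made no difference, which suggests this approximation was not the cause.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardSizeSketch {

    /** Returns the n entries with the highest counts (ties broken by term). */
    static Map<String, Long> topN(Map<String, Long> counts, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Long.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        Map<String, Long> top = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries.subList(0, Math.min(n, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    /** Each shard reports its top shardSize terms; the partial counts are summed and cut to size. */
    static Map<String, Long> aggregate(List<Map<String, Long>> shards, int size, int shardSize) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> shard : shards) {
            for (Map.Entry<String, Long> e : topN(shard, shardSize).entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return topN(merged, size);
    }

    public static void main(String[] args) {
        // Hypothetical per-shard term counts.
        List<Map<String, Long>> shards = List.of(
                Map.of("a", 5L, "b", 4L, "c", 3L),
                Map.of("c", 5L, "d", 4L, "a", 3L));

        System.out.println(aggregate(shards, 2, 1)); // {a=5, c=5}: both counts too low
        System.out.println(aggregate(shards, 2, 3)); // {a=8, c=8}: exact
    }
}
```

If undercounting persists even when shard_size is large enough to cover every term, as it did here, the approximation can be ruled out and the problem lies elsewhere.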

It took me some time but I finally managed to understand the cause and to
write a fix:
Fix BytesRef owning issue in string terms aggregations (pull request #5039 by jpountz, elastic/elasticsearch on GitHub)

Thanks very much for reporting this and for your help reproducing and
debugging this issue!

--
Adrien Grand

Yay!

I will try this sometime tomorrow. Thanks for fixing, much appreciated!

Seems like it was difficult to find, since it only happens when a 'page'
gets recycled internally.
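
The pull request title and the remark about recycled pages point to a buffer-ownership problem: a stored reference keeps pointing into memory that is later reused. The sketch below illustrates that general class of bug. It is not the actual Elasticsearch code or fix; `Slice` is a hypothetical stand-in for a byte-slice reference such as Lucene's BytesRef, and the 16-byte array merely plays the role of a recycled page.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BufferOwnership {

    /** Hypothetical stand-in for a byte-slice reference (bytes + offset + length). */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Returns a slice backed by a private copy of the referenced bytes. */
        Slice deepCopy() {
            return new Slice(Arrays.copyOfRange(bytes, offset, offset + length), 0, length);
        }

        String utf8() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Writes each term into one shared buffer and stores either a view of it or a copy. */
    static List<String> collect(String[] terms, boolean copyBeforeStoring) {
        byte[] page = new byte[16]; // reused for every term; terms must fit in 16 bytes
        List<Slice> stored = new ArrayList<>();
        for (String term : terms) {
            byte[] encoded = term.getBytes(StandardCharsets.UTF_8);
            System.arraycopy(encoded, 0, page, 0, encoded.length);
            Slice view = new Slice(page, 0, encoded.length);
            stored.add(copyBeforeStoring ? view.deepCopy() : view);
        }
        List<String> result = new ArrayList<>();
        for (Slice s : stored) {
            result.add(s.utf8());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "mango", "lemon"};
        System.out.println("aliased: " + collect(terms, false)); // [lemon, lemon, lemon]
        System.out.println("copied:  " + collect(terms, true));  // [apple, mango, lemon]
    }
}
```

Without the copy, every stored key silently turns into whatever was written to the buffer last. That kind of bug only shows up once the buffer is actually reused, which fits it being hard to find.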

Hi Adrien,

Good news! The problem is solved.
Can't wait for the release containing the fix, but for now I will use my
own build :)

Excellent news, thanks for checking! RC2 was the last release candidate, so
the next release containing the fix should be 1.0 GA. Hopefully it will be
out soon.

--
Adrien Grand
