Something's wrong with the indexing performance


(Igor Simonov) #1

Hello,

we have been using 0.16 for a long time with a 20GB index and with no
problems (well, in general).
Now we have 0.19.7 on a new cluster, and it looks like we've run into a
problem we didn't have before. We don't experiment with settings much, so
we keep everything at defaults and only adjust memory settings (900MB heap
on a fully dedicated 1024MB VPS, no swapping there, so that seems OK).

The problem is that no matter what we try, we get ridiculously poor
indexing performance. The best we could get is bulk indexing in 100-doc
batches in a pure test setup (one server, one shard, 0 replicas, no
network involved, curl to localhost). The time spent on each 100-doc batch
(below) is a) huge and b) growing significantly with each batch.

real 6.89
real 9.04
real 15.71
real 27.72
real 45.88
real 68.80
real 74.53
real 79.87
real 80.87
real 100.53
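For concreteness, each batch was pushed roughly like this (the index/type names here are made-up placeholders and the document body is simplified, but the shape of the test is the same):

```shell
# Build one 100-doc bulk file: for each doc, an action line
# followed by its JSON source on the next line.
# "myindex" / "mytype" are placeholder names.
for i in $(seq 1 100); do
  printf '{"index":{"_index":"myindex","_type":"mytype"}}\n'
  printf '{"field":"doc %s"}\n' "$i"
done > bulk.json

# Time one 100-doc chunk; this is where the numbers above come from.
time curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json
```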

The strangest part is that feeding that 1.5MB JSON file to ES may add
~200MB of RAM usage.
We are out of ideas :)
It must be something simple, maybe related to the upgrade. Some setting?

--


(CatalinC) #2

Have you checked the mappings? I also noticed that with default settings,
for arrays if I remember well, the mappings were growing with every insert.
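One cheap way to check is to dump the mappings after each bulk request and watch whether the output keeps growing (assuming a local node; "myindex" is a placeholder name):

```shell
# If dynamic mapping is creating new fields for every document,
# the mapping output keeps growing after each bulk insert.
# "myindex" is a placeholder index name.
curl -s -XGET 'http://localhost:9200/myindex/_mapping' | wc -c
```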

On Thursday, October 18, 2012 5:00:42 PM UTC+1, Igor Simonov wrote:

[quoted message trimmed]


(Shay Banon) #3

That's strange, it shouldn't grow like this. Can you run a simple experiment (even on your laptop) and see whether you get the same behavior? Let's take the VPS part out of the equation first.

On Oct 18, 2012, at 6:00 PM, Igor Simonov igor.simonov@gmail.com wrote:

[quoted message trimmed]

--


(Igor Simonov) #4

Thank you for the replies!

Yes, it was perfectly reproducible locally.
I say 'was' because we have resolved the issue, though I can't say I
understand its nature very well. It happens :)

There was a nested object, a fairly recent addition. Not very big, but
three levels deep. It turned out we only kept it there to avoid storing it
somewhere else, so it didn't need to be indexed at all. We disabled
indexing for it, and the problem is gone. It takes milliseconds now.

CatalinC: yes, sounds similar. No arrays there, but indexing of that nested
object was generating a lot of mappings.
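For the record, "disabled indexing of it" amounts to marking the object field as `enabled: false` in the mapping; a sketch along these lines, with placeholder index/type/field names (ours differ):

```shell
# "enabled": false makes ES store the object as part of _source
# without parsing or indexing its contents (so no dynamic mappings).
# "myindex", "mytype" and "payload" are placeholder names.
curl -XPUT 'http://localhost:9200/myindex/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "payload": {
        "type": "object",
        "enabled": false
      }
    }
  }
}'
```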

On Sunday, October 21, 2012 12:39:01 AM UTC+2, kimchy wrote:

[quoted message trimmed]

--

