Something's wrong with the indexing performance


(Igor Simonov) #1

Hello,

we have been using 0.16 for a long time with a 20GB index and with no
problems (well, in general).
Now we have 0.19.7 on a new cluster, and it looks like we've run into a
problem we didn't have before. We don't experiment with settings much, so
we keep everything at defaults and only adjust memory settings (900MB heap
on a fully dedicated 1024MB VPS, no swapping there, so that seems OK).

The problem is that no matter what we try, we get ridiculously poor
indexing performance. The best we could get is bulk indexing in 100-doc
batches in a pure test setup (one server, one shard, 0 replicas, no
network involved, curl to localhost). The time spent on each 100-doc batch
(below) is a) huge and b) growing significantly with each batch.

real 6.89
real 9.04
real 15.71
real 27.72
real 45.88
real 68.80
real 74.53
real 79.87
real 80.87
real 100.53
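For concreteness, each batch was pushed roughly like this (the index/type names here are made-up placeholders and the document body is simplified, but the shape of the test is the same):

```shell
# Build one 100-doc bulk file: for each doc, an action line
# followed by its JSON source on the next line.
# "myindex" / "mytype" are placeholder names.
for i in $(seq 1 100); do
  printf '{"index":{"_index":"myindex","_type":"mytype"}}\n'
  printf '{"field":"doc %s"}\n' "$i"
done > bulk.json

# Time one 100-doc chunk; this is where the numbers above come from.
time curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json
```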

The strangest part is that feeding that 1.5MB JSON file to ES may add
~200MB of RAM usage.
We are out of ideas :)
It must be something simple, maybe related to the upgrade. Some setting?

--


(CatalinC) #2

Have you checked the mappings? I also noticed that with default settings,
for arrays if I remember well, the mappings were growing with every insert.
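One cheap way to check is to dump the mappings after each bulk request and watch whether the output keeps growing (assuming a local node; "myindex" is a placeholder name):

```shell
# If dynamic mapping is creating new fields for every document,
# the mapping output keeps growing after each bulk insert.
# "myindex" is a placeholder index name.
curl -s -XGET 'http://localhost:9200/myindex/_mapping' | wc -c
```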

On Thursday, October 18, 2012 5:00:42 PM UTC+1, Igor Simonov wrote:

[quoted message trimmed]


(Shay Banon) #3

That's strange, it shouldn't grow like this. Can you run a simple experiment (even on your laptop) and see whether you get the same behavior? Let's take the VPS part out of the equation first.

On Oct 18, 2012, at 6:00 PM, Igor Simonov igor.simonov@gmail.com wrote:

[quoted message trimmed]

--


(Igor Simonov) #4

Thank you for the replies!

Yes, it was perfectly reproducible locally.
I say 'was' because we have resolved the issue, though I can't say I
understand its nature very well. It happens :)

There was a nested object, a fairly recent addition. Not very big, but
three levels deep. It turned out we only kept it there to avoid storing it
somewhere else, so it didn't need to be indexed at all. We disabled
indexing for it, and the problem is gone. It takes milliseconds now.

CatalinC: yes, sounds similar. No arrays there, but indexing of that nested
object was generating a lot of mappings.
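For the record, "disabled indexing of it" amounts to marking the object field as `enabled: false` in the mapping; a sketch along these lines, with placeholder index/type/field names (ours differ):

```shell
# "enabled": false makes ES store the object as part of _source
# without parsing or indexing its contents (so no dynamic mappings).
# "myindex", "mytype" and "payload" are placeholder names.
curl -XPUT 'http://localhost:9200/myindex/mytype/_mapping' -d '{
  "mytype": {
    "properties": {
      "payload": {
        "type": "object",
        "enabled": false
      }
    }
  }
}'
```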

On Sunday, October 21, 2012 12:39:01 AM UTC+2, kimchy wrote:

[quoted message trimmed]

--

