Memory problems during data indexing


(Zaharije) #1

Hi,

we are doing a single large index creation via a Python script (there is sample
code at the end of this email), but after a few million documents we are
seeing slowdowns and "Long running GC" warnings. It seems we are having a
problem similar to
http://elasticsearch-users.115913.n3.nabble.com/lack-of-memory-td762199.html#a762199,
but I did not find any solution in that thread.

We have an 8-node cluster (8GB of RAM per node). We are also using 24 shards
and 2 replicas. Those numbers are a rough estimate for 1 billion documents,
but the test failed far short of that number. (A side question: is that
number of shards x replicas too large? And how does ES handle the per-shard
index - does the whole shard need to be in memory?)

We used the following configuration:

cluster:
  name: ES-test

discovery:
  type: jgroups
  jgroups:
    config: tcp
    bind_port: 7800
    bind_addr: katta
    tcpping:
      initial_hosts: katta[7800],k00[7800],k01[7800],k02[7800],k03[7800],k04[7800],k06[7800],k07[7800]

gateway.fs.location: /search-sharing
gateway.type: fs

NFS is used for gateway.

Here is a dump from the DEBUG log:

[11:18:39,215][INFO ][cluster.metadata ] [Gaia] Index [users0]: Update mapping [id120071] (dynamic)
[11:18:52,367][WARN ][monitor.jvm ] [Gaia] Long GC collection occurred, took [13s], breached threshold [10s]
[11:19:04,513][WARN ][monitor.jvm ] [Gaia] Long GC collection occurred, took [12s], breached threshold [10s]
[11:19:11,453][WARN ][jgroups.FC ] Received two credit requests from k07-63227 without any intervening messages; sending 1981561 credits
[11:19:23,380][WARN ][monitor.jvm ] [Gaia] Long GC collection occurred, took [18.7s], breached threshold [10s]
[11:19:23,380][DEBUG][action.index ] [Gaia] [users0][4], Node[katta-26574], [P], S[STARTED]: Failed to execute [[users0][id660070][4729da78-2392-4c41-9534-957be5ba1984], source[{"air_class": "coach", "count": "true", "anual_income": 4314, "hotel": "starwood", "zipcode": 94365, "sex": "female", "net_worth": 64362}]]
java.lang.NullPointerException
    at org.elasticsearch.index.mapper.xcontent.XContentNumberFieldMapper$CachedNumericTokenStream.close(XContentNumberFieldMapper.java:216)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:196)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
    at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:752)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1932)
    at org.elasticsearch.index.engine.robin.RobinEngine.create(RobinEngine.java:191)
    at org.elasticsearch.index.shard.service.InternalIndexShard.innerCreate(InternalIndexShard.java:222)
    at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:210)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:127)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:56)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:328)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.access$400(TransportShardReplicationOperationAction.java:198)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:252)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[11:19:28,071][WARN ][jgroups.FD ] I was suspected by k04-49735; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
[11:19:42,368][WARN ][monitor.jvm ] [Gaia] Long GC collection occurred, took [18.7s], breached threshold [10s]

After this, more and more of the same messages appear.

Here is a sample of the app which does the indexing.

################################################################################
def createDocuments(a_index, a_type, a_cfg):
    dGen = dataGenerator.dataGenerator()
    dGen.init('data/fields.txt')
    nodes = getNodeList(a_cfg)
    node = dGen.getRandomListMember(nodes)
    address = node['address']
    port = node['port']
    url = address + ':' + str(port)
    numOfDoc = int(a_cfg.getNode('create/indexes/numOfDocuments').text)
    numOfRet = int(a_cfg.getNode('create/indexes/numOfRetries').text)

    idx = ElasticSearch(url)

    success = 0
    errors = 0
    startTime = time.time()
    for i in range(0, numOfDoc):
        data = getRandomData(dGen)
        numOfRetries = 0
        while 1:
            try:
                ret = idx.index(data, a_index, a_type)
                if ret.has_key('ok') and ret['ok'] == True:
                    success += 1
                else:
                    #print 'Error: ' + str(ret)
                    errors += 1
                break
            except:
                #print 'An error has occurred, retrying...'
                if numOfRetries == numOfRet:
                    #print 'Unable to recover after ' + str(numOfRet) + ' retries.'
                    break
                numOfRetries += 1

    endTime = time.time()
    totalTime = endTime - startTime
    print 'Generated: ' + str(success) + ' records, errors: ' + str(errors) + \
        ', time: ' + time.strftime('%M:%S', time.localtime(totalTime))
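As an aside, the bounded-retry loop the script implements can be factored into a small standalone helper. This is only an illustrative sketch (retry_call is a made-up name, not part of any ES client):

```python
def retry_call(func, max_retries):
    """Call func(); on any exception, retry up to max_retries more times.

    Returns (result, retries_used) on success; re-raises the last
    exception once the retry budget is exhausted.
    """
    retries = 0
    while True:
        try:
            return func(), retries
        except Exception:
            if retries == max_retries:
                raise
            retries += 1
```

The indexing loop then becomes `ret, _ = retry_call(lambda: idx.index(data, a_index, a_type), numOfRet)`, which keeps the success/error bookkeeping separate from the retry mechanics.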

Thanks
Zaharije


(Andrew Harvey-2) #2

I had similar issues, but they were resolved when I increased ES_MAX_MEM from its default of 256 to 2048. The default settings are pretty conservative, and if you've got 8GB to play with, you've got some room to move.
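For reference, on the 0.x series this is an environment variable picked up by the startup script; a sketch, assuming the stock elasticsearch.in.sh wrapper (the 2048m value is just the figure mentioned above):

```shell
# Export before starting the node, or edit the value in elasticsearch.in.sh
export ES_MAX_MEM=2048m
```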

Andrew

Sent from my iPad

On 29/06/2010, at 23:50, "Zaharije Pasalic" pasalic.zaharije@gmail.com wrote:


Andrew Harvey / Developer
lexer
m/
t/ +61 2 9019 6379
w/ http://lexer.com.au
Help put an end to whaling. Visit http://www.givewhalesavoice.com.au/



(Shay Banon) #3

Hi,

First of all, which version are you using? Note that jgroups is no longer
supported; you should move to zen discovery. The configuration is simple:
remove the jgroups configuration and set:

discovery:
  zen:
    ping.unicast.hosts: ["k00[9300]", "k01[9301]"]

The default maximum memory allocated is 1024m, not 256m (where did you see
that it's 256, Andrew?). In general, it's better to allocate more memory to a
node, especially if it's hosting several shards.

How big are your indexed documents? There was a memory leak in Lucene
that has since been fixed. The fixed Lucene version, along with more
memory enhancements, is on master, so you can try it and see if the problem
persists. Also, how many Python clients / threads are you running?

Also, if you want to improve indexing speed, you can set
index.engine.robin.refresh_interval to a higher value than its default of
1s. This setting controls how often index operations become visible for
searching.
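In elasticsearch.yml that would look something like the following (the setting name is the one given above; 30s is only an example value, not a recommendation from this thread):

```yaml
index:
  engine:
    robin:
      refresh_interval: 30s
```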

-shay.banon

On Wed, Jun 30, 2010 at 1:05 AM, Andrew Harvey
Andrew.Harvey@lexer.com.au wrote:



(Andrew Harvey-2) #4

On 30/06/2010, at 8:30 AM, Shay Banon wrote:

The default maximum memory allocated is 1024m, not 256m (where did you see that it's 256, Andrew?). In general, it's better to allocate more memory to a node, especially if it's hosting several shards.

You're quite correct. That was an early morning brainfart. :)

Andrew


(Zaharije) #5

Thanks for the reply. We will get the latest ES and try it.

We are using pretty small docs, about 7-10 fields each, and have only one
process with 20 threads.

Right now we are just running tests to check whether ES is suitable for our
purpose. The real system will not be pre-seeded with data; instead we will
have some number of writes/reads per day (say around 100K writes and 1M
reads). It is not crucial for us to have super-fast indexing (it is good to
have, but the 1s refresh is more important); for now we want to have a large
amount of data in ES to start doing some testing.

Thanks for the response; we will send you additional info if the problem persists.

Best
Zaharije

On Wed, Jun 30, 2010 at 12:31 AM, Andrew Harvey
Andrew.Harvey@lexer.com.au wrote:



(Zaharije) #6

Even after pulling the master branch we are encountering some kind of memory leak.

Here is part of the log from the master node:

[23:29:18,622][WARN ][monitor.jvm ] [katta] Long GC collection occurred, took [12.6m], breached threshold [10s]
[23:29:18,764][WARN ][transport ] [katta] Transport response handler timed out, action [discovery/zen/fd/ping], node [[k03][9f0eb035-c61b-4651-b1a2-c446ac7b4db2][inet[/172.17.12.13:9304]]]
[23:29:18,790][WARN ][index.shard.service ] [katta][users0][2] Failed to perform scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException: [users0][2] Refresh failed
    at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:272)
    at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run(InternalIndexShard.java:518)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:733)
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:738)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:392)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:374)
    at org.apache.lucene.index.DirectoryReader.doReopenFromWriter(DirectoryReader.java:377)
    at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:388)
    at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:355)
    at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:260)
    ... 10 more

and at the end

[01:29:23,889][WARN ][transport ] [katta] Transport
response handler not found of id [2149118]
[01:33:46,389][WARN ][transport ] [katta] Transport
response handler timed out, action [discovery/zen/fd/ping], node
[[k02][d1cb43e2-9015-454c-8ff6-02ff357ab6e5][inet[/172.17.12.12:9303]]]
[01:33:56,823][WARN ][transport ] [katta] Transport
response handler timed out, action [discovery/zen/fd/ping], node
[[k04][86a93ecd-e43a-4ca7-ac79-090647af5092][inet[/172.17.12.14:9305]]]
[01:37:54,850][WARN ][timer ] [katta] An exception
was thrown by TimerTask.
java.lang.OutOfMemoryError: Java heap space
[02:04:41,088][WARN ][netty.channel.socket.nio.NioWorker] Unexpected
exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[02:04:41,089][WARN ][netty.channel.socket.nio.NioWorker] Unexpected
exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space

The following configuration is used:

name: katta

cluster:
  name: ES-test

discovery:
  type: zen
  zen:
    ping:
      unicast:
        hosts: ["k00:9301","k01:9302","k02:9303","k03:9304","k04:9305","k06:9306","k07:9307"]

transport:
  tcp:
    port: 9300
    connections_per_node: 7

network:
  host: katta

gateway:
  type: fs
  fs:
    location: /search-sharing

We changed elasticsearch.in.sh to use 2GB for ES_MAX_MEM and barely
managed to index about 7M documents. It seems that only the master node
runs out of memory, and after that the whole cluster is unresponsive.

On Wed, Jun 30, 2010 at 10:45 AM, Zaharije Pasalic
pasalic.zaharije@gmail.com wrote:



(Zaharije) #7

Additional info: we created a separate Java client which uses the Java API
to index data, and now it seems to be OK (we have indexed 24M documents so
far). I am not sure how ES is implemented, but maybe the REST handling has
some memory issue?
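For anyone comparing the two paths: the Python script above issues one HTTP request per document against the standard /index/type[/id] REST layout. A minimal sketch of just the request construction (index_request is a hypothetical helper for illustration, not part of any client library):

```python
import json

def index_request(index, doc_type, doc, doc_id=None):
    """Build the (method, path, body) triple for one index call.

    Without an id, ES generates one (POST /index/type); with an id,
    the document is addressed directly (PUT /index/type/id).
    """
    body = json.dumps(doc)
    if doc_id is None:
        return ('POST', '/%s/%s' % (index, doc_type), body)
    return ('PUT', '/%s/%s/%s' % (index, doc_type, doc_id), body)
```

Each call therefore pays full HTTP request/response overhead per document, which is one plausible reason the Java-API client behaves better at this volume.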

Best
Zaharije

On Thu, Jul 1, 2010 at 10:53 AM, Zaharije Pasalic
pasalic.zaharije@gmail.com wrote:

Still after pulling master branch we are encountering some kind of memory leak.

Here is part of the log from master node:

23:29:18,622][WARN ][monitor.jvm ] [katta] Long GC
collection occurred, took [12.6m], breached threshold
[10s][23:29:18,764][WARN ][transport ] [katta]
Transport response handler timed out, action [discovery/zen/fd/ping],
nod [[k03][9f0eb035-c61b-4651-b1a2-c446ac7b4db2][inet[/172.17.12.13:9304]]]ed
to perform scheduled engine refresh[23:29:18,790][WARN
][index.shard.service ] [katta][users0][2] Failed to perform
scheduled engine refresh
org.elasticsearch.index.engine.RefreshFailedEngineException:
[users0][2] Refresh failed at
org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:272)
at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run(InternalIndexShard.java:518)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:733)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:738)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:392)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:374)
at org.apache.lucene.index.DirectoryReader.doReopenFromWriter(DirectoryReader.java:377)
at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:388)
at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:355)
at org.elasticsearch.index.engine.robin.RobinEngine.refresh(RobinEngine.java:260)
... 10 more

and at the end

[01:29:23,889][WARN ][transport ] [katta] Transport
response handler not found of id [2149118]
[01:33:46,389][WARN ][transport ] [katta] Transport
response handler timed out, action [discovery/zen/fd/ping], node
[[k02][d1cb43e2-9015-454c-8ff6-02ff357ab6e5][inet[/172.17.12.12:9303]]]
[01:33:56,823][WARN ][transport ] [katta] Transport
response handler timed out, action [discovery/zen/fd/ping], node
[[k04][86a93ecd-e43a-4ca7-ac79-090647af5092][inet[/172.17.12.14:9305]]]
[01:37:54,850][WARN ][timer ] [katta] An exception
was thrown by TimerTask.
java.lang.OutOfMemoryError: Java heap space
[02:04:41,088][WARN ][netty.channel.socket.nio.NioWorker] Unexpected
exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[02:04:41,089][WARN ][netty.channel.socket.nio.NioWorker] Unexpected
exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space

Following configuration is used:

name: katta

cluster:
  name: ES-test

discovery:
  type: zen
  zen:
    ping:
      unicast:
        hosts: ["k00:9301","k01:9302","k02:9303","k03:9304","k04:9305","k06:9306","k07:9307"]

transport:
  tcp:
    port: 9300
    connections_per_node: 7

network:
  host: katta

gateway:
  type: fs
  fs:
    location: /search-sharing

We changed elasticsearch.in.sh to use 2GB for ES_MAX_MEM and barely
managed to index about 7M documents. It seems that only the master node
runs out of memory, and after that the whole cluster is unresponsive.

On Wed, Jun 30, 2010 at 10:45 AM, Zaharije Pasalic
pasalic.zaharije@gmail.com wrote:

Thanks for the reply. We will get the latest ES and try.

We are using pretty small docs, about 7-10 fields, and have only one
process with 20 threads.

Right now we are just running tests to check whether ES is suitable for
our purpose. The real system will not be pre-seeded with data; instead we
will have some number of writes/reads per day (say around 100K writes and
1M reads). Super-fast indexing is not crucial for us (it is good to have,
but a 1s refresh is more important); for now we want to load a large
amount of data into ES to start doing some testing.

Thanks for the response, we will send you additional info if the problem persists.

Best
Zaharije

On Wed, Jun 30, 2010 at 12:31 AM, Andrew Harvey
Andrew.Harvey@lexer.com.au wrote:

On 30/06/2010, at 8:30 AM, Shay Banon wrote:

The default maximum memory allocated is 1024m, not 256m (where did you see that it's 256, Andrew?). In general, it's better to allocate more memory to a node, especially if it's hosting several shards.

You're quite correct. That was an early morning brainfart. :slight_smile:

Andrew
Andrew Harvey / Developer
lexer
m/
t/ +61 2 9019 6379
w/ http://lexer.com.au
Help put an end to whaling. Visit http://www.givewhalesavoice.com.au/

Please consider the environment before printing this email
This email transmission is confidential and intended solely for the person or organisation to whom it is addressed. If you are not the intended recipient, you must not copy, distribute or disseminate the information, or take any action in relation to it and please delete this e-mail. Any views expressed in this message are those of the individual sender, except where the send specifically states them to be the views of any organisation or employer. If you have received this message in error, do not open any attachment but please notify the sender (above). This message has been checked for all known viruses powered by McAfee.

For further information visit http://www.mcafee.com/us/threat_center/default.asp
Please rely on your own virus check as no responsibility is taken by the sender for any damage rising out of any virus infection this communication may contain.



(Shay Banon) #8

REST should not have memory issues; it is used by other clients
successfully. Of course, you can overload it a lot, for example by not
using keep-alive. That's why I asked how you were using it from the
Python side, how many clients you are running, and so on.

-shay.banon
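The keep-alive point can be sketched in Python (a minimal illustration under stated assumptions, not the poster's actual script: the node address, the index name `users0`, the type name `user`, and the document fields are all made up). Reusing one HTTP/1.1 connection avoids opening a new socket per document:

```python
import http.client
import json

def make_doc(i):
    # Tiny synthetic document, similar in shape to the ones in the logs above.
    return {"sex": "female" if i % 2 else "male", "zipcode": 90000 + i}

def index_docs(docs, host="localhost", port=9200):
    """Index docs over a single persistent HTTP/1.1 connection (keep-alive)."""
    conn = http.client.HTTPConnection(host, port)
    try:
        for i, doc in enumerate(docs):
            conn.request("PUT", "/users0/user/%d" % i,
                         body=json.dumps(doc),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            resp.read()  # drain the response so the socket can be reused
    finally:
        conn.close()
```

Opening a fresh connection per request instead forces the server to accept and tear down millions of short-lived sockets, which is one easy way to overload it.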



(Robert Eanes) #9

If you are using pyelasticsearch, try the latest master on GitHub and see if that helps. I just changed it to re-use the HTTP connection across requests.



(Enver) #10

I've performed several tests adding a few million documents to a single
index using ES version 0.9.0. For the index store I mainly used FS based
storage.
As the script generates documents, memory usage constantly rises.
Typically, when the test finishes, 6GB of memory is used. When I shut down
the Elasticsearch process (shutdown, not kill -9...), memory usage falls
to 4GB. After removing the files under elasticsearch/work/,
memory usage drops to ~300MB.
It seems that cleanup is not performed correctly.

Could you please explain some internals of the non-JVM-heap memory storage
(How does it work? Where is this data actually stored? I can't see this
memory accounted to any running process)?

Regards

 Enver

(Clinton Gormley) #11

As the script generates documents, memory usage constantly rises.
Typically, when the test finishes, 6GB of memory is used. When I shut down
the Elasticsearch process (shutdown, not kill -9...), memory usage falls
to 4GB. After removing the files under elasticsearch/work/,
memory usage drops to ~300MB.
It seems that cleanup is not performed correctly.

This sounds like your operating system caching the files in memory for
performance reasons, and is normal.

If this memory is required by other processes, the kernel will reduce
the amount of memory used for the file cache, but this will also make
future reads/writes to those files less performant.

If you're using Unix or Linux, use the 'top' command to see your memory
usage. If the "Cached" value is high and accounts for the memory usage,
then this is the issue (although it's not an actual issue - it is by
design).

Maybe I'm way off the mark and have misunderstood your description of
the problem.

clint
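The "Cached" check above can also be done programmatically (a sketch assuming a Linux host, where /proc/meminfo exposes the page-cache size; the parsing helper is illustrative, not part of any library here):

```python
def read_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])  # value in kB
    return info

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = read_meminfo(f.read())
    # A large "Cached" value is the kernel's file cache, not a process leak;
    # the kernel reclaims it automatically when applications need the memory.
    print("Cached: %d kB" % info.get("Cached", 0))
```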


(Enver) #12


I've seen the same behavior when a user process does not perform correct
cleanup of shared memory. After the process exits, the memory stays
occupied until the shared memory object is deleted.

It would be good to have this index store module explained.


(Shay Banon) #13

Which index store are you using? If you are using the fs index, it's basic
file handling in Java; no explicit native memory management is done in
elasticsearch itself. If you are using memory caching on top of the FS
store, or the memory store module, then by default it allocates native
memory, but that does not relate to the files stored on disk, so the fact
that the memory goes away when you delete the files makes me suspect file
caching done by the OS as well...

-shay.banon



(system) #14