Re: Digest for elasticsearch@googlegroups.com - 11 Messages in 6 Topics

On Mon, Apr 15, 2013 at 9:34 AM, elasticsearch@googlegroups.com wrote:

Today's Topic Summary

Group: http://groups.google.com/group/elasticsearch/topics

  • Query using lang-javascript hangs under Ubuntu 12.04 [1 Update]
  • Failing starting due to transport layer exception [1 Update]
  • More Like This API [1 Update]
  • I would like to be able to search parenthsis [6 Updates]
  • terms facet calculation path (random vs sequential disk access) [1 Update]
  • OutOfMemory exception after few hours of indexing [1 Update]

Query using lang-javascript hangs under Ubuntu 12.04
http://groups.google.com/group/elasticsearch/t/723a24c4dadc3ab3

troy@scriptedmotion.com Apr 14 08:21PM -0700

I installed lang-javascript and I see this in my log:

[2013-04-14 19:15:19,426][INFO ][plugins ] [Moondark] loaded [lang-javascript], sites []

When I post the following to the _search API, Elasticsearch does not
respond and the HTTP request times out:

{
  "query":{
    "filtered":{
      "query":{
        "match_all":{}
      },
      "filter":{
        "missing":{
          "field":"deletedAt",
          "existence":true,
          "null_value":true
        }
      }
    }
  },
  "sort":[{
    "_script":{
      "script":"doc['usernamesAssigned'].values.sort().join()",
      "type":"string",
      "lang":"js",
      "order":"asc"
    }
  }],
  "size":100
}

This same query works fine on my development machine (Mac OS X). Is
there a dependency that Ubuntu is missing, or something similar?
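
For reference, the sort key that the script in the query above is meant
to compute can be sketched in Python. This is only an illustration: it
assumes that doc['usernamesAssigned'].values behaves like a JavaScript
array of strings (the thread does not confirm this), and the sample
documents are made up.

```python
def script_sort_key(usernames):
    """Return the string the script would produce for one document."""
    # JavaScript's default Array.prototype.sort() compares elements as
    # strings, and join() with no argument uses "," as the separator.
    return ",".join(sorted(str(name) for name in usernames))


# Hypothetical documents, not taken from the thread.
docs = [
    {"id": 1, "usernamesAssigned": ["zoe", "adam"]},
    {"id": 2, "usernamesAssigned": ["bob"]},
    {"id": 3, "usernamesAssigned": []},
]

# "order": "asc" on a string key is a plain ascending string sort.
ordered = sorted(docs, key=lambda d: script_sort_key(d["usernamesAssigned"]))
print([d["id"] for d in ordered])
```

Documents with no assigned usernames get an empty key and therefore
sort first in ascending order.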

Failing starting due to transport layer exception
http://groups.google.com/group/elasticsearch/t/cb07c508285397f3

larmbr zhan nasa4836@gmail.com Apr 15 11:15AM +0800

Version : 0.20.5

I installed ES as usual and everything works well, except that it
throws an exception during startup. It still works afterwards.

Log snippet:

exception caught on transport layer [[id: 0x6d33e262, /192.168.2.183:58596 :> /192.168.13.89:9300]], closing connection
java.io.StreamCorruptedException: invalid internal transport message format
    at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:27)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:482)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
    at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:336)
    at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:81)
    at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:570)
    at org.elasticsearch.common.netty.channel.Channels.close(Channels.java:812)
    at org.elasticsearch.common.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
    at org.elasticsearch.transport.netty.NettyTransport.exceptionCaught(NettyTransport.java:500)
    at org.elasticsearch.transport.netty.MessageChannelHandler.exceptionCaught(MessageChannelHandler.java:227)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
    at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
    at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:48)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:654)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:562)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

It throws this exception several times, but it eventually starts
successfully.

I searched for it and found that this may be caused by a version
mismatch between ES and logstash, but I don't use logstash.

--

Regards

More Like This API
http://groups.google.com/group/elasticsearch/t/aa748cb9e6e27277

Marlo Epres mepres@gmail.com Apr 14 07:26PM -0700

My understanding is that the MLT API is just a combination of the Get
API and the MLT query. The fields specified in 'mlt_fields' are used
both to extract the text that goes into 'like_text' in the MLT query
and as the fields to compare against in that query. Is this correct?
If so, is all of the text placed in like_text, even if the specified
fields contain hundreds of words?
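
To make the question concrete, here is a minimal Python sketch of the
behaviour described above. It illustrates the poster's understanding
only and is not confirmed as what the MLT API does internally. The
helper name build_mlt_query and the sample document are hypothetical;
the "more_like_this" query with "fields" and "like_text" is the query
form the post refers to.

```python
def build_mlt_query(source, mlt_fields):
    """Build a more_like_this search body from a fetched document."""
    # Assumption: the text of every requested field is concatenated
    # into like_text without any truncation.
    like_text = " ".join(str(source[f]) for f in mlt_fields if f in source)
    return {
        "query": {
            "more_like_this": {
                "fields": list(mlt_fields),  # fields to compare against
                "like_text": like_text,      # text taken from the same fields
            }
        }
    }


# Hypothetical document, standing in for the result of a Get request.
source = {"title": "Search engines", "body": "Lucene powers Elasticsearch"}
print(build_mlt_query(source, ["title", "body"]))
```

Under this reading, long fields would produce an equally long
like_text, which is exactly what the question asks about.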

I would like to be able to search parenthsis
http://groups.google.com/group/elasticsearch/t/3c1f9e103fa08533

Andy Bajka andybajka2012@gmail.com Apr 14 05:59PM -0700

Looking at the Xenforo code, I need to replicate this mapping.

public static $optimizedGenericMapping = array(
    "_source" => array("enabled" => false),
    "properties" => array(
        "title" => array("type" => "string"),
        "message" => array("type" => "string"),
        "date" => array("type" => "long", "store" => "yes"),
        "user" => array("type" => "long", "store" => "yes"),
        "discussion_id" => array("type" => "long", "store" => "yes")
    )
);

Andy Bajka andybajka2012@gmail.com Apr 14 06:12PM -0700

I've taken a stab at creating my own analyzer mapping:

{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    },
    "analysis" : {
      "filter" : {
        "tweet_filter" : {
          "type" : "word_delimiter",
          "type_table": ["( => ALPHA", ") => ALPHA"]
        }
      },
      "analyzer" : {
        "tweet_analyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["lowercase", "tweet_filter"]
        }
      }
    }
  },
  "mappings" : {
    "source" : {"enabled" : "false"},
    "properties" : {
      "title" : {"type" : "string"},
      "message" : {"type" : "string"},
      "date" : {"type" : "long", "store" : "yes"},
      "user" : {"type" : "long", "store" : "yes"},
      "discussion_id" : {"type" : "long", "store" : "yes"}
    }
  }
}

Here is the resulting _mapping, which is not correct:

curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "source" : {
      "enabled" : false,
      "properties" : { }
    },
    "properties" : {
      "properties" : { }
    }
  }
}

Andy Bajka andybajka2012@gmail.com Apr 14 06:14PM -0700

Also, it said I could not use the underscore in _source, so I changed
it to source.

Andy Bajka andybajka2012@gmail.com Apr 14 06:50PM -0700

I'm making progress. It's still not like the mapping of the Xenforo
ElasticSearch, but getting closer:

{
  "twitter" : {
    "tweet" : {
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}

Andy Bajka andybajka2012@gmail.com Apr 14 06:53PM -0700

This is a good sign, the filter works.

curl -XGET 'localhost:9200/twitter/_analyze?field=message&pretty=1' -d '(andy)'
{
  "tokens" : [ {
    "token" : "(andy)",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
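
Why the parentheses survive can be sketched with a toy Python model of
the analyzer chain used here (whitespace tokenizer, lowercase filter,
word_delimiter with a type table). This is a simplification written
for illustration: the real word_delimiter filter has many more rules
and options, and the function name analyze and its alpha_chars
parameter are made up for this sketch.

```python
def analyze(text, alpha_chars=""):
    """Toy whitespace tokenizer + lowercase + simplified delimiter split."""
    tokens = []
    for word in text.split():      # whitespace tokenizer
        word = word.lower()        # lowercase filter
        current = ""
        for ch in word:
            # Letters and digits stay in the token, and so does any
            # character that the type table maps to ALPHA. Every other
            # character acts as a delimiter and is dropped.
            if ch.isalnum() or ch in alpha_chars:
                current += ch
            elif current:
                tokens.append(current)
                current = ""
        if current:
            tokens.append(current)
    return tokens


print(analyze("(Andy) said-hi"))                    # parentheses split off
print(analyze("(Andy) said-hi", alpha_chars="()"))  # parentheses kept
```

With "(" and ")" treated as ALPHA, "(andy)" stays a single token, which
matches the _analyze output above.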

Andy Bajka andybajka2012@gmail.com Apr 14 06:59PM -0700

I think I got it!!

curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}

terms facet calculation path (random vs sequential disk access)
http://groups.google.com/group/elasticsearch/t/3168ce8d3f773ac7

Ivan Brusic ivan@brusic.com Apr 14 05:53PM -0700

Hopefully someone from the elasticsearch team or anyone else who really
knows the internals can respond, but in the meantime here is a general
overview.

ElasticSearch is essentially a distributed version of Lucene.
Ultimately, all the queried content resides in the indices and segments
managed by Lucene. Index data is stored in off-heap memory, that is,
memory not allocated to the JVM. This is why it is suggested to run the
JVM with 50% of the system memory: Lucene uses the rest. You can tune
Lucene's memory management by changing the underlying filesystem
implementation. [1] This memory is managed by the operating system, so
data that does not fit in system memory will be paged to disk. If you
want to learn more, look into Lucene.
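
As a quick worked example of the 50% guideline, here is a small Python
sketch. The 16 GiB machine is a made-up figure, the function name
suggested_heap_bytes is hypothetical, and the result is only a starting
point rather than a tuned value. ES_HEAP_SIZE is the environment
variable mentioned elsewhere in this digest.

```python
GIB = 1024 ** 3


def suggested_heap_bytes(total_ram_bytes, heap_fraction=0.5):
    """Split system memory between the JVM heap and everything else."""
    # The guideline quoted above: give the JVM about half of the RAM
    # and leave the remainder for Lucene via the operating system.
    return int(total_ram_bytes * heap_fraction)


total = 16 * GIB  # hypothetical machine size
heap = suggested_heap_bytes(total)
print(f"ES_HEAP_SIZE={heap // GIB}g")
print(f"left outside the JVM heap: {(total - heap) // GIB}g")
```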

The field data (used for facets and sorting) is allocated on the JVM's
heap. ElasticSearch uses Google Guava [2] to manage its caches. The
settings are tunable and basically adjust the Google Guava settings.
Field data that is too large will result in large garbage collections
by the JVM. Cache entries can expire depending on the settings used and
the amount of data in the cache.

The last piece of the puzzle is the transaction log. Documents indexed
in ElasticSearch are first placed into the transaction log. If the
write consistency is achieved, then the documents are written to the
Lucene segments. The Lucene segments are not distributed between
shards/nodes; the transaction log is. Lucene queries work from the
indices, but Get requests in ElasticSearch will use the transaction log
for real-time queries. It is the transaction log that is perhaps least
documented and discussed. There are no major configuration settings
(AFAIK), so not much is exposed. You can trace the transaction log code
(probably starting from a Get request) to learn more.
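
The difference between a real-time Get and a search, as described
above, can be sketched with a toy Python model. This is an assumption
laden illustration and not ElasticSearch code: the class ToyShard and
its methods are invented here, and writing to segments is collapsed
into a single explicit step.

```python
class ToyShard:
    """Toy model only: one dict for the transaction log, one for segments."""

    def __init__(self):
        self.translog = {}  # recent writes, not yet in any segment
        self.segments = {}  # stands in for searchable Lucene segments

    def index(self, doc_id, source):
        # New documents land in the transaction log first.
        self.translog[doc_id] = source

    def write_segments(self):
        # Move everything from the transaction log into the segments.
        self.segments.update(self.translog)
        self.translog.clear()

    def get(self, doc_id):
        # A Get looks at the transaction log first, so it sees new writes.
        if doc_id in self.translog:
            return self.translog[doc_id]
        return self.segments.get(doc_id)

    def search(self, predicate):
        # A search only sees what has reached the segments.
        return sorted(d for d, src in self.segments.items() if predicate(src))


shard = ToyShard()
shard.index("1", {"user": "andy"})
is_andy = lambda src: src["user"] == "andy"
print(shard.get("1"))          # found through the toy transaction log
print(shard.search(is_andy))   # nothing has reached the segments yet
shard.write_segments()
print(shard.search(is_andy))   # the document is now searchable
```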

Essentially, ElasticSearch loads as much as possible into memory until
it can't. Lucene data is managed by the OS, the field cache by the JVM.
I have probably already written too much, but have also written too
little.

Cheers,

Ivan

[1] http://www.elasticsearch.org/guide/reference/index-modules/store/
[2] http://code.google.com/p/guava-libraries/wiki/CachesExplained

OutOfMemory exception after few hours of indexing
http://groups.google.com/group/elasticsearch/t/2f22e0aacb3596df

Amit Bh amitbar@gmail.com Apr 14 12:28AM -0700

Hi,

I'm still getting the same error after changing the GC to G1:

[2013-04-11 21:39:35,017][WARN ][org.elasticsearch.index.engine.robin] [ElserNJ02] [ad][0] failed engine
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:193)
    at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
    at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
    at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:157)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:460)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:189)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
    at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:577)
    at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:489)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:330)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:159)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2013-04-11 21:39:35,547][WARN ][org.elasticsearch.index.merge.scheduler] [ElserNJ02] [ad][0] failed to merge
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
    at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
    at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:73)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:501)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:428)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4263)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3908)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:91)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)

The ES_HEAP_SIZE variable is configured to 10g.
If I decrease the Java heap size, I get a heap size OutOfMemory error,
and if I increase the heap size, I get errors from the GC.

Here is the elasticsearch.bin file with the new G1 configuration:

set JAVA_OPTS=%JAVA_OPTS% -Xss512k

REM Enable aggressive optimizations in the JVM
REM - Disabled by default as it might cause the JVM to crash
REM set JAVA_OPTS=%JAVA_OPTS% -XX:+AggressiveOpts

rem set JAVA_OPTS=%JAVA_OPTS% -XX:+UseParNewGC
rem set JAVA_OPTS=%JAVA_OPTS% -XX:+UseConcMarkSweepGC

set JAVA_OPTS=%JAVA_OPTS% -XX:CMSInitiatingOccupancyFraction=75
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseCMSInitiatingOccupancyOnly

REM When running under Java 7
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseCondCardMark
set JAVA_OPTS=%JAVA_OPTS% -XX:+UnlockExperimentalVMOptions
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseG1GC
set JAVA_OPTS=%JAVA_OPTS% -XX:MaxGCPauseMillis=50
set JAVA_OPTS=%JAVA_OPTS% -XX:GCPauseIntervalMillis=100

REM GC logging options -- uncomment to enable
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintGCDetails
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintGCTimeStamps
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintClassHistogram
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintTenuringDistribution
set JAVA_OPTS=%JAVA_OPTS% -XX:+PrintGCApplicationStoppedTime
set JAVA_OPTS=%JAVA_OPTS% -Xloggc:/var/log/elasticsearch/gc.log

REM Causes the JVM to dump its heap on OutOfMemory.
set JAVA_OPTS=%JAVA_OPTS% -XX:+HeapDumpOnOutOfMemoryError
REM The path to the heap dump location, note directory must exist and
REM have enough space for a full heap dump.
set JAVA_OPTS=%JAVA_OPTS% -XX:HeapDumpPath=$ES_HOME/logs/heapdump.hprof

What am I doing wrong?
Thanks!
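
When reading a batch file like the one above, it can help to list only
the JVM flags that are actually active, skipping the REM lines. Below
is a minimal Python sketch of that idea. It is a reading aid and makes
no claim about which flag, if any, causes the error. The helper name
active_jvm_flags is hypothetical, and SAMPLE contains just a few lines
copied from the post.

```python
SAMPLE = """\
set JAVA_OPTS=%JAVA_OPTS% -Xss512k
REM set JAVA_OPTS=%JAVA_OPTS% -XX:+AggressiveOpts
rem set JAVA_OPTS=%JAVA_OPTS% -XX:+UseConcMarkSweepGC
set JAVA_OPTS=%JAVA_OPTS% -XX:CMSInitiatingOccupancyFraction=75
set JAVA_OPTS=%JAVA_OPTS% -XX:+UseG1GC
"""

# Simplification: only lines written exactly in this form are recognised.
PREFIX = "set JAVA_OPTS=%JAVA_OPTS%"


def active_jvm_flags(batch_text):
    """Return the flags appended to JAVA_OPTS by non-comment lines."""
    flags = []
    for line in batch_text.splitlines():
        words = line.split()
        # Skip blank lines and REM comments (REM is case-insensitive).
        if not words or words[0].lower() == "rem":
            continue
        stripped = line.strip()
        if stripped.startswith(PREFIX):
            flags.extend(stripped[len(PREFIX):].split())
    return flags


print(active_jvm_flags(SAMPLE))
```

Running the same function over the full file gives the complete set of
options the JVM is started with, which is easier to review than the
file with its commented-out lines.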

You received this message because you are subscribed to the Google Group
elasticsearch.
You can post via email elasticsearch@googlegroups.com.
To unsubscribe from this group, send elasticsearch+unsubscribe@googlegroups.com an empty message.
For more options, visit this group: http://groups.google.com/group/elasticsearch/topics

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
VISHWAS M
My Accounts:
Facebook http://www.facebook.com/home.php?#!/profile.php?id=1207961090
Orkut http://www.orkut.co.in/Main#Profile?rl=mp&uid=5262527636165830356
LinkedIn http://www.linkedin.com/profile/edit?id=48909209&trk=hb_tab_pro_top
