Bulk UDP API


(Bart Vandewoestyne) #1

I'm trying to index data using the bulk UDP API on a single-node
Elasticsearch 1.3.2 installation. In my Elasticsearch config I have

bulk.udp.enabled: true

My bulk file has 85000 documents and has the following characteristics:

bart@hp-g7-02:~/git/data$ ls -al mydata.json
-rw-rw-r-- 1 bart bart 97818287 Aug 28 15:43 mydata.json

bart@hp-g7-02:~/git/data$ wc -l mydata.json
170001 mydata.json

bart@hp-g7-02:~/git/data$ file mydata.json
mydata.json: UTF-8 Unicode English text, with very long lines
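
A side note on the numbers above: 85000 documents at two lines each (one action line plus one source line) would give 170000 lines, while wc -l reports 170001, so the file seems to hold one extra line, possibly a trailing blank one. A minimal Python sketch for locating such lines follows. The helper name check_bulk_lines is hypothetical, and the sketch assumes every action is an index or create action followed by exactly one source line.

```python
import json


def check_bulk_lines(lines):
    """Return (n_docs, problems) for an iterable of bulk-format text lines.

    Assumption: only "index"/"create" actions, each followed by one source line.
    """
    problems = []
    n_docs = 0
    expect_action = True  # bulk files alternate: action line, then source line
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            # Blank lines are reported and skipped; they do not break the pairing.
            problems.append((lineno, "blank line"))
            continue
        try:
            obj = json.loads(line)
            valid = True
        except ValueError:
            obj = None
            valid = False
            problems.append((lineno, "not valid JSON"))
        if expect_action:
            is_action = isinstance(obj, dict) and ("index" in obj or "create" in obj)
            if valid and not is_action:
                problems.append((lineno, "expected an index/create action line"))
        elif valid:
            n_docs += 1
        expect_action = not expect_action
    if not expect_action:
        problems.append((None, "action line without a source line"))
    return n_docs, problems
```

Calling it as check_bulk_lines(open("mydata.json", encoding="utf-8")) would list any blank or unpaired lines together with their line numbers.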

Indexing the data using the bulk API described at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
works. I see the documents in my elasticsearch store once the bulk upload
is finished.

However, if I use the same bulk file and try to index it using the command

cat mydata.json | nc -w 0 -u localhost 9700

then only 1 document gets indexed, and I see lots of parsing errors like
the following in my log files:

[2014-08-29 11:28:41,649][WARN ][bulk.udp ] [Mysterio] failed to execute bulk request
org.elasticsearch.common.jackson.core.JsonParseException: Unrecognized token '_index': was expecting ('true', 'false' or 'null')
at [Source: [B@656f95ce; line: 1, column: 15]
at org.elasticsearch.common.jackson.core.JsonParser._constructError(JsonParser.java:1419)
at org.elasticsearch.common.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3201)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2360)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:794)
at org.elasticsearch.common.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:690)
at org.elasticsearch.common.xcontent.json.JsonXContentParser.nextToken(JsonXContentParser.java:50)
at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:266)
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:256)
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:252)
at org.elasticsearch.bulk.udp.BulkUdpService$Handler.messageReceived(BulkUdpService.java:181)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.socket.nio.NioDatagramWorker.read(NioDatagramWorker.java:98)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioDatagramWorker.run(NioDatagramWorker.java:343)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I find it strange that things work using the usual bulk API, but not with
the bulk UDP API.

Am I overlooking something or doing something wrong?

Thanks,
Bart

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6a676c4f-afd1-48a1-ab40-8c258aa3c54e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

Maybe it is the line endings in mydata.json. Are you perhaps not using UNIX
line feeds (a single \n)?

Jörg


(Bart Vandewoestyne) #3

On Friday, August 29, 2014 2:54:27 PM UTC+2, Jörg Prante wrote:

Maybe it is the line endings in mydata.json. Are you perhaps not using UNIX
line feeds (a single \n)?

Jörg

I'm quite sure my data file has only UNIX LFs with a single \n:

$ egrep $'\r'$ mydata.json
$

and also, applying fromdos does not change the file:

$ md5sum mydata.json
bfd17fdea28da79152965455b594b6fe mydata.json
$ fromdos mydata.json
$ md5sum mydata.json
bfd17fdea28da79152965455b594b6fe mydata.json
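
The egrep and fromdos results can be cross-checked without either tool. A small sketch, assuming Python is available; count_crlf_lines is a hypothetical helper name, and a result of 0 would agree with the empty egrep output above.

```python
def count_crlf_lines(path):
    """Count the lines of the file at `path` that end with CR LF."""
    count = 0
    # Binary mode, so Python performs no newline translation of its own.
    with open(path, "rb") as fh:
        for line in fh:
            if line.endswith(b"\r\n"):
                count += 1
    return count
```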

So this is probably not the cause of the problem...
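
One thing the line-ending checks do not cover: UDP delivers independent datagrams, and a single IPv4 datagram carries at most 65507 bytes of payload, so a 97818287-byte file piped through nc necessarily arrives as many separate messages. The stack trace shows each received message being handed to BulkRequest.add on its own. If those messages are cut at arbitrary byte offsets rather than at line boundaries, JSON parse errors would be expected. This is a hypothesis, not something confirmed in this thread. Below is a hedged Python sketch that sends only whole action/source pairs per datagram. The names pack_datagrams and send_bulk_udp and the 8192-byte default are assumptions for illustration; port 9700 is taken from the nc command in the first post.

```python
import socket


def pack_datagrams(lines, max_bytes):
    """Group (action, source) line pairs into payloads of at most max_bytes.

    `lines` is an iterable of non-blank bytes lines in bulk format.
    Assumption: every action line is followed by exactly one source line.
    """
    payloads = []
    current = b""
    it = iter(lines)
    for action in it:
        source = next(it, None)
        if source is None:
            raise ValueError("action line without a source line")
        # Normalise both lines to end in a single \n.
        pair = action.rstrip(b"\r\n") + b"\n" + source.rstrip(b"\r\n") + b"\n"
        if len(pair) > max_bytes:
            raise ValueError("one action/source pair exceeds max_bytes")
        if current and len(current) + len(pair) > max_bytes:
            payloads.append(current)
            current = b""
        current += pair
    if current:
        payloads.append(current)
    return payloads


def send_bulk_udp(path, host="localhost", port=9700, max_bytes=8192):
    """Send a bulk file as UDP datagrams that each hold whole pairs only."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        with open(path, "rb") as fh:
            lines = (ln for ln in fh if ln.strip())  # skip blank lines
            for payload in pack_datagrams(lines, max_bytes):
                sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

Since the file is reported to have very long lines, a single pair may exceed max_bytes, in which case the sketch raises an error instead of splitting the pair. UDP also gives no delivery guarantee, so documents can still be lost silently.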

Regards,
Bart


