What to do with ES?

Well, after installing ES and setting up 2 nodes on one machine for
testing, I tried to get some content into ES.
The results are very bad. After 2 days, not a single useful doc with an
acceptable mapping.
The best I could get was:
org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException: transport content length received [1.2gb] exceeded [914.1mb]
    at org.elasticsearch.transport.netty.MessageChannelHandler.callDecode(MessageChannelHandler.java:143)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:103)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:563)
...

I don't know how ES comes up with 1.2gb or even 914.1mb, because I only tried to
load about 68mb (112 files) with fsriver.
My current idea is that ES divides my 68mb by its version number 0.198, but
that would make 343mb. (You see, I haven't lost my humor...)
Is there ANY useful example somewhere that really explains, step by step,
how to get started with index, mapping, loading, ...?

Bernd

Hum. As the filesystem river author, I feel guilty... I mean that there is probably something wrong with the river.
I will track it down. Could you open an issue for that in the fsriver project?

David :wink:
Twitter : @dadoonet / @elasticsearchfr

Hi David,

I can do that, but are you sure it's the fsriver?

Bernd

Almost. I mean that fsriver uses a bulk request to commit changes to ES.
You could start by setting bulk_size to 10 to see if it helps.

David :wink:
Twitter : @dadoonet / @elasticsearchfr

Just tried with a bulk_size of 10, no change.
Maybe it is something else with my data.
Do the JSON records have to be serialized compactly instead of pretty-printed?

Bernd

You're connecting to port 9300 with HTTP instead of port 9200.

You'll get the same error message with:

curl http://localhost:9300

clint

Ahhh, that's it. The log says:
[transport ] [node_1] bound_address {inet[/192.168.70.23:9300]
[http ] [node_1] bound_address {inet[/192.168.70.23:9200]

So I will try with port 9200, thanks.

Bernd
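
The two bound_address lines tell you which port speaks which protocol. As a minimal sketch (Python 3, standard library only; the function name bound_ports is made up for illustration, and the regular expression assumes nothing beyond the log format quoted above), the ports could be pulled out of the startup log like this:

```python
import re

# Matches lines such as:
#   [http ] [node_1] bound_address {inet[/192.168.70.23:9200]
BOUND = re.compile(
    r"\[(transport|http)\s*\].*bound_address\s*\{inet\[/([\d.]+):(\d+)\]"
)

def bound_ports(log_text):
    """Return a dict like {"transport": 9300, "http": 9200} from startup log text."""
    ports = {}
    for line in log_text.splitlines():
        match = BOUND.search(line)
        if match:
            ports[match.group(1)] = int(match.group(3))
    return ports

log = """\
[transport ] [node_1] bound_address {inet[/192.168.70.23:9300]
[http ] [node_1] bound_address {inet[/192.168.70.23:9200]
"""
ports = bound_ports(log)
print(ports)  # {'transport': 9300, 'http': 9200}
```

curl and any other HTTP client belong on ports["http"]; the transport port is meant for node-to-node traffic.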

OK, one step further.

The response is
{"ok":true,"_index":"_river","_type":"base","_id":"_meta","_version":1}
whatever that means???

But also:
[2012-08-02 16:09:26,694][INFO ][cluster.metadata ] [node_1] [_river] creating index, cause [auto(index api)], shards [5]/[1], mappings
[2012-08-02 16:09:27,028][INFO ][cluster.metadata ] [node_1] [_river] update_mapping [base] (dynamic)
[2012-08-02 16:09:27,068][WARN ][river ] [node_1] failed to create river [fs][base]
org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class with value [fs]
    at org.elasticsearch.river.RiverModule.loadTypeModule(RiverModule.java:86)
    at org.elasticsearch.river.RiverModule.spawnModules(RiverModule.java:57)
    at org.elasticsearch.common.inject.ModulesBuilder.add(ModulesBuilder.java:44)
    at org.elasticsearch.river.RiversService.createRiver(RiversService.java:135)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:270)
    at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:264)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:86)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: fs
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)

Which class "fs" is it missing?

curl -XPUT '192.168.12.37:9200/_river/base/_meta' -d '{
  "type": "fs",
  "fs": {
    "name": "OAI JSON",
    "url": "/srv/www/solr/DATA/OAI_JSON",
    "update_rate": 36000000,
    "includes": "*.json"
  },
  "index": {
    "index": "base",
    "type": "base",
    "bulk_size": 10
  }
}'

It is nearly a cut&paste from the fsriver GitHub example.

Very strange...

Bernd

Thanks Clint.
I don't feel guilty anymore! :wink:

The missing "fs" class means that the fs river plugin is not installed.

When you start ES, you should see it in the plugins list in the logs.
If not, run bin/plugin -install as mentioned in the fsriver doc.

David :wink:
Twitter : @dadoonet / @elasticsearchfr

Out of curiosity: I had this error the other day when I hit the wrong port
using curl. I freaked out for a few seconds and then realized I had hit the
wrong port.

May I ask where the numbers come from? 1.2gb exceeding 914.1mb? I got the same
doing a simple query on an index on the wrong port :slight_smile:

Regards

When you connect to a port using curl, curl sends an HTTP request that looks
like this:

POST /my-index/my-type/1 HTTP/1.1
User-Agent: curl/7.22.0 (x86_64-apple-darwin1......

When the Netty connector of elasticsearch that listens on port 9300 receives a
request, it expects the first 4 bytes of the request to contain the
length of the data frame that is about to follow.

So, the first 4 bytes that curl sends are 'P', 'O', 'S', 'T' or 0x50, 0x4F,
0x53, 0x54. Netty treats these 4 bytes as the big-endian
integer 1,347,375,956 and figures out that it should expect
approximately 1.2gb of data to follow. It then compares this to 90% of the
max heap size, which is 914.1mb if elasticsearch was started with the default
1GB of memory, figures that there is no way it can handle such a big
request, and bails out by throwing TooLongFrameException.

Igor
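
The arithmetic above is easy to reproduce. A minimal sketch in Python 3 (standard library only; frame_length is a made-up name, and the limit is simply copied from the exception message rather than recomputed from any heap setting):

```python
def frame_length(first_bytes: bytes) -> int:
    """Read the first 4 bytes as a big-endian integer, like a length prefix."""
    return int.from_bytes(first_bytes[:4], byteorder="big")

# What an HTTP client sends first when it hits the binary transport port.
request = b"POST /my-index/my-type/1 HTTP/1.1\r\n"
length = frame_length(request)
print(length)                    # 1347375956
print(round(length / 2**30, 2))  # 1.25, reported as [1.2gb] in the exception

# Limit taken from the exception message: 914.1mb, expressed in bytes.
limit = 914.1 * 2**20
print(length > limit)            # True, hence TooLongFrameException
```

A request starting with "GET " decodes to a different, but similarly huge, number, because only the first four bytes matter.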

Thanks, Igor, for tracking this down so thoroughly. I think it's time to
send Shay a patch for the ES MessageChannelHandler, so the exception will
no longer look so horrible if an HTTP connection is accidentally made to the
9300 binary protocol port.

Best regards,

Jörg

OK, having clarified the exception message caused by using the wrong port, I
still can't get any results.
The fsriver README says:

bin\plugin -install dadoonet/fsriver

Done that.

[INFO ][plugins ] [node_1] loaded [], sites [bigdesk, fsriver, mapper-attachments, head, paramedic]

So why is it complaining about "fs" when using the example?
I'm using nearly the same as from the example:

curl -XPUT 'localhost:9200/_river/mydocs/_meta' -d '{
  "type": "fs",
  "fs": {
    "name": "My tmp dir",
    "url": "/tmp",
    "update_rate": 900000,
    "includes": ".doc,.pdf",
    "excludes": "resume"
  }
}'

What does "_river" mean?
What does "mydocs" stand for?
What is the meaning of "_meta"?
What is "type": "fs"?
What does "fs": {... mean?
Where should I enter my index name?
Where should I place my mapping?
Do I have to enter the plugin name "fsriver" somewhere (it is not in the example)?
Why do I have a new index "_river" after using the example?

Very, very strange...

Bernd
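
For what it's worth, the pieces of that request break down roughly like this: "_river" is the index in which rivers get registered (which is why it shows up after the first PUT), the next path segment ("mydocs") is a name you choose for the river, and "_meta" is the id of the document that holds its definition. "type": "fs" names the river implementation that an installed plugin has to provide, and the "fs": {...} block carries that river's own settings. The target index goes into a separate "index" block, as in the earlier "base" example. A minimal sketch in Python 3 (standard library only; river_request is a hypothetical helper, and the request is only built here, not sent):

```python
import json
from urllib.request import Request

def river_request(host, river_name, river_type, settings, index=None):
    """Build (but do not send) the PUT request that registers a river."""
    # "type" selects the river implementation; its settings live under the same key.
    body = {"type": river_type, river_type: settings}
    if index is not None:
        # Optional block naming the target index, type and bulk size.
        body["index"] = index
    # _river = registration index, river_name = chosen name, _meta = document id.
    url = "http://{}/_river/{}/_meta".format(host, river_name)
    return Request(url, data=json.dumps(body).encode("utf-8"), method="PUT")

req = river_request(
    "localhost:9200",  # the HTTP port, not the 9300 transport port
    "mydocs",
    "fs",
    {"name": "My tmp dir", "url": "/tmp", "update_rate": 900000},
    index={"index": "base", "type": "base", "bulk_size": 10},
)
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/_river/mydocs/_meta
```

Passing req to urllib.request.urlopen would send it. As seen earlier in this thread, an {"ok":true,...} response only confirms that the _meta document was stored; whether the river actually started shows up in the node log.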

Now this is also very strange. After installing fsriver with:

bin\plugin -install dadoonet/fsriver

and restarting all nodes, I get the info:
[INFO ][plugins ] [node_1] loaded [], sites [bigdesk, fsriver, mapper-attachments, head, paramedic]

So ES has found the newly installed "fsriver" plugin.
But nowhere is there anything like fsriver.jar or fs.jar or any other new
*.jar.
The plugins/fsriver/ directory contains only *.java source files, XML files,
pom.xml, and other properties and text files.

Bernd
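
That observation can be turned into a quick check. A minimal sketch in Python 3 (standard library only; has_jar is a made-up helper name, the jar file name in the demo is hypothetical, and the only assumption is that a plugin containing Java code ships at least one .jar somewhere under its directory):

```python
import tempfile
from pathlib import Path

def has_jar(plugin_dir):
    """Return True if any *.jar file exists anywhere under plugin_dir."""
    return any(Path(plugin_dir).rglob("*.jar"))

# Demo on a throwaway directory that mimics a sources-only install.
with tempfile.TemporaryDirectory() as tmp:
    plugin = Path(tmp) / "plugins" / "fsriver"
    (plugin / "src").mkdir(parents=True)
    (plugin / "pom.xml").write_text("<project/>")
    print(has_jar(plugin))  # False: only sources and pom.xml

    (plugin / "fsriver-0.0.2.jar").write_bytes(b"")  # hypothetical jar name
    print(has_jar(plugin))  # True
```

Run against the real plugins/fsriver directory, a False result would match what is described above: only the sources were downloaded.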

Finally "fsriver" is running.
When installing the plugin you MUST specify the version number. If not, you
only get the master sources.
I thought that omitting the version number would install the most recent
version. WRONG!!!

Yes. That was in the README:

$ bin\plugin -install dadoonet/fsriver/0.0.2

David :wink:
Twitter : @dadoonet / @elasticsearchfr

Just for the record, the patch has been submitted here
