Ruby client and bulk method issue


#1

Hi there,

I've been working on a script for a few days and apparently can't solve this problem by myself.

It's a Ruby script that queries metrics in my ES (version 2.2.0) index and requests aggregations (average and max). I collect these aggregations and then try to index them with the bulk method.

I build a string that contains the body, using the syntax specified in the Ruby client documentation (I tried all three syntaxes, and all of them failed).


Here is a sample of the string that is being sent to ES :
[
{index:{_index:"test",_type:"ganglia_metrics",data:{pkts_in:9397.0975,proc_total:1427.5,cpu_wio:0.7,cpu_user:2.7125000000000004,bytes_in:798807.4312499999,cpu_aidle:94.5,load_five:0.55125,mem_cached:35948420.0,cpu_speed:2800.0,cpu_idle:95.0875,cpu_num:16.0,cpu_system:1.5125,load_one:0.45875,mem_total:61834568.0,proc_run:1.5,load_fifteen:0.6575,bytes_out:41040501.0,disk_total:145.874,mem_free:19215205.0,mem_shared:0.0,cpu_nice:0.0,pkts_out:29566.95125,host:"service7.ice",debut:"2016-02-24T16:00:00+00:00",fin:"2016-02-24T17:00:00+00:00",historical:true}}},
{index:{_index:"test",_type:"ganglia_metrics",data:{pkts_in:11.12,proc_total:506.0,cpu_wio:0.0,cpu_user:0.0,bytes_in:1375.6,cpu_aidle:99.6,load_five:0.0,mem_cached:4448228.0,cpu_speed:2800.0,cpu_idle:99.9875,cpu_num:16.0,cpu_system:0.0,load_one:0.0,mem_total:33007108.0,proc_run:0.0,load_fifteen:0.0,bytes_out:79.60000000000001,disk_total:246.423,mem_free:25658957.0,mem_shared:0.0,cpu_nice:0.0,pkts_out:0.55,host:"service8.ice",debut:"2016-02-24T16:00:00+00:00",fin:"2016-02-24T17:00:00+00:00",historical:true}}},
{index:{_index:"test",_type:"ganglia_metrics",data:{pkts_in:12.59,proc_total:485.0,cpu_wio:0.0,cpu_user:0.0,bytes_in:1488.4225000000001,cpu_aidle:99.4,load_five:0.0,mem_cached:16335680.0,cpu_speed:2800.0,cpu_idle:100.0,cpu_num:16.0,cpu_system:0.0,load_one:0.0,mem_total:24733248.0,proc_run:0.0,load_fifteen:0.0,bytes_out:109.54875000000001,disk_total:121.875,mem_free:1484936.5,mem_shared:0.0,cpu_nice:0.0,pkts_out:0.9962500000000001,host:"service9.ice",debut:"2016-02-24T16:00:00+00:00",fin:"2016-02-24T17:00:00+00:00",historical:true}}},
{index:{_index:"_test",_type:"ganglia_metrics",data:{pkts_in:1.17,proc_total:376.0,cpu_wio:0.0,cpu_user:100.0,bytes_in:69.07,cpu_aidle:48.10000000000001,load_five:7.854285714285715,mem_cached:12550029.714285715,cpu_speed:2801.0,cpu_idle:0.0,cpu_num:8.0,cpu_system:0.0,load_one:7.984285714285714,mem_total:24602312.0,proc_run:8.714285714285714,load_fifteen:7.828571428571428,bytes_out:114.15,disk_total:0.0,mem_free:3454258.285714286,mem_shared:0.0,cpu_nice:0.0,pkts_out:0.8,host:"r4i2n14",debut:"2016-02-24T16:00:00+00:00",fin:"2016-02-24T17:00:00+00:00",historical:true}}}
]

I get this error message :

2016-03-25 11:05:33 +0100: < {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"parse_exception","reason":"Failed to derive xcontent"},"status":400}
2016-03-25 11:05:33 +0100: [400] {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"parse_exception","reason":"Failed to derive xcontent"},"status":400}
/var/lib/gems/2.1.0/gems/elasticsearch-transport-1.0.15/lib/elasticsearch/transport/transport/base.rb:146:in `__raise_transport_error': [400] {"error":{"root_cause":[{"type":"parse_exception","reason":"Failed to derive xcontent"}],"type":"parse_exception","reason":"Failed to derive xcontent"},"status":400} (Elasticsearch::Transport::Transport::Errors::BadRequest)
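
A likely cause of the "Failed to derive xcontent" error: the string above is neither valid JSON (the keys are unquoted) nor the newline-delimited format the `_bulk` endpoint reads, where each action line is followed by its source line and every line ends with a newline. Here is a minimal sketch of building such a body with only the standard library. The helper name `to_bulk_ndjson` and the shortened documents are made up for illustration; they are not part of the elasticsearch gem.

```ruby
require 'json'

# Hypothetical helper (not part of the elasticsearch gem): turns an array of
# document hashes into newline-delimited JSON for the _bulk endpoint.
def to_bulk_ndjson(docs, index:, type:)
  lines = docs.flat_map do |doc|
    [
      JSON.generate({ index: { _index: index, _type: type } }), # action/metadata line
      JSON.generate(doc)                                        # source line
    ]
  end
  lines.join("\n") + "\n" # the body has to end with a newline
end

# Shortened, made-up documents modelled on the sample above.
docs = [
  { host: 'service7.ice', cpu_user: 2.7125, historical: true },
  { host: 'service8.ice', cpu_user: 0.0, historical: true }
]

puts to_bulk_ndjson(docs, index: 'test', type: 'ganglia_metrics')
```

Note that `data` does not appear anywhere in this format: the document fields sit on their own line, directly after the action line.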

#2

I tried something else: hard-coding a sample of the data I want to index directly in the bulk method's body:

client.bulk body: [
{index:{_index:"test",_type:"ganglia_metrics",data:{pkts_in:1.17,proc_total:376.0,cpu_wio:0.0,cpu_user:100.0,bytes_in:69.07,cpu_aidle:48.10000000000001,load_five:7.854285714285715,mem_cached:12550029.714285715,cpu_speed:2801.0,cpu_idle:0.0,cpu_num:8.0,cpu_system:0.0,load_one:7.984285714285714,mem_total:24602312.0,proc_run:8.714285714285714,load_fifteen:7.828571428571428,bytes_out:114.15,disk_total:0.0,mem_free:3454258.285714286,mem_shared:0.0,cpu_nice:0.0,pkts_out:0.8,host:"r4i2n14",debut:"2016-02-24T16:00:00+00:00",fin:"2016-02-24T17:00:00+00:00",historical:true}}}
]

This time the request is accepted, but here is the response:

2016-03-25 11:12:02 +0100: POST http://machine:9200/_bulk [status:200, request:0.179s, query:0.175s]
2016-03-25 11:12:02 +0100: > {"index":{"_index":"test","_type":"ganglia_metrics"}}
{"pkts_in":1.17,"proc_total":376.0,"cpu_wio":0.0,"cpu_user":100.0,"bytes_in":69.07,"cpu_aidle":48.10000000000001,"load_five":7.854285714285715,"mem_cached":12550029.714285715,"cpu_speed":2801.0,"cpu_idle":0.0,"cpu_num":8.0,"cpu_system":0.0,"load_one":7.984285714285714,"mem_total":24602312.0,"proc_run":8.714285714285714,"load_fifteen":7.828571428571428,"bytes_out":114.15,"disk_total":0.0,"mem_free":3454258.285714286,"mem_shared":0.0,"cpu_nice":0.0,"pkts_out":0.8,"host":"r4i2n14","debut":"2016-02-24T16:00:00+00:00","fin":"2016-02-24T17:00:00+00:00","historical":true}

2016-03-25 11:12:02 +0100: < {"took":175,"errors":false,"items":[{"create":{"_index":"test","_type":"ganglia_metrics","_id":"AVOtQmJKW2eBK5B08Bpq","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"status":201}}]}

It seems to index something (which apparently doesn't contain the data passed in the body), but I can't actually find this document when I query for it.
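
For what it's worth, the request log above shows the client moving the contents of `data` onto its own source line, and a bulk response only reports per-item metadata, never the document fields. One way to double-check the outcome is to inspect each item's status. The sketch below parses the response logged above with the standard library; `failed_bulk_items` is a hypothetical helper, and the commented-out `client.get` call assumes `client` is an `Elasticsearch::Client`.

```ruby
require 'json'

# Hypothetical helper: returns the bulk items whose status is not 2xx.
def failed_bulk_items(response)
  response.fetch('items', []).select do |item|
    result = item.values.first # each item looks like { "<action>" => { ...result... } }
    !(200..299).cover?(result['status'])
  end
end

# The response body copied from the log above.
response = JSON.parse('{"took":175,"errors":false,"items":[{"create":{"_index":"test","_type":"ganglia_metrics","_id":"AVOtQmJKW2eBK5B08Bpq","_version":1,"_shards":{"total":2,"successful":2,"failed":0},"status":201}}]}')

puts "failed items: #{failed_bulk_items(response).length}"

# The generated _id can then be used to fetch the document directly, e.g.:
# client.get index: 'test', type: 'ganglia_metrics', id: 'AVOtQmJKW2eBK5B08Bpq'
```

Fetching by `_id` does not depend on the index having been refreshed, whereas a search run immediately after indexing may not see the document yet.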

So I have a few questions:

Is it a mistake to pass a string as the body of the bulk method?
What is the best syntax if I want to bulk index 300 documents?
Am I simply unaware of some feature or plugin that indexes aggregations directly? (That would be amazing.)
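
On the 300-documents question, one possible approach is to build the operations as an array of hashes and send them in slices. This is only a sketch: `bulk_operations` is a hypothetical helper, the documents and the batch size of 100 are made up, and the `client.bulk` call is left commented out because it assumes `client` is an `Elasticsearch::Client`. It uses the same combined `{ index: { ..., data: ... } }` form as the request that was accepted above.

```ruby
# Hypothetical helper: wraps each document in the combined
# { index: { _index:, _type:, data: } } form.
def bulk_operations(docs, index:, type:)
  docs.map { |doc| { index: { _index: index, _type: type, data: doc } } }
end

# 300 made-up documents standing in for the aggregation results.
docs = Array.new(300) { |i| { host: "node#{i}", cpu_user: 0.0, historical: true } }

batch_size = 100 # arbitrary value, to be tuned
bulk_operations(docs, index: 'test', type: 'ganglia_metrics').each_slice(batch_size) do |batch|
  # client.bulk body: batch # assumes `client` is an Elasticsearch::Client
  puts "would send #{batch.size} operations"
end
```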

I hope someone has already managed to do something like this. I will keep trying until I die on my keyboard :slight_smile:

Bye !
Antoine


#3

OK, so passing the body as a String was not a good idea.
I went back to the solution I started with: creating an array of hashes that I pass as a parameter to the bulk method.

None of the 3 syntaxes described in the Ruby doc works:

  • the 1st just indexes empty documents
  • the 2nd and 3rd tell me that data is not an acceptable parameter in the action/metadata line

This is still better than the parsing failure I got with my poor, excessively formatted string, but I still do not understand what the problem is.

