It seems that Logstash is treating the [] in the query string as something special, probably as field references, while I would expect everything in the string to be passed through to Elasticsearch as-is.
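For reference, a minimal sketch of the input I'm running (host and index mirror the capture further down; option names may need adjusting for your plugin version, and other settings are omitted):

input {
  elasticsearch {
    hosts => ["vhost02.homemaster.cn:9200"]
    index => "logstash-2016.06.*"
    # the query whose [] seems to get reinterpreted
    query => '{"fields": ["clientIP"], "query": { "match_all": {} } }'
  }
}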
With the --debug option, the stack trace is attached below:
This looks like a Logstash bug, probably in the elasticsearch input. It appears to be initializing an event with a nil hash. I don't understand how that can happen though, unless the ES response is malformed. Can you use Wireshark or a similar tool to sniff the request and response packets?
GET /logstash-2016.06.%2A/_search?scroll=1m&search_type=scan&size=1000 HTTP/1.1
User-Agent: Faraday v0.9.2
Accept: */*
Connection: close
Host: vhost02.homemaster.cn:9200
Content-Length: 55
Content-Type: application/x-www-form-urlencoded
{"fields": ["clientIP"], "query": { "match_all": {} } }HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 371
{"_scroll_id":"c2Nhbjs1OzEzOTEwNDo2Y2dTa3p6ZlFqLTB6dTB6eXozenh3OzEzOTEwNTo2Y2dTa3p6ZlFqLTB6dTB6eXozenh3OzE0ODc0MDpvSENqVUROOFNMLVo2ZlU1Ukc4cmZBOzE0ODc0MTpvSENqVUROOFNMLVo2ZlU1Ukc4cmZBOzEzOTEwNjo2Y2dTa3p6ZlFqLTB6dTB6eXozenh3OzE7dG90YWxfaGl0czoxMTc7","took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":117,"max_score":0.0,"hits":[]}}
GET /_search/scroll?scroll=1m HTTP/1.1
User-Agent: Faraday v0.9.2
Accept: */*
Connection: close
Host: vhost02.homemaster.cn:9200
Content-Length: 232
Content-Type: application/x-www-form-urlencoded
c2Nhbjs1OzEzOTEwNDo2Y2dTa3p6ZlFqLTB6dTB6eXozenh3OzEzOTEwNTo2Y2dTa3p6ZlFqLTB6dTB6eXozenh3OzE0ODc0MDpvSENqVUROOFNMLVo2ZlU1Ukc4cmZBOzE0ODc0MTpvSENqVUROOFNMLVo2ZlU1Ukc4cmZBOzEzOTEwNjo2Y2dTa3p6ZlFqLTB6dTB6eXozenh3OzE7dG90YWxfaGl0czoxMTc7

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 15281
{"_scroll_id":"c2NhbjswOzE7dG90YWxfaGl0czoxMTc7","took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":117,"max_score":0.0,"hits":[{"_index":"logstash-2016.06.29","_type":"log","_id":"AVWagR845YI_xmZDXo6D","_score":0.0,"fields":{"clientIP":["123.150.183.44"]}},{"_index":"logstash-2016.06.29","_type":"log","_id":"AVWanWiF5YI_xmZDXpaY","_score":0.0,"fields":{"clientIP":["52.3.127.144"]}}
...
The query is correctly sent to Elasticsearch and the response comes back as expected. It's the way Logstash processes the response from Elasticsearch that causes the problem.
Can Logstash not deal with "fields":{"clientIP":["52.3.127.144"]}?
Now I understand that Logstash depends on the '_source' field for proper operation. My job succeeds after changing the "fields" clause in the query to "_source":
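A sketch of the change, assuming the same input as above (only the query body differs; other settings omitted):

input {
  elasticsearch {
    hosts => ["vhost02.homemaster.cn:9200"]
    index => "logstash-2016.06.*"
    # use _source filtering instead of "fields", so each hit
    # carries a _source hash that Logstash can build an event from
    query => '{"_source": ["clientIP"], "query": { "match_all": {} } }'
  }
}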