Kibana hit counts differ from the actual document counts

Hi Team,

Thanks a lot for all the advice and help; it's been a wonderful forum to ask questions and learn.
The issue is that the hit counts in the Discover tab differ from the actual data, because of which I am not able to get the right visualizations.

http://sandbox.com:9200/fsimage-2017.10.03/_count
output: {"count":26304276,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0}}

http://sandbox.com:9200/fsimage-2017.10.04/_count
output: {"count":36942343,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0}}

http://sandbox.com:9200/_cat/indices?v

health status index              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   fsimage-2017.10.04 BzplL5UMQFadIVScJapsUQ   5   1   36942343            0     32.4gb         32.4gb
yellow open   fsimage-2017.10.03 b-4oEnRYQMK44trBWIGhBQ   5   1   26304276            0     22.2gb         22.2gb

fsimage-2017.10.04 is the correct one. The other indices are way off, and across an entire week, except for Oct 4th, my Kibana hits are lower than the actual counts.

Please advise how to rectify this.

Hi @sudi_2611,

thank you for the nice words. :slight_smile:

Could you go into more detail about what the queries in the Discover tab or in the visualizations are, maybe with some comparisons of expected vs. actual results?

You're welcome!!!

In the Discover tab, without querying anything, I should see approximately 36,942,343 or more hits, whereas I see approximately 26,304,276 hits.
Expected hits are 36 million or more.

During indexing I see the following errors in Logstash:
[2017-10-05T17:22:10,166][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}
[2017-10-05T17:22:10,166][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down?

There are no errors in the Elasticsearch and Kibana logs.

There were 2 index patterns, and one had issues and has been removed from the Logstash file. Could it be because of that?

But Elasticsearch was working fine.
I am not sure why the Kibana hits are so far off the correct value.

The first thing that comes to mind is that there are a few significant differences between the queries Discover uses and the _count queries you showed. The _count queries just count all documents within the index regardless of their timestamp while the Discover query filters by the date range selected in the time picker. That means that if the index fsimage-2017.10.03 includes documents that have no timestamp or whose timestamp is not on that day, they will be counted in the _count query, but not in the Discover query that uses the timefilter for the day 2017-10-03.
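
For illustration, a timefiltered count for that day would look roughly like the following. This is a sketch only: the exact bounds come from the time picker and your browser's time zone, and @timestamp is assumed to be the configured time field.

GET fsimage-2017.10.03/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2017-10-03T00:00:00.000Z",
        "lt": "2017-10-04T00:00:00.000Z"
      }
    }
  }
}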

To check what the range of timestamp values in an index is, you could use something like the following (with the timestamp field name replaced):

GET fsimage-2017.10.03/_search
{
  "size": 0,
  "aggs": {
    "timestamp_stats": {
      "stats": { "field": "@timestamp" }
    }
  }
}

To get the number of documents that have the timestamp field:

GET fsimage-2017.10.03/_count
{
  "query": {
    "exists": { "field": "@timestamp" }
  }
}

The request that Kibana sends to Elasticsearch can be inspected using the small arrow icon beneath the histogram at the top.


Hi,
Thanks a lot for the suggestions.

When I execute the above-mentioned queries, I get the following output.
This is for GET fsimage-2017.10.03/_search:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 26304276,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "timestamp_stats": {
      "count": 26304276,
      "min": 1507035305954,
      "max": 1507039205420,
      "avg": 1507037136233.399,
      "sum": 39641520773732925000,
      "min_as_string": "2017-10-03T12:55:05.954Z",
      "max_as_string": "2017-10-03T14:00:05.420Z",
      "avg_as_string": "2017-10-03T13:25:36.233Z",
      "sum_as_string": "292278994-08-17T07:12:55.807Z"
    }
  }
}

This is for GET fsimage-2017.10.04/_search:
{
  "took": 329,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 36942343,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "timestamp_stats": {
      "count": 36942343,
      "min": 1507121781038,
      "max": 1507126610368,
      "avg": 1507123908634.2017,
      "sum": 55676688376265335000,
      "min_as_string": "2017-10-04T12:56:21.038Z",
      "max_as_string": "2017-10-04T14:16:50.368Z",
      "avg_as_string": "2017-10-04T13:31:48.634Z",
      "sum_as_string": "292278994-08-17T07:12:55.807Z"
    }
  }
}

The actual count of documents should be more than 36 million, but except for one instance, on all other days I get a count somewhere close to 27 million. I am not sure why the rest of the data is not showing.

Hi,

I also see these errors when Logstash tries to output to Elasticsearch:
[2017-10-10T16:52:31,992][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7@2a875324 on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@113cc928[Running, pool size = 32, active threads = 32, queued tasks = 200, completed tasks = 16304171]]"})

[2017-10-10T16:56:56,983][WARN ][logstash.outputs.elasticsearch] UNEXPECTED POOL ERROR {:e=>#<LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError: No Available connections>}

[2017-10-10T16:56:56,983][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}

Could it be because of this that not all the data is being output to Elasticsearch, hence the variance in the data?
If so, how do I rectify that?

Which version of Elasticsearch and Logstash are you using?

Hi,
Logstash 5.6.1, and ES is also 5.6.1.

Hi All,
Please provide advice for the below issue, as it's impacting production.

In the Logstash logs I see the following errors:

[2017-10-11T16:51:50,619][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 500 ({"type"=>"class_cast_exception", "reason"=>"org.elasticsearch.index.mapper.TextFieldMapper cannot be cast to org.elasticsearch.index.mapper.DateFieldMapper"})

[2017-10-11T16:51:50,487][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7@5dd8577d on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@113cc928[Running, pool size = 32, active threads = 32, queued tasks = 200, completed tasks = 19473750]]"})

[2017-10-11T16:54:48,871][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}

Because of the above errors, Elasticsearch is not receiving the complete data.
ELK is running on a single-node cluster.

@sudi_2611,

let me preface this by recommending that you contact our support through https://support.elastic.co if you have a support contract, and let them know you have a problem in your production environment. That way you will receive priority support that will probably be more timely than this best-effort forum. :wink:

With that out of the way, the error messages show two kinds of problems:

  • The first message indicates that you are trying to index documents with fields that cannot be parsed according to the Elasticsearch mapping, e.g. a string cannot be parsed as a date; the class_cast_exception suggests the same field is mapped as text in one index but expected to be a date. The Elasticsearch logs should contain corresponding error messages for the rejected actions (see the sketch after this list for a quick way to compare mappings).
  • The second indicates that the connectivity of your Logstash instances to your Elasticsearch cluster is interrupted from time to time. That can have various reasons specific to your deployment environment, e.g. a lossy network connection, DNS problems, etc. The Elasticsearch logs might show corresponding entries as well. Otherwise I would suggest searching the system logs and the system monitoring tools you use for any indication of networking problems.
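
If it is such a mapping conflict across your daily indices, a quick check is to compare the field's mapping in each index, and an index template can pin the mapping for future indices. A minimal sketch, assuming the affected field is modifyTime and the document type is fsimage (swap in whatever field the error actually refers to):

GET fsimage-*/_mapping/field/modifyTime

PUT _template/fsimage
{
  "template": "fsimage-*",
  "mappings": {
    "fsimage": {
      "properties": {
        "modifyTime": { "type": "date" }
      }
    }
  }
}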

Hi,

Thanks a lot for the inputs.
My Logstash filter is as follows:

filter {
  if [type] == "fsimage" {
    csv {
      separator => "|"
      columns => [ "HDFSPath", "replication", "ModificationTime", "AccessTime", "PreferredBlockSize", "BlocksCount", "FileSize", "NSQUOTA", "DSQUOTA", "permission", "user", "group" ]
      convert => {
        'replication' => 'integer'
        'PreferredBlockSize' => 'integer'
        'BlocksCount' => 'integer'
        'FileSize' => 'integer'
        'NSQUOTA' => 'integer'
        'DSQUOTA' => 'integer'
      }
    }

    # note: in Joda-Time patterns 'YYYY' is the week-based year; 'yyyy' is usually what is intended
    date {
      match => ['ModificationTime', 'YYYY-MM-ddHH:mm']
      target => "modifyTime"
      remove_field => ['ModificationTime']
    }

    # no target, so this sets @timestamp from AccessTime
    date {
      match => ['AccessTime', 'YYYY-MM-ddHH:mm']
    }

    date {
      match => ['AccessTime', 'YYYY-MM-ddHH:mm']
      target => "accessTime"
      remove_field => ['AccessTime']
    }
  }
}

There are no rejection errors in the Elasticsearch logs either.

Hi @sudi_2611,

did you make any progress with your problem? The "rejected execution" error message also indicates your cluster is overloaded. Maybe expanding it to a 3-node cluster could help.

Hi,

Thanks a lot for getting back. My Elasticsearch is still failing, as it is not able to cope with the incoming records. Logstash is outputting close to 47 million records, but it fails with the errors mentioned in the previous posts.

I have ELK set up on a single-node cluster with 256 GB RAM and 48 CPU cores, and I am not sure where the issue is. Will increasing the queue capacity or the heap size help? My heap size for Elasticsearch is 25 GB. Any inputs would be helpful.

I have tried all possible solutions, including splitting the output file, but I still get the same errors.

The heap size sounds ok. (Due to limitations of the JVM, allocating 31GB of RAM (-Xmx32600m) to a node's heap is the recommended maximum.)
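
If you do adjust it, the heap is configured in Elasticsearch's jvm.options; setting the minimum and maximum to the same value avoids resizing pauses. A sketch with your current 25 GB:

-Xms25g
-Xmx25g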

You might be able to improve your situation by running a three node Elasticsearch cluster on three machines with 64 GB each. That way half of each machine's memory can be used for the JVM heap and the other half for the OS filesystem buffer. Distributing the load on three machines should increase your indexing throughput due to improved parallelism. (In addition to all the other advantages of a multi-node cluster such as rolling updates and high availability.)
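
For reference, a minimal sketch of the 5.x discovery settings in elasticsearch.yml for such a three-node cluster (the host names are placeholders):

discovery.zen.ping.unicast.hosts: ["es-node-1", "es-node-2", "es-node-3"]
# with three master-eligible nodes, a quorum of 2 prevents split brain
discovery.zen.minimum_master_nodes: 2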

Increasing the queue size could help you if you want to compensate for temporary spikes, but a constant overload will still fill it up. Scaling the cluster by adding more nodes would probably be your best bet to increase the indexing rate.
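
If you do want to experiment with the queue nonetheless, in 5.x the bulk queue is set in elasticsearch.yml; treat this as a stopgap that only buys buffering at the cost of memory (the value is illustrative):

thread_pool.bulk.queue_size: 500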

Thanks for the suggestions, I will look into them. I am extracting the fsimage and delimiting the file into a CSV. Will Apache Hive with the ES-Hadoop connector work? I have seen the documentation; my only question is whether Apache Hive can solve the problem.

I don't know much about Apache Hive, but I don't immediately see how using the ES Hadoop connector would help you. It could help if you had problems with retaining a long history of data, but if the ingest rate remains high, the single-node cluster will remain overloaded.

Ok. What if I split the 47-million-record file, store the split files in a staging directory, and output only 100,000 records at timed intervals? Will that work? How large a bulk request can Elasticsearch take at a single point in time?

That depends on the performance of your cluster, which depends on the performance of the system it is running on. Batching up the records could reduce the overhead a bit, but Logstash already does some batching itself, I think; see the sketch below. In the end, indexing documents just takes time. Scaling the cluster horizontally is a good way to keep up with the rate at which documents are produced.
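
Logstash's batching is tunable in logstash.yml if you want to experiment; a sketch with illustrative values, not recommendations:

pipeline.workers: 8
pipeline.batch.size: 250
pipeline.batch.delay: 50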

If your data are read from disk and not produced at a constant high rate, you might have some success using Logstash's sleep filter to limit the rate at which Logstash sends documents to Elasticsearch; see the sketch below. That obviously introduces another significant delay into the processing before the data are available for search. If the rate of events is constant over the whole day, scaling the cluster really seems to be the only option.
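
A minimal sketch of that sleep filter, throttling to roughly 1,000 events per second (the numbers are illustrative):

filter {
  sleep {
    # sleep for 1 second after every 1000 events
    time => "1"
    every => 1000
  }
}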
