ElasticSearch 1.7.3 vs 2.0 vs 2.1 missing data

Hi.
I have an ES cluster running 1.7.3 that stores logs parsed by logstash.
I want to upgrade to ES 2.x, so I ran the migration plugin to check what I needed to change.
I prepared a new logstash template compatible with ES 2.x and set up a new separate cluster running 2.0.1 and another separate one running 2.1. I'm using logstash 2.1.0.

Logs are sent to 3 clusters with this part of the config:

output {
  elasticsearch {
    hosts => "cluster17.es.service.consul"
    template => "/etc/logstash/template_api.json"
    index => "logstash-%{[@context][_index]}-%{+YYYY.MM.dd}"
    template_overwrite => true
    flush_size => 2000
    retry_max_interval => 15
    max_retries => 6
  }
  elasticsearch {
    hosts => "cluster20.es.service.consul"
    template => "/etc/logstash/template_api2.json"
    index => "logstash-%{[@context][_index]}-%{+YYYY.MM.dd}"
    template_overwrite => true
    flush_size => 2000
    retry_max_interval => 15
    max_retries => 6
  }
  elasticsearch {
    hosts => "cluster21.es.service.consul"
    template => "/etc/logstash/template_api2.json"
    index => "logstash-%{[@context][_index]}-%{+YYYY.MM.dd}"
    template_overwrite => true
    flush_size => 2000
    retry_max_interval => 15
    max_retries => 6
  }
}

And I ran into a weird problem.
In the 1.7 cluster, the index for one day had:
logstash-api-2015.12.06 items: 11,473,555 size: 5.3GB
In the 2.0.1 cluster:
logstash-api-2015.12.06 items: 9,609,880 size: 4.7GB
In the 2.1 cluster:
logstash-api-2015.12.06 items: 9,608,696 size: 4.6GB

The difference between 1.7 and 2.x is huge, and every full daily index on 2.x had 15-18% less data.
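For this particular day the gap works out to about 16%:

# Quick check of the gap for this day's index (doc counts from above).
docs_17 = 11473555
docs_20 = 9609880
docs_21 = 9608696
print(1 - docs_20 / docs_17)  # ~0.162, about 16% fewer docs on 2.0.1
print(1 - docs_21 / docs_17)  # ~0.163, about 16% fewer docs on 2.1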

I tested ES 2.x on different hardware hosts/VMs to rule out hardware problems. There were also no errors in the logs.
I wrote a script to compare the indexes from 1.7 and 2.x and check which types of messages are missing. But every missing message can be POSTed directly with curl to each cluster and is saved without problems.
How can I debug this issue?
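The count-comparison part of that script boils down to something like this (a Python 3 sketch; port 9200 and the single index name are just examples, the hostnames are the ones from the output config above):

# Sketch: compare the doc count of one daily index across the three clusters.
# Port 9200 is an assumption; the hostnames match the output config above.
import json
import urllib.request

CLUSTERS = {
    "1.7.3": "http://cluster17.es.service.consul:9200",
    "2.0.1": "http://cluster20.es.service.consul:9200",
    "2.1":   "http://cluster21.es.service.consul:9200",
}
INDEX = "logstash-api-2015.12.06"

for version, base in CLUSTERS.items():
    # GET /<index>/_count returns {"count": N, ...}
    with urllib.request.urlopen("%s/%s/_count" % (base, INDEX)) as resp:
        count = json.loads(resp.read().decode("utf-8"))["count"]
    print("%-6s %s" % (version, count))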

It sounds like a shard did not come back. Assuming you have 5 shards per index, that could represent almost 20%.

Did you look at the pending tasks?
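For reference, both can be checked over the REST API; a minimal sketch (Python 3, assuming the cluster answers on localhost:9200):

# Minimal check of cluster health and pending tasks.
# localhost:9200 is an assumption; point it at your cluster.
import urllib.request

BASE = "http://localhost:9200"
for endpoint in ("/_cluster/health", "/_cluster/pending_tasks"):
    with urllib.request.urlopen(BASE + endpoint) as resp:
        print(endpoint, resp.read().decode("utf-8"))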

@dadoonet all shards are allocated:

es 2.0.1:
{"cluster_name":"logstash20","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":21,"active_shards":21,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}

es 2.1:
{"cluster_name":"logstash21","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":20,"active_shards":20,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}

and pending_tasks on both clusters shows:
{"tasks":[]}

Ha! I totally missed the logstash part.

So you are sending your data to 3 clusters at the same time, but you end up with a different number of logs in each.
2.0 and 2.1 have roughly the same number of docs, but 1.7 has noticeably more.

When you said "there were no errors in the logs", did you mean the logstash logs or the elasticsearch logs?

@dadoonet: I'm sending data from one logstash instance to 3 different ES clusters. And there are no errors in the elasticsearch or logstash logs. Everything looks normal.
And when I manually POST messages that are missing from the 2.x clusters but exist in 1.7, I get a confirmation with the newly inserted document _id.
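That manual check is essentially this (sketched in Python 3 here instead of curl; the document body and the type name "logs" are placeholders, not a real missing event):

# Sketch of the manual re-POST check; the document body and the type name
# "logs" are placeholders, not the real missing event.
import json
import urllib.request

doc = {
    "@timestamp": "2015-12-06T12:00:00Z",
    "@source_host": "example-host",
    "@message": "example syslog line",
}
req = urllib.request.Request(
    "http://cluster21.es.service.consul:9200/logstash-api-2015.12.06/logs",
    data=json.dumps(doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # elasticsearch answers with the generated _id once the doc is indexed
    print(json.loads(resp.read().decode("utf-8"))["_id"])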

I have no explanation.
Maybe you could open this thread on the logstash forum so the experts there can explain or trace things?

Sure, I'll post this on the logstash forum as well. Thanks for trying :slight_smile:

@warkolm Here is the rest of the logstash config I used:

input {
  redis {
    host => "10.8.34.27"
    port => 6379
    data_type => "list"
    key => "logstash"
  }
  redis {
    host => "10.8.34.34"
    port => 6379
    data_type => "list"
    key => "logstash"
  }
  redis {
    host => "10.8.34.36"
    port => 6379
    data_type => "list"
    key => "logstash"
  }
  redis {
    host => "10.8.38.42"
    port => 6379
    data_type => "list"
    key => "logstash"
  }
}
filter {
  # drop messages bigger than 16kb
  range {
    ranges => [ "message", 16384, 99999999, "drop" ]
  }
  if [type] == "syslog" {
    grok {
      patterns_dir => "/opt/logstash/patterns"
      match => [
        # apache/nginx access logs
        "message", "%{APACHE_ACCESS_COMBINED_VHOST}",
        "message", "%{NGINX_ACCESS_LOGS}",
        # json-formatted messages
        "message", "%{JSON_MESSAGES}",
        # logs from border load balancers
        "message", "%{BORDER_LB_LOG}",
        # bind logs
        "message", "%{NAMED_LOG}",
        # logs from fastly
        "message", "%{EDGE_CACHE_LOG}",
        "message", "%{FASTLY_DEBUG_LOG}",
        "message", "%{FASTLY_RESTARTS_LOG}",
        "message", "%{FASTLY_HOTLINKS}",
        "message", "%{S_MAX_DEBUG}",
        "message", "%{CB_DEBUG_LOG}",
        # pt-kill
        "message", "%{PT_KILL}",
        # powerconnect
        "message", "%{POWERCONNECT_LOG}",
        "message", "%{VARNISHNCSA}",
        # normal syslog messages
        "message", "%{SYSLOG_STANDARD}"
      ]
    }
    if "_grokparsefailure" in [tags] {
      grok {
        match => {
          "message" => "%{GREEDYDATA:syslog_message}"
        }
      }
    } else {
      if [jsonMessage] != [null] {
        json {
          source => "jsonMessage"
        }
        mutate {
          remove_field => ["jsonMessage"]
          add_tag => ["json"]
        }
      } else if [httpversion] != [null] {
        mutate {
          add_tag => ["apache_access_log"]
        }
        if [request] =~ /^\/api\/v1/ {
          mutate {
            add_field => ["[@context][_index]","api"]
          }
        }
      } else {
        mutate {
          rename => ["syslog_message", "@message"]
          add_tag => ["message"]
        }
      }
      syslog_pri {}
      mutate {
        rename => [ "syslog_hostname", "@source_host" ]
        rename => [ "syslog_severity", "severity" ]
        rename => [ "syslog_facility", "facility" ]
        rename => [ "syslog_pri", "priority" ]
        rename => [ "syslog_program", "program" ]
        remove_field => [
          "@version",
          "host",
          "message",
          "syslog_facility_code",
          "syslog_severity_code",
          "syslog_timestamp",
          "type"
        ]
      }
    }
  }
  # access log from fastly syslog
  if "edge-cache-requestmessage" in [tags] {
    mutate {
      add_field => ["[@context][_index]","api"]
    }
  }

  # database queries killed
  if [program] == "pt-kill" {
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    mutate {
      remove_field => [ "timestamp" ]
    }
  }
  mutate {
    lowercase => [ "@context", "_index" ]
  }
}
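Since the filter above adds tags like json, apache_access_log and message (plus _grokparsefailure on parse errors), the same _count comparison can be broken down per tag to see which message types go missing; a rough sketch (Python 3, port 9200 assumed):

# Sketch: per-tag doc counts for one daily index on the 1.7.3 and 2.1 clusters,
# to spot which message types are missing. Port 9200 is an assumption.
import json
import urllib.request

INDEX = "logstash-api-2015.12.06"
CLUSTERS = {
    "1.7.3": "http://cluster17.es.service.consul:9200",
    "2.1":   "http://cluster21.es.service.consul:9200",
}
TAGS = ["json", "apache_access_log", "message", "_grokparsefailure"]

for tag in TAGS:
    body = json.dumps({"query": {"match": {"tags": tag}}}).encode("utf-8")
    counts = {}
    for version, base in CLUSTERS.items():
        # the count API accepts a query in the request body
        req = urllib.request.Request("%s/%s/_count" % (base, INDEX), data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            counts[version] = json.loads(resp.read().decode("utf-8"))["count"]
    print(tag, counts)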

It turned out to be logstash's fault. After upgrading logstash to 2.1.1, every cluster had the same amount of data in each index.

Thanks for the update.