Elasticsearch Data Loss in Index in ELK 6.4.2

Hi All,

I am using ELK stack 6.4.2 with Microsoft SQL Server. Elasticsearch and Logstash are running on a Windows virtual machine. We have observed that a random document that was already indexed goes missing from an Elasticsearch index. We restore the document by changing a date column and pushing it into the index again through Logstash. Even after it has been restored, the document sometimes goes missing from the index again. This behavior is unpredictable.

We have observed this behavior recently. Please find below extra information regarding our environment:

  • We have around 5 GB data in the Elasticsearch node.
  • We are using a single node structure.
  • The Elasticsearch cluster and index health are yellow all the time.
  • We have replicas for every index as well.
  • We have multiple Logstash instances running at the same time against each index.
  • Elasticsearch reported an error, low disk watermark [85%], so we kept more than 85% free space in the directory. The issue is still observed.

Expecting a quick solution or reply. Thanks in advance.

Best Regards,
Shreyash

Elasticsearch will never allocate a replica shard to the same node as its primary, so configuring replicas when you only have one node does not make sense and will always lead to yellow indices.
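
As a minimal sketch of what that implies for a single-node cluster, the replica count can be set to 0 through the index settings API (`PUT /<index>/_settings`). The Python below only builds the request and does not send it. The host `http://localhost:9200` and the index name `idx_data` are taken from the Logstash config posted later in this thread; the helper name `build_replica_request` is hypothetical, and credentials would have to be added if security is enabled on the cluster.

```python
import json
import urllib.request


def build_replica_request(base_url, index, replicas=0):
    """Build (but do not send) a PUT request against the index settings API."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_replica_request("http://localhost:9200", "idx_data")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To apply the change against a running cluster:
    #   urllib.request.urlopen(req)
```

With no replicas configured, a single-node cluster has no unassigned replica shards left, which is what keeps the health at yellow.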

What is the output of the _cluster/stats API?

What does your Logstash config look like? How do you identify that the document has gone missing?
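
One way to answer the last question reproducibly is to look the document up by its `_id` with an `ids` query rather than relying on a full-text search. The sketch below is only an illustration: the index name `idx_data` comes from the config posted later in this thread, the document id `"42"` is a made-up placeholder, and the helper names are hypothetical. It builds the search request and interprets a response without contacting a cluster.

```python
import json
import urllib.request


def build_ids_search(base_url, index, doc_id):
    """Build (but do not send) a _search request that looks up one document by _id."""
    body = json.dumps({"query": {"ids": {"values": [str(doc_id)]}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def is_missing(search_response):
    """Return True when the search response reports zero hits."""
    total = search_response["hits"]["total"]
    # 6.x returns an integer here; 7.x and later return {"value": n, ...}.
    if isinstance(total, dict):
        total = total["value"]
    return total == 0


if __name__ == "__main__":
    req = build_ids_search("http://localhost:9200", "idx_data", "42")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

If the lookup by `_id` still finds the document while the usual search does not, the document has changed rather than disappeared.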

Hi @Christian_Dahlqvist,

Thanks for the reply.

I have observed that a document with a particular "_id" goes missing from the index after some time. I noticed this through search queries against Elasticsearch.
This might happen while Logstash is updating the document. I have also observed that the issue occurs with documents that have a large JSON structure. Logstash is sometimes unable to read the complete JSON string from the SQL Server column, which is of type nvarchar(max).

The Logstash config file looks like this:

input {
  jdbc {
    jdbc_connection_string => "sqlserverconnection"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => "user"
    schedule => "*/2 * * * * *"
    statement => "select ID,data from data(Nolock) where ModifiedDate > :sql_last_value order by ModifiedDate"
    last_run_metadata_path => "C:/logstash-6.4.2/logstash-6.4.2/data/lastRun/.logstash_jdbc_last_run"
    type => "data"
  }
}

filter {
  json {
    source => "data"
    target => "Data"
    remove_field => ["data"]
  }
}

output {
  if [type] == "data" {
    elasticsearch {
      hosts => "localhost:9200"
      index => "idx_data"
      document_id => "%{id}"
    }
  }
  stdout { codec => rubydebug }
}
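
One thing that could be checked with this config, offered as a hypothesis rather than a diagnosis: when the json filter cannot parse the `data` string (for example because it arrived truncated), it tags the event with `_jsonparsefailure` and does not apply `remove_field`, yet the event still reaches the elasticsearch output and replaces the existing document with the same `document_id`. Such a document would still exist but would have no `Data` field, so searches on its contents would no longer match. The sketch below builds a query for such documents and classifies a `_source`; the helper names and the sample documents are made up for illustration.

```python
import json


def parse_failure_query():
    """Search body matching events tagged by a failed Logstash json filter."""
    return {"query": {"match": {"tags": "_jsonparsefailure"}}}


def looks_like_failed_parse(source):
    """Return True if a document's _source carries the failure tag and lacks 'Data'."""
    tags = source.get("tags") or []
    return "_jsonparsefailure" in tags and "Data" not in source


if __name__ == "__main__":
    # Hypothetical _source values, not taken from the real index.
    healthy = {"id": 1, "type": "data", "Data": {"name": "example"}}
    broken = {"id": 2, "type": "data", "data": '{"name": "exa', "tags": ["_jsonparsefailure"]}
    print(json.dumps(parse_failure_query()))
    print(looks_like_failed_parse(healthy), looks_like_failed_parse(broken))
```

The query body can be sent to `idx_data/_search`; any hits would point at the truncated-JSON problem rather than at Elasticsearch dropping documents.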

Please find the output of the "_cluster/stats" below.

{
    "_nodes": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "cluster_name": "elasticsearch",
    "timestamp": 1590339685409,
    "status": "yellow",
    "indices": {
        "count": 25,
        "shards": {
            "total": 125,
            "primaries": 125,
            "replication": 0.0,
            "index": {
                "shards": {
                    "min": 5,
                    "max": 5,
                    "avg": 5.0
                },
                "primaries": {
                    "min": 5,
                    "max": 5,
                    "avg": 5.0
                },
                "replication": {
                    "min": 0.0,
                    "max": 0.0,
                    "avg": 0.0
                }
            }
        },
        "docs": {
            "count": 2357855,
            "deleted": 3197
        },
        "store": {
            "size_in_bytes": 1919953025
        },
        "fielddata": {
            "memory_size_in_bytes": 0,
            "evictions": 0
        },
        "query_cache": {
            "memory_size_in_bytes": 713538,
            "total_count": 152469,
            "hit_count": 7201,
            "miss_count": 145268,
            "cache_size": 609,
            "cache_count": 2925,
            "evictions": 2316
        },
        "completion": {
            "size_in_bytes": 0
        },
        "segments": {
            "count": 607,
            "memory_in_bytes": 13432603,
            "terms_memory_in_bytes": 11065404,
            "stored_fields_memory_in_bytes": 601544,
            "term_vectors_memory_in_bytes": 0,
            "norms_memory_in_bytes": 781440,
            "points_memory_in_bytes": 125299,
            "doc_values_memory_in_bytes": 858916,
            "index_writer_memory_in_bytes": 0,
            "version_map_memory_in_bytes": 0,
            "fixed_bit_set_memory_in_bytes": 0,
            "max_unsafe_auto_id_timestamp": -1,
            "file_sizes": {}
        }
    },
    "nodes": {
        "count": {
            "total": 1,
            "data": 1,
            "coordinating_only": 0,
            "master": 1,
            "ingest": 1
        },
        "versions": [
            "6.4.2"
        ],
        "os": {
            "available_processors": 16,
            "allocated_processors": 16,
            "names": [
                {
                    "name": "Windows Server 2016",
                    "count": 1
                }
            ],
            "mem": {
                "total_in_bytes": 34359267328,
                "free_in_bytes": 12594618368,
                "used_in_bytes": 21764648960,
                "free_percent": 37,
                "used_percent": 63
            }
        },
        "process": {
            "cpu": {
                "percent": 3
            },
            "open_file_descriptors": {
                "min": -1,
                "max": -1,
                "avg": 0
            }
        },
        "jvm": {
            "max_uptime_in_millis": 303448982,
            "versions": [
                {
                    "version": "1.8.0_191",
                    "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
                    "vm_version": "25.191-b12",
                    "vm_vendor": "Oracle Corporation",
                    "count": 1
                }
            ],
            "mem": {
                "heap_used_in_bytes": 704804856,
                "heap_max_in_bytes": 1037959168
            },
            "threads": 168
        },
        "fs": {
            "total_in_bytes": 243369242624,
            "free_in_bytes": 47769489408,
            "available_in_bytes": 47769489408
        },
        "plugins": [],
        "network_types": {
            "transport_types": {
                "security4": 1
            },
            "http_types": {
                "security4": 1
            }
        }
    }
}
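
For reference, the low disk watermark is expressed as disk used, so the default of 85% restricts shard allocation once less than 15% of the disk is free; it affects allocation and does not delete documents. A quick arithmetic check on the `fs` and `jvm` figures above, as a sketch (the function name `used_percent` is just illustrative):

```python
def used_percent(total_bytes, free_bytes):
    """Percentage of a resource in use, given its total and free byte counts."""
    return 100.0 * (total_bytes - free_bytes) / total_bytes


# Values copied from the "fs" section of the _cluster/stats output.
fs_used = used_percent(total_bytes=243369242624, free_bytes=47769489408)

# Values copied from the "jvm.mem" section: heap_used_in_bytes / heap_max_in_bytes.
heap_used = 100.0 * 704804856 / 1037959168

if __name__ == "__main__":
    print(f"disk used: {fs_used:.1f}%")
    print(f"heap used: {heap_used:.1f}%")
    print("below 85% low watermark:", fs_used < 85.0)
```

By these numbers the disk is roughly four-fifths full, which is under the 85% watermark but nowhere near 85% free, and the heap is capped at about 1 GB.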

Best Regards,
Shreyash Karmali

What is the average and maximum size of documents?

Hi @Christian_Dahlqvist,

The average document size is around 70 KB. The maximum document size is around 5 MB.

Best Regards,
Shreyash Karmali

FYI it's pretty presumptive, and quite rude, to expect that.

Hi Mark,

I am really sorry, but I did not mean it that way.