I’m experiencing an issue with incomplete data being indexed from Logstash into Elasticsearch. Despite my SQL query returning 453,237 rows from the database, only 99,000 documents are indexed in Elasticsearch. I’m trying to understand why this discrepancy occurs and how to resolve it.
Input Configuration:
input {
jdbc {
jdbc_connection_string => "jdbc:mysql://database-ip:3306/SUPER_TEST_TRACK"
jdbc_user => "admin"
jdbc_password => "P@ssword2020"
jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
jdbc_driver_library => "/usr/share/java/mysql-connector-java-9.1.0.jar"
jdbc_paging_enabled => true
jdbc_page_size => 1000
statement => "SELECT
s.site AS siteName,
p.palet_id AS palet_id,
p.date_production AS dateProduction,
COALESCE(p.delivery_date, '1970-01-01') AS deliveryDate,
p.palet_state AS paletState,
p.quantity AS quantity,
p.initial_quantity AS initialQuantity,
c.criterion_Name AS criterionName,
cv.criterion_value AS criterionValue,
(p.quantity * cv.criterion_value) AS calculatedQuantity
FROM
palet p
JOIN
palettes_criteria pc ON p.palet_id=pc.palette_id
JOIN
site s ON p.id_site = s.id_site
JOIN
criterion_value cv ON pc.criterion_value_id = cv.criterion_value_id
JOIN
criterion c ON cv.criterion_id = c.id_criterion
WHERE
c.criterion_name = 'unp'"
schedule => "* * * * *"
}
}
Filter Configuration:
filter {
mutate {
remove_field => ["@timestamp", "@version"]
}
}
Output Configuration:
output {
elasticsearch {
hosts => ["http://ip:9200"]
index => "mysql-denormalized_palet-data"
document_id => "%{siteName}-%{dateProduction}-%{deliveryDate}-%{palet_id}"
}
}
Observations:
- The SQL Query directly executed in MySQL returns all 453,237 rows.
- Elasticsearch indexes only 99,000 documents after the pipeline runs.
- When I query Elasticsearch:
GET /mysql-denormalized_palet-data/_count
It returns:
{
"count": 99000
}
- One indexed document looks like this:
{
"_index": "mysql-denormalized_palet-data",
"_type": "_doc",
"_id": "%{siteName}-%{dateProduction}-%{deliveryDate}-37969",
"_source": {
"sitename": "BERRECHID",
"palet_id": 37969,
"dateproduction": "2023-01-04T00:00:00.000Z",
"deliverydate": null,
"paletstate": "INSTORAGE",
"quantity": 36,
"initialquantity": 36,
"criterionvalue": "1.44",
"criterionname": "UNP",
"calculatedquantity": 51.84
}
}