Hello
I'm facing an issue with duplicate records in my index on rollover.
I thought using rollover alias would let only one index to be "write" index which is supposed to not let any batch of records being indexed to be written in 2 indices.
Also I have specified the document id to be the unique ID from my DB table so that if the same record is sent to Elasticsearch, it should be overwritten and not indexed as a new record.
Logstash pipeline configuration :
input {
jdbc {
jdbc_driver_library => "**"
jdbc_driver_class => "**"
jdbc_connection_string => "**"
jdbc_user => "**"
jdbc_password => "**"
jdbc_validate_connection => true
jdbc_default_timezone => "Asia/Riyadh"
jdbc_validation_timeout => 60
schedule => "*/1 * * * *"
connection_retry_attempts => 1
last_run_metadata_path => "**/metadata/service_logger_lastrun"
tracking_column => "call_start_time"
tracking_column_type => "timestamp"
use_column_value => "true"
statement => "
SELECT
ID,
CALLID AS SERVICE_DETAILS,
SERVED_NUMBER,
DURATION,
DBTIMESEQ AS CALL_START_TIME,
SERVICE_NUMBER AS SERVICE_NAME,
CASE
WHEN STATUS = 'SUCCESS' AND JSON_VALUE(response, '$.Response.status.error.errorDescription') = 'No data found' THEN 'No Data Found'
WHEN STATUS <> 'SUCCESS' THEN STATUS
WHEN RESPONSE IS NULL AND STATUS = 'SUCCESS' THEN STATUS
WHEN JSON_VALUE(response, '$.Response.status.result') IS NOT NULL THEN JSON_VALUE(response, '$.Response.status.result')
WHEN JSON_VALUE(response, '$.eaiResponse.status.result') IS NOT NULL THEN JSON_VALUE(response, '$.eaiResponse.status.result')
ELSE STATUS
END AS STATUS,
CASE
WHEN JSON_VALUE(response, '$.eaiResponse.status.error.errorDescription') IS NOT NULL THEN JSON_VALUE(response, '$.eaiResponse.status.error.errorDescription')
WHEN JSON_VALUE(response, '$.Response.status.error.errorDescription') IS NOT NULL THEN JSON_VALUE(response, '$.Response.status.error.errorDescription')
ELSE NULL
END AS ERROR_DESCRIPTION,
CASE
WHEN JSON_VALUE(response, '$.eaiResponse.status.error.errorCategory') IS NOT NULL THEN JSON_VALUE(response, '$.eaiResponse.status.error.errorCategory')
WHEN JSON_VALUE(response, '$.Response.status.error.errorCategory') IS NOT NULL THEN JSON_VALUE(response, '$.Response.status.error.errorCategory')
ELSE NULL
END AS ERROR_CATEGORY,
CASE
WHEN JSON_VALUE(response, '$.eaiResponse.status.error.errorCode') IS NOT NULL THEN JSON_VALUE(response, '$.eaiResponse.status.error.errorCode')
WHEN JSON_VALUE(response, '$.Response.status.error.errorCode') IS NOT NULL THEN JSON_VALUE(response, '$.Response.status.error.errorCode')
ELSE NULL
END AS ERROR_CODE,
CASE
WHEN JSON_VALUE(response, '$.eaiResponse.status.error.errorType') IS NOT NULL THEN JSON_VALUE(response, '$.eaiResponse.status.error.errorType')
WHEN JSON_VALUE(response, '$.Response.status.error.errorType') IS NOT NULL THEN JSON_VALUE(response, '$.Response.status.error.errorType')
ELSE NULL
END AS ERROR_TYPE,
ERRORS_DESCRIPTION,
REQUEST,
RESPONSE,
CUSTOM1,
CUSTOM2
FROM SERVICE_LOGGER
WHERE DBTIMESEQ > :sql_last_value
"
tags => ["service_logger"]
}
}
filter {
}
output{
elasticsearch {
hosts => ["**"]
timeout => 60
#index => "bk_service_logger-%{+YYYY.MM.dd}"
document_id => "%{id}"
user => "**"
password => "**"
ssl => true
ssl_certificate_verification => true
ssl_certificate_authorities => ['**']
template_name => "service_logger_template "
ilm_enabled => true
ilm_rollover_alias => "bk_service_logger"
index => "bk_service_logger"
ilm_pattern => "{now/d{yyyy.MM.dd}}-000001"
ilm_policy => "service_logger_90d_ILM_policy"
manage_template => false
}
}
I have read an article about deduplication which mentioned "fingerprint" filter but what I understood that the fingerprint is also unique across the index only. Meaning rolling over to a new index would still consider the same unique ID (from the previous index) to be new (in the new index) and hence, the duplication occurs.
Is there any way to solve this ?