Hi. We're using Logstash to load SQL data into Elasticsearch via the JDBC input plugin.
Two of the fields we're copying are:
Name
ZipCode
For some reason, the first few characters of the ZipCode value are being inserted into the Name field for a subset of records. For example:
"Mcdonald, Mike", who lives in 90210, ends up as "Mcdonald, 902 Mike" in the Name field in ES.
Out of roughly 16 million inserts into ES, around 10% end up with this odd data in the Name field. The Zip field itself is fine; it's always 5 digits.
Settings below:
input {
  jdbc {
    jdbc_connection_string => "blah"
    jdbc_user => "blah"
    jdbc_driver_library => "sqljdbc4.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    statement_filepath => "query.sql"
    last_run_metadata_path => "last_run_metadata"
    sql_log_level => "debug"
    jdbc_paging_enabled => true
    jdbc_page_size => 100000
    jdbc_pool_timeout => 5000
  }
}
output {
  elasticsearch {
    hosts => ["blah"]
    index => "blah"
    document_type => "document"
    action => "update"
    document_id => "%{id}"
    doc_as_upsert => true
  }
}
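In case it helps with debugging, here's a rough filter sketch we've been considering adding to the pipeline (it is not in the config above) just to tag the affected events so we can count and inspect them. It assumes the Name column comes through as a [Name] field on the event:

filter {
  # Diagnostic only: tag any event whose Name contains a run of three or more
  # digits, which should never happen for a legitimate name in our data.
  if [Name] =~ /\d{3,}/ {
    mutate {
      add_tag => ["name_contains_digits"]
    }
  }
}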
SQL Query:
SELECT Name, City, State, Zip, occupation, [GiveDate],
       Amount, recip, Type, actID, actID AS id, halftime, bicycle,
       CASE WHEN display IS NULL THEN 'x' ELSE display END AS display,
       actID AS DeleteID
FROM ourTable
WHERE actID IS NOT NULL
  AND bicycle < '1985'
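And to rule out the source data itself, a sanity check along these lines (same table and columns as the query above) would tell us whether the digits are already present in the Name column before Logstash ever sees it:

-- Count source rows whose Name already contains a digit (T-SQL pattern match).
SELECT COUNT(*) AS NamesWithDigits
FROM ourTable
WHERE actID IS NOT NULL
  AND Name LIKE '%[0-9]%';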
Ideas? It's driving us nuts.