The data is being read more or less correctly (multiple records are returned), but on the Elasticsearch side only one document gets indexed. Instead of the usual _id, the document gets the literal _id: %{uid}
If I don't use document_id => "%{uid}", multiple documents are indexed, reflecting the SQL query output. However, I want to avoid data duplication.
You can use the fingerprint or checksum plugins to create a hash based on specific field(s) and then use this to set document_id. If you have one or more fields in your data that make up a unique key, you may however be better off concatenating these into a key without hashing it.
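As a rough sketch of the fingerprint approach, assuming three hypothetical columns (first_name, last_name, hire_date) that together form a unique key, and a hypothetical index name and host. Depending on the plugin version, the SHA methods may also require a key option, so treat this as a starting point rather than a verified config:

```
filter {
  fingerprint {
    # Hypothetical fields that together identify a row uniquely
    source => ["first_name", "last_name", "hire_date"]
    # Hash all source fields together instead of one at a time
    concatenate_sources => true
    method => "SHA256"
    # Keep the hash in @metadata so it is not indexed as a document field
    target => "[@metadata][doc_id]"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumed local cluster
    index => "employees"                 # hypothetical index name
    # Re-running the same query overwrites documents instead of duplicating them
    document_id => "%{[@metadata][doc_id]}"
  }
}
```

If a single column is already unique, it can be referenced directly in document_id and the fingerprint filter can be skipped.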
Is this possible with an already existing column that holds a unique ID?
Suppose I have an existing column Employee_id and configure it the same way as mentioned (under elasticsearch { } in the output section):
document_id => "%{Employee_id}"
I still see "_id" : "%{Employee_id}" in the search results. Kindly help.
I am not clear about "From your SQL query you need to return one column with unique ID - a number (or maybe a hash)."
Please elaborate.
Thanks a lot!
Is Employee_id (a number) returned (displayed) in the result of your SQL
SELECT statement? I take it that it is not. I had exactly this problem. Make sure
to include it in the SELECT statement.
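A minimal sketch of what that could look like, assuming a hypothetical employees table, PostgreSQL driver, and connection settings (all placeholders). A literal "%{...}" in _id means the referenced field does not exist on the event. Note also that the jdbc input lowercases column names by default (lowercase_column_names => true), so the field may arrive as employee_id rather than Employee_id:

```
input {
  jdbc {
    # Placeholder connection settings, adjust for your database
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/hr"
    jdbc_user => "logstash"
    # The ID column must be part of the SELECT list to exist on the event
    statement => "SELECT Employee_id, first_name, last_name FROM employees"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # assumed local cluster
    index => "employees"                 # hypothetical index name
    # Lowercase because of the jdbc input default for column names
    document_id => "%{employee_id}"
  }
}
```

Temporarily adding a stdout { codec => rubydebug } output is a quick way to check which field names actually arrive on each event.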