Logstash jdbc document_id => "%{uid}" problem


(Marcin Kubica) #1

Hi
Logstash newbie here :smile:

Using logstash 2.0.0 and elastic 2.0.0

I'm trying to follow https://www.elastic.co/blog/logstash-jdbc-input-plugin and MySQL driver.

I'm setting in config

The data is being read ok-ish, (got multiple records returned) however on Elastic side I'm getting only one document indexed. In place of usual _id it's geting _id:%{uid}

If I don't use document_id => "%{uid}" I'm getting multiple documents indexed reflecting the SQL query output. However I wanted to avoid data duplication.

Any ideas?
Marcin


(Mark Walkom) #2

Can you provide your entire config?


(Marcin Kubica) #3

Hi Mark, thanks for asking.

input {
  jdbc {
    jdbc_driver_library => "/opt/logstash-2.0.0/vendor/mysql/mysql-connector-java-5.1.37-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://***:***/***"
    jdbc_user => "***"
    jdbc_password => "***"

    statement_filepath => "/opt/logstash-2.0.0/sql_queries/query3.sql"
  }
}

output {


#       stdout { codec=>rubydebug }

        elasticsearch {
                index => "****"
                document_type => "****"
                document_id => "%{uid}"
                hosts => "localhost:9200"
 }
}

(Christian Dahlqvist) #4

When you enable the stdout output, do you have a field named 'uid' that the document_id can be built from?


(Marcin Kubica) #5

Good question Christian!

In the result I don't have any ID field actually.

However in the example from the URL the result of SQL query also has no 'uid' thing. It's there seen in json output though.

And there you go - I've modified my sql query to return an id and used document_id => "%{id}" and this populated the row IDs into _id field.

I'm guessing it's possible to create a hash of returned values and have it as document_id?


(Christian Dahlqvist) #6

You can use the fingerprint or checksum plugins to create a hash based on specific field(s) and then use this to set document_id. If you have one or more fields in your data that make up a unique key, you may however be better off concatenating these into a key without hashing it.


(Marcin Kubica) #7

Thanks!


(Architha) #8

Hi ,
I have been facing the same issue. Could you tell me how you have solved it ?
Thanks !


(Marcin Kubica) #9

@Architha

From your SQL query you need to return one column with unique ID - a number (or maybe a hash).

Lets say, a name of that field will be my_ID

Then in your output section of logstash config you add (under elasticsearch { })

document_id => "%{my_ID}"

And that's all. Hope this helps!


(Architha) #11

@Marcin_Kubica

Is this possible with an already existing column with unique ID ?
Supposed i have an existing column Employee_id and config in the same way as mentioned (under elasticsearch{ } in Output filter)

document_id => "%{Employee_id}"

I still see "_id" : "%{Employee_id}" in the search results. Kindly help.

I am not very clear with "From your SQL query you need to return one column with unique ID - a number (or maybe a hash)."
Please elaborate :slight_smile:
Thanks a lot !


(Marcin Kubica) #12

Hi

is Employee_id (number) returned ( displayed ) as result of your SQL
select statement? I take it is not :wink: Had exactly this problem. Make sure
to include it in SELECT statement.

Marcin


(Architha) #13

Hi Marcin ,

The problem was solved when i used "employee_id" instead of "Employee_id" in the output section of Logstash conf file. It is case sensitive.

Thanks for your reply :slight_smile: Cheers !!


(system) #14