Logstash to Elasticsearch

Hi,
While inserting data from a database into an Elasticsearch index, a query statement runs to fetch the row count.
Can we alter the query that runs to get the count?

@Senthiles You can specify the query used by the Logstash JDBC input plugin with the statement setting. See: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_configuring_sql_statement

The statement is the query for all the data you want to fetch, not just a count. If that query doesn't return any rows, there is nothing to insert into ES.
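A rough sketch of where the count comes from and one way around it. In its default paging mode, jdbc_paging_enabled => true makes the plugin wrap your statement in a count query to work out how many pages to fetch. Recent versions of the plugin also document jdbc_paging_mode => "explicit", which skips the count and instead runs your statement repeatedly, filling in :size (the page size) and :offset (advanced by :size each time), until a page returns fewer rows than :size. This assumes your installed plugin version supports that option; my_table and id are placeholder names.

```
input {
  jdbc {
    # connection settings (jdbc_connection_string, jdbc_user, ...) omitted
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_paging_enabled => true
    # assumption: plugin version that supports explicit paging (no count query)
    jdbc_paging_mode => "explicit"
    jdbc_page_size => 100
    # placeholder table/column names; the statement does its own paging
    statement => "SELECT * FROM my_table ORDER BY id LIMIT :size OFFSET :offset"
  }
}
```

If the result set is small enough to read in one pass, leaving jdbc_paging_enabled at its default (false) avoids the paging count query as well.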

That said, the JDBC input remembers what it has already seen by storing the value of :sql_last_value from its last run on disk, at the location configured in last_run_metadata_path. You can configure how it determines what it has "seen" before with the tracking settings (use_column_value, tracking_column, tracking_column_type).
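A minimal sketch of those tracking settings, assuming the goal is to pick up only rows whose modified_date is newer than on the previous run. With use_column_value => true, :sql_last_value holds the last modified_date value the plugin processed rather than the time of the last run, and that value is written to last_run_metadata_path between runs. The table name and metadata path are placeholders.

```
input {
  jdbc {
    # connection settings (jdbc_connection_string, jdbc_user, ...) omitted
    jdbc_driver_class => "org.postgresql.Driver"
    schedule => "* * * * *"
    # use the tracking column's value (not the last run time) as :sql_last_value
    use_column_value => true
    tracking_column => "modified_date"
    tracking_column_type => "timestamp"
    # placeholder path; the last tracked value is persisted here
    last_run_metadata_path => "/path/to/.my_table_last_run"
    # placeholder table; ">" fetches rows changed since the last run, and
    # ORDER BY makes the last row (whose value is stored) the newest one
    statement => "SELECT * FROM my_table WHERE modified_date > :sql_last_value ORDER BY modified_date"
  }
}
```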

Hi,
Below is the _id value I am getting when running Logstash:
"_index": "general-report-hub1",
"_type": "doc",
"_id": "%{task_id}",
"_score": 1.0,

My conf file is:

input {
  jdbc {
    # connection settings (jdbc_connection_string, jdbc_user, ...) not shown
    jdbc_driver_class => "org.postgresql.Driver"
    #scheduler
    schedule => "* * * * *"
    #pagination
    jdbc_paging_enabled => true
    #fetch size
    jdbc_fetch_size => 50
    #page size
    jdbc_page_size => 100
    #tracking column definition
    tracking_column_type => "timestamp"
    #tracking column
    tracking_column => "modified_date"
    statement => "SELECT * from table where modified_date = :sql_last_value"
  }
}

output {
  elasticsearch {
    index => "index1"
    #document_type => "task"
    document_id => "%{task_id}"
    manage_template => false
    hosts => "...*"
  }
}

How do I remove the document with the _id value "%{task_id}" from the index?

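@Senthiles Is task_id a field in your incoming events? In other words, is it already defined somewhere? If the field doesn't exist on an event, Logstash leaves the %{task_id} reference unexpanded, which is why the document ends up with that literal _id.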

Also, unless you specifically need task_id as the underlying Elasticsearch document id (_id), I'd not set document_id and would let Elasticsearch generate one of its own. When you supply your own _id, Elasticsearch has to check whether a document with that ID already exists in the shard before indexing it, which slows indexing down; auto-generated IDs let it skip that check. A better way to store an external ID is in a dedicated field, like original_id or, in this case, task_id.
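As an output block, that suggestion might look like the sketch below, assuming task_id is a column returned by the SELECT (the JDBC input lowercases column names by default, so it arrives as the event field task_id). With document_id left out, Elasticsearch generates the _id, and task_id is still stored in the document as an ordinary field.

```
output {
  elasticsearch {
    hosts => "...*"
    index => "index1"
    manage_template => false
    # no document_id: Elasticsearch generates the _id,
    # while task_id stays in the document source as a regular field
  }
}
```

The trade-off: without a stable _id, a row that is fetched again after its modified_date changes is indexed as a new document instead of overwriting its earlier copy.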

Does that make sense?
