Bulk data upload in Logstash


(Gautham) #1

Hi All,

We have been trying to index data from ServiceNow into Elasticsearch using Logstash via its REST APIs. We don't see any errors and data is getting indexed; the only issue is that we have around 2 lakh (200,000) records to index, but only 10k records are getting indexed.
Is there any restriction in Logstash that limits it to indexing only a certain amount of data? In other words, how can I index bulk data like 2 lakh+ records?

Any advice please

Thanks
Gauti


#2

Hi Gautham,

Yes, you can index 2 lakh+ records (bulk data); there is no such restriction in Logstash.
If you could share your input config, it will be easier to identify the problem and suggest a solution.

Regards,
Balu


(Gautham) #3

Hi @balumurari1, here is the config file:

input {
  http_poller {
    urls => {
      url => "https://demo1.service-now.com/api/now/table/incident?sysparm_display_value=true&sysparm_exclude_reference_link=True&sysparm_fields=number%2Ccategory%2Cpriority%2Cstate%2Cassignment_group%2Cassigned_to%2Cchild_incidents%2Cclose_code%2Cclosed_at%2Cclosed_by%2Ccompany%2Ccmdb_ci%2Ccontact_type%2Csys_created_on%2Csys_created_by%2Cdescription%2Cescalation%2Cimpact%2Cknowledge%2Cproblem_id%2Creassignment_count%2Creopen_count%2Cresolved_at%2Cseverity%2Curgency%2Ccaller_id.location.latitude%2Ccaller_id.location.longitude"
    }
    request_timeout => 60
    proxy => { host => "1.1.1.2" port => "9090" scheme => "http" }
    user => "G435421"
    password => "*******"
    schedule => { cron => "* * * * *" }
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}
filter {
  split {
    field => "result"
  }
}
output {
  elasticsearch {
    hosts => ["1.1.1.3:9200"]
    index => "servicenow"
  }
  #stdout { codec => rubydebug }
}

Thanks
Gauti


(Lewis Barclay) #4

What happens if you increase the request timeout?


(Gautham) #5

@Eniqmatic even after changing it, only 10k documents are getting indexed out of the 2 lakh documents.

Thanks
Gauti


#6

Probably it is taking more time to get the data from the API, which is causing you to hit your timeout.
What timeout have you specified in your input config?


(Gautham) #7

@balumurari1 I have now set it to 600, but there is still no change; in fact it is indexing the same 10k records again and again.

Is there anything I need to do with shard allocation or something like that?

Thanks
Gauti
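
The symptom in #7 (the same 10k records indexed again and again) matches the ServiceNow Table API's default sysparm_limit of 10,000 records per request: http_poller re-issues the identical URL on every cron tick, so it keeps receiving the same first page. Retrieving all 2 lakh records requires paginating with sysparm_offset. A minimal sketch of that pagination loop, assuming a hypothetical fetch_page helper in place of the real HTTP call:

```python
# Sketch of paging through the ServiceNow Table API. The API returns at
# most sysparm_limit records per request (10,000 by default), so all
# records beyond the first page must be fetched by advancing
# sysparm_offset. fetch_page is a hypothetical stand-in for an HTTP GET of
# .../api/now/table/incident?sysparm_limit=<limit>&sysparm_offset=<offset>
# that returns the decoded "result" list.

def fetch_all(fetch_page, page_size=10000):
    """Collect every record by advancing the offset until a short page."""
    records = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        records.extend(page)
        if len(page) < page_size:  # short page => no more data
            break
        offset += page_size
    return records


# Usage with a fake fetcher simulating 25 records in pages of 10:
data = list(range(25))
fake_fetch = lambda limit, offset: data[offset:offset + limit]
print(len(fetch_all(fake_fetch, page_size=10)))  # 25
```

Since http_poller cannot vary the URL between requests, in practice this loop would live in an external script (e.g. run via the exec input, or pushing to Elasticsearch directly), or you would define one urls entry per offset page.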