Pulling more than 2000 records from the ServiceNow API using http_poller

I have created a pipeline using http_poller to pull ServiceNow CI records.
To pull more than 2000 records, I am hitting the ServiceNow API multiple times, setting different start and count values for each http_poller URL in the pipeline.

However, I am currently receiving duplicate records as well. Why am I receiving these duplicates, and how can I avoid them?

Below is my pipeline config file.

input {
  http_poller {
    urls => {
      records2000 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        url => "https://service-now.com/api/bebup/config/ci/query/list?encoded_query=install_status!%3D104%26start%3D1%26count%3D2000%26use_display_value%3DTRUE"
        headers => {
          Accept => "application/json"
        }
      }
      records4000 => {
        # Supports all options supported by ruby's Manticore HTTP client
        method => get
        url => "https://service-now.com/api/bebup/config/ci/query/list?encoded_query=install_status!%3D104%26start%3D2001%26count%3D4000%26use_display_value%3DTRUE"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 90
    schedule => { cron => "0 0 * * *"}
    socket_timeout => 90
    codec => "json"
    # A hash of request metadata info (timing, response headers, etc.) will be sent here
    # metadata_target => "http_poller_metadata"
  }
}

I would browse to both of those URLs, or fetch them with curl, to get the results. Compare the two responses and see whether the same records appear in both. If they do, the issue is with the source or the query.

I can't think of any reason this pipeline would create duplicate records, so I would test that first.

I cannot find any documentation of this API on the Internet, so I would first ask: does it return records in a consistent order? Do you need to supply a sort option?

Also, in the second URL you use count=4000. Should that be 2000?
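
If count turns out to be the number of records per request rather than an end index (an assumption on my part, since I cannot find the API documentation), the second entry under urls might look something like the sketch below. Only the count value differs from the original.

```
records4000 => {
  method => get
  # Assumption: count is a page size, so it stays at 2000 for every request
  # and only start advances (1, 2001, 4001, ...)
  url => "https://service-now.com/api/bebup/config/ci/query/list?encoded_query=install_status!%3D104%26start%3D2001%26count%3D2000%26use_display_value%3DTRUE"
  headers => {
    Accept => "application/json"
  }
}
```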

You do not say what your output is, but if it is elasticsearch then you may be able to handle duplicates by using a fingerprint filter to hash the identifying fields of each event and setting the document id to that hash.
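
A minimal sketch of that approach, assuming the output is elasticsearch, that each event is a single CI record, and that a field named sys_id uniquely identifies it. The field name, host and index name are all assumptions, so substitute whatever applies to your data.

```
filter {
  fingerprint {
    # Assumption: sys_id uniquely identifies a CI record
    source => ["sys_id"]
    # Keep the hash in @metadata so it is not indexed as a field
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]   # hypothetical host
    index => "servicenow-ci"             # hypothetical index name
    # Same fingerprint means same document id, so a repeated record
    # overwrites the earlier copy instead of creating a duplicate
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

If no single field is unique, list several fields in source and add concatenate_sources => true so the hash is computed over all of them together.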

Note that if the results are not sorted, then although a fingerprint filter will avoid duplicates, you still may never get the complete result set. If each request returns an arbitrarily ordered slice of a large set, some records can end up in none of the slices.
