Issue with modelling data during bulk import

Hello,

I have to bulk import data via Logstash as follows:
Source: RESTful GET APIs
Destination: Elasticsearch

My .conf file looks like this:

input {
  http_poller {
    urls => {
      test1 => {
        method => get
        url => "https://someurl/movies"
        headers => {
          Accept => "application/json"
        }
      }
    }
  }
}

output {
  elasticsearch {
    action => "index"
    index => "movies"
    document_type => "movie"
    manage_template => false
    document_id => "%{movieId}"
    doc_as_upsert => true
    hosts => ["http://localhost:9200/"]
  }
}

This pushes all the data returned by the API into Elasticsearch, but not in the desired format.

Problem statement:

The source API returns an object containing an ArrayList (with 100 movie objects):
{"movieList":[{"movieId":1,"movieName":"Bank1","uniqueName":"Bank1","showInSearch":true,"address":{"name":"Bank1"},"notes":"Preversion for webservice more data comes later"},{"movieId":2,"movieName":"Alpha Omega","uniqueName":"Alpha Omega","showInSearch":true,"address":{"name":"Alpha Omega"},"description":"Alpha Omega's offering ."}]}

This inserts the data into Elasticsearch as only a single record, "movieList", which internally contains multiple records.

For proper searching, I want to insert individual records instead of one complete ArrayList.

Please guide me on how I can accomplish this.
Thanks!

If you get one big event with something like this ...

"movieList" => [
        [0] {
                 "address" => {
                "name" => "Bank1"
            },
                   "notes" => "Preversion for webservice more data comes later",
              "uniqueName" => "Bank1",
                 "movieId" => 1,
            "showInSearch" => true,
               "movieName" => "Bank1"
        },
        [1] {
                 "address" => {
                "name" => "Alpha Omega"
            },
              "uniqueName" => "Alpha Omega",
             "description" => "Alpha Omegas offering .",
                 "movieId" => 2,
            "showInSearch" => true,
               "movieName" => "Alpha Omega"
        }
    ]

you can do this:

filter {
  # Split the single event into one event per entry of the "movieList" array
  split {
    field => "movieList"
  }
  # Copy each key/value of the remaining movie hash to the root of the event,
  # then drop the now-redundant "movieList" field
  ruby {
    code => "
      event.get('movieList').each { |k, v| event.set(k, v) }
      event.remove('movieList')
    "
  }
}

The split filter should give you separate events for the array entries, and the ruby code moves the data from the "movieList" field to the root of those events.
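
For the sample data above, each resulting event should then look roughly like this (rubydebug-style output; field order may differ):

{
         "address" => {
        "name" => "Bank1"
    },
           "notes" => "Preversion for webservice more data comes later",
      "uniqueName" => "Bank1",
         "movieId" => 1,
    "showInSearch" => true,
       "movieName" => "Bank1"
}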

Thanks Jenni!!
It now shows separate records instead of one big event.

How can I perform pagination when providing the input URL to the HTTP poller?
url => "https://someurl/movies?limit=100&offset=0"

There are more than 5000 records in total; how can I write pagination logic in this .conf file?
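
The http_poller input doesn't follow pagination on its own, so one simple (if static) workaround is to list each page as its own named entry in the urls hash. This is only a minimal sketch: the entry names (page1, page2, ...) are arbitrary, and it assumes the API's limit/offset parameters behave exactly as in the URL above and that the page count is known up front (e.g. 50 pages of 100 for ~5000 records):

input {
  http_poller {
    urls => {
      # One named request per page of 100 records; offsets are assumed
      # to match the limit/offset scheme shown in the question
      page1 => "https://someurl/movies?limit=100&offset=0"
      page2 => "https://someurl/movies?limit=100&offset=100"
      page3 => "https://someurl/movies?limit=100&offset=200"
      # ... continue up to the last page needed
    }
  }
}

Because the list is static, a changing total record count would mean editing the .conf file; a more dynamic approach would be to fetch the pages outside Logstash (for example with a small script looping over the offsets) and feed the results in through another input.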
