GitLab API Pagination


#1

Hello guys,

I'm trying to get some global information about GitLab projects from the API. The problem is that this API is limited to 100 projects per page. I've managed to retrieve the number of pages with the following command:
echo $(curl --head "https://gitlab.xxxxxxx/api/v4/projects?per_page=100" | grep X-Total-Pages | grep -o -e '[0-9]*')
But I don't really know how to use it inside my Logstash input, which is a curl command within exec input.
I've tried curl --head "https://gitlab.xxxxxxx/api/v4/projects?per_page=100&page=[1-6]", but it returns a single message containing an array. I can split this array, which gives me one document per project, and that's actually close to what I want. The problem is that the message field in each of these documents holds all the JSON data for the project as a single string; it doesn't create a field for each piece of information.
Also, I had to put the codec back to plain, instead of JSON, for the split filter to work. I was getting a JSON parsing error.
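The header lookup described above can be wrapped in a small function. This is only a sketch: the name total_pages is hypothetical, and it assumes the server actually returns an X-Total-Pages header. It strips the carriage return that ends each HTTP header line and matches the header name case-insensitively, since HTTP/2 responses use lowercase header names.

```shell
#!/usr/bin/env bash

# total_pages (hypothetical name): read raw HTTP response headers on stdin
# and print the value of the X-Total-Pages header, if present.
total_pages() {
  # Remove the CR at the end of each header line, then split on ": ".
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-total-pages" { print $2; exit }'
}

# Example use (host and token are the placeholders from this thread):
#   curl -s --head -H "Private-Token: mytoken" \
#     "https://gitlabhost/api/v4/projects?per_page=100" | total_pages
```

If the header is missing, the function prints nothing, so a calling script should check for an empty result before looping.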

If I can make this work I think I'll be able to replace the 6 in page=[1-6] with the first command. I'd really like to be able to make it automatic, without having to change the max page when there are more projects.

Another way to do this might be using a script that would execute a curl for each existing page, looping until the number of pages, which would be stored in a variable. I don't really know how/if scripts work in Logstash.

If you have any idea that could help, thank you. If you know I can't do that automatically please tell me :wink:


#2

I still haven't found a way around this so any help is appreciated :slight_smile:


#3

I'm back on this problem. I have made the following script, but it doesn't work as intended; it only outputs the first curl result.

# Read the page count from the X-Total-Pages response header
nb_gitlab_pages=$(curl -s --head -H "Private-Token: mytoken" "https://gitlabhost/api/v4/projects?per_page=100" | grep -i X-Total-Pages | grep -o -e '[0-9]*')
# Fetch every page in turn; each response is a separate JSON array
for i in $(seq 1 "$nb_gitlab_pages")
do
    curl -s -H "Private-Token: mytoken" "https://gitlabhost/api/v4/projects?statistics=true&per_page=100&page=$i"
done
  • Why does Logstash only send the first event (or page, which is actually 100 projects) to Elasticsearch?

Edit: It actually sends more than one page/event if there are fewer projects per page.

Then, I can split the message into 100 events using a custom terminator (as it is not recognized as JSON). The problem is that I have to split the message in each event again to get all my fields instead of a single string in the message field. It would be easy if it were well-formed JSON, but I kind of broke it with my terminator (something like "}}").
Edit: the separator between pages or projects is "}}". I mean that this is the shortest string I can use to tell where the JSON for one project ends and the next one begins.

  • What would be the proper way to have all my events and all my fields?

Thanks for the help.
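One way to sidestep the separator problem is to have the script itself emit a single JSON array instead of one array per page. The sketch below is an illustration, not something proposed in the thread: fetch_page and merged_projects are hypothetical names, GITLAB_URL and GITLAB_TOKEN are hypothetical environment variables defaulting to the placeholders used above, and it assumes each page is a compact JSON array with nothing before the opening bracket or after the closing one.

```shell
#!/usr/bin/env bash

# Fetch one page of projects (hypothetical helper; the URL and token
# defaults are the placeholders from this thread).
fetch_page() {
  curl -s -H "Private-Token: ${GITLAB_TOKEN:-mytoken}" \
    "${GITLAB_URL:-https://gitlabhost}/api/v4/projects?statistics=true&per_page=100&page=$1"
}

# Print all pages 1..$1 as ONE JSON array.
merged_projects() {
  local total=$1 joined="" page body i
  for i in $(seq 1 "$total"); do
    page=$(fetch_page "$i")
    body=${page#\[}   # drop the page's opening bracket
    body=${body%\]}   # drop the page's closing bracket
    # Skip empty pages so no dangling comma is produced.
    [ -z "$body" ] && continue
    if [ -n "$joined" ]; then
      joined="$joined,$body"
    else
      joined=$body
    fi
  done
  printf '[%s]\n' "$joined"
}

# Example use: merged_projects "$nb_gitlab_pages"
```

Because only the first and last character of each page are removed, the content of the projects themselves is never touched.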


#4

What does an event look like? Use output { stdout { codec => rubydebug } }


#5

If I use codec => json in the input, the output is exactly as I want it to be, but it only processes the first page.
If I don't use codec => json (it's plain by default IIRC), here's the output, with 2 projects per page and 2 pages, so you can see the different separators:
Full output as an image: (image not included)

(there were too many characters for a single post)

Output without useless information:

{
    "@timestamp" => 2018-07-18T07:06:52.468Z,
       "command" => "~/Downloads/logstash-6.2.4/config/test.sh",
          "host" => "aurelien-pc",
       "message" => "[{\"id\":576,\"description\":\"\",\"name\":\"user-club-gitlab-1\",\"name_with_namespace\":\"user-club / user-club-gitlab-1\",\"path\":\"user-club-gitlab-1\",\"path_with_namespace\":\"user-club/user-club-gitlab-1\",\"created_at\":\"2018-07-17T13:49:53.649Z\",\"default_branch\":\"master\",\"tag_list\":[],\"ssh_url_to_repo\":\"git@gitlabhost:user-club/user-club-gitlab-1.git\",\"http_url_to_repo\":\"https://gitlabhost/user-club/user-club-gitlab-1.git\",\"web_url\":\"https://gitlabhost/user-club/user-club-gitlab-1\",\"avatar_url\":null,\"star_count\":0,\"forks_count\":0,\"last_activity_at\":\"2018-07-17T16:07:55.168Z\",\"statistics\":{\"commit_count\":1,\"storage_size\":471859,\"repository_size\":471859,\"lfs_objects_size\":0,\"job_artifacts_size\":0},\"permissions\":{\"project_access\":null,\"group_access\":{\"access_level\":50,\"notification_level\":3}}},{\"id\":575,\"description\":\"\",\"name\":\"ayi-gitlab-1\",\"name_with_namespace\":\"ayi / ayi-gitlab-1\",\"path\":\"ayi-gitlab-1\",\"path_with_namespace\":\"ayi/ayi-gitlab-1\",\"created_at\":\"2018-07-16T13:08:26.280Z\",\"default_branch\":null,\"tag_list\":[],\"ssh_url_to_repo\":\"git@gitlabhost:ayi/ayi-gitlab-1.git\",\"http_url_to_repo\":\"https://gitlabhost/ayi/ayi-gitlab-1.git\",\"web_url\":\"https://gitlabhost/ayi/ayi-gitlab-1\",\"avatar_url\":null,\"star_count\":0,\"forks_count\":0,\"last_activity_at\":\"2018-07-16T13:08:26.280Z\",\"statistics\":{\"commit_count\":0,\"storage_size\":0,\"repository_size\":0,\"lfs_objects_size\":0,\"job_artifacts_size\":0},\"permissions\":{\"project_access\":null,\"group_access\":{\"access_level\":50,\"notification_level\":3}}}][{\"id\":574,\"description\":\"\",\"name\":\"test-rle\",\"name_with_namespace\":\"test / test-rle\",\"path\":\"test-rle\",\"path_with_namespace\":\"test/test-rle\",\"created_at\":\"2018-07-10T10:46:14.393Z\",\"default_branch\":null,\"tag_list\":[],\"ssh_url_to_repo\":\"git@gitlabhost:A137990/test-rle.git\",\"http_url_to_repo\":\"https://gitlabhost/A137990/test-rle.git\",\"web_url\":\"https://gitlabhost/A137990/test-rle\",\"avatar_url\":null,\"star_count\":0,\"forks_count\":0,\"last_activity_at\":\"2018-07-10T10:46:14.393Z\",\"statistics\":{\"commit_count\":0,\"storage_size\":0,\"repository_size\":0,\"lfs_objects_size\":0,\"job_artifacts_size\":0},\"permissions\":{\"project_access\":null,\"group_access\":null}},{\"id\":573,\"description\":\"\",\"name\":\"myproject-gitlab-2\",\"name_with_namespace\":\"myproject / myproject-gitlab-2\",\"path\":\"myproject-gitlab-2\",\"path_with_namespace\":\"myproject/myproject-gitlab-2\",\"created_at\":\"2018-07-09T07:47:18.812Z\",\"default_branch\":\"master\",\"tag_list\":[],\"ssh_url_to_repo\":\"git@gitlabhost:myproject/myproject-gitlab-2.git\",\"http_url_to_repo\":\"https://gitlabhost/myproject/myproject-gitlab-2.git\",\"web_url\":\"https://gitlabhost/myproject/myproject-gitlab-2\",\"avatar_url\":null,\"star_count\":0,\"forks_count\":0,\"last_activity_at\":\"2018-07-10T15:35:40.437Z\",\"statistics\":{\"commit_count\":5,\"storage_size\":75303547,\"repository_size\":72446115,\"lfs_objects_size\":0,\"job_artifacts_size\":3271789},\"permissions\":{\"project_access\":null,\"group_access\":{\"access_level\":50,\"notification_level\":3}}}]",
      "@version" => "1"
}

(it's easier to manipulate as text than as an image)

The message is a single line, which is quite normal with the plain codec I think. I'm not sure using split on this plain message is the best idea but I don't know what else I can do.


#6

What if I use gsub to add something like "\n" so that every project has a common separator, then split on that terminator and finally parse each piece as JSON (the format should be OK if I don't mess up the gsub)? Is it possible to apply JSON parsing in a filter, after editing the event?


#7

I am not clear on the problem. A json filter handles that JSON just fine.

input { generator { count => 1 message => '[{"id":576,"description":"","name":"user-club-gitlab-1","name_with_namespace":"user-club / user-club-gitlab-1","path":"user-club-gitlab-1","path_with_namespace":"user-club/user-club-gitlab-1","created_at":"2018-07-17T13:49:53.649Z","default_branch":"master","tag_list":[],"ssh_url_to_repo":"git@gitlabhost:user-club/user-club-gitlab-1.git","http_url_to_repo":"https://gitlabhost/user-club/user-club-gitlab-1.git","web_url":"https://gitlabhost/user-club/user-club-gitlab-1","avatar_url":null,"star_count":0,"forks_count":0,"last_activity_at":"2018-07-17T16:07:55.168Z","statistics":{"commit_count":1,"storage_size":471859,"repository_size":471859,"lfs_objects_size":0,"job_artifacts_size":0},"permissions":{"project_access":null,"group_access":{"access_level":50,"notification_level":3}}},{"id":575,"description":"","name":"ayi-gitlab-1","name_with_namespace":"ayi / ayi-gitlab-1","path":"ayi-gitlab-1","path_with_namespace":"ayi/ayi-gitlab-1","created_at":"2018-07-16T13:08:26.280Z","default_branch":null,"tag_list":[],"ssh_url_to_repo":"git@gitlabhost:ayi/ayi-gitlab-1.git","http_url_to_repo":"https://gitlabhost/ayi/ayi-gitlab-1.git","web_url":"https://gitlabhost/ayi/ayi-gitlab-1","avatar_url":null,"star_count":0,"forks_count":0,"last_activity_at":"2018-07-16T13:08:26.280Z","statistics":{"commit_count":0,"storage_size":0,"repository_size":0,"lfs_objects_size":0,"job_artifacts_size":0},"permissions":{"project_access":null,"group_access":{"access_level":50,"notification_level":3}}}][{"id":574,"description":"","name":"test-rle","name_with_namespace":"test / test-rle","path":"test-rle","path_with_namespace":"test/test-rle","created_at":"2018-07-10T10:46:14.393Z","default_branch":null,"tag_list":[],"ssh_url_to_repo":"git@gitlabhost:A137990/test-rle.git","http_url_to_repo":"https://gitlabhost/A137990/test-rle.git","web_url":"https://gitlabhost/A137990/test-rle","avatar_url":null,"star_count":0,"forks_count":0,"last_activity_at":"2018-07-10T10:46:14.393Z","statistics":{"commit_count":0,"storage_size":0,"repository_size":0,"lfs_objects_size":0,"job_artifacts_size":0},"permissions":{"project_access":null,"group_access":null}},{"id":573,"description":"","name":"myproject-gitlab-2","name_with_namespace":"myproject / myproject-gitlab-2","path":"myproject-gitlab-2","path_with_namespace":"myproject/myproject-gitlab-2","created_at":"2018-07-09T07:47:18.812Z","default_branch":"master","tag_list":[],"ssh_url_to_repo":"git@gitlabhost:myproject/myproject-gitlab-2.git","http_url_to_repo":"https://gitlabhost/myproject/myproject-gitlab-2.git","web_url":"https://gitlabhost/myproject/myproject-gitlab-2","avatar_url":null,"star_count":0,"forks_count":0,"last_activity_at":"2018-07-10T15:35:40.437Z","statistics":{"commit_count":5,"storage_size":75303547,"repository_size":72446115,"lfs_objects_size":0,"job_artifacts_size":3271789},"permissions":{"project_access":null,"group_access":{"access_level":50,"notification_level":3}}}]' } }

filter { json { source => "message" target => "foo" } }

output { stdout { codec => rubydebug } }

#8

I forgot about the json filter, I thought the json codec in the input would do the same job.
I can't test right now but I will tell you if that's what I want tomorrow morning.
So this configuration gives you 4 separate events? I can't believe it's that simple :smile:


#9

Well you need to add this to get multiple events

split { field => "foo" }

#10

Okay thanks, I'll try that!
Do you have any idea why codec => json in the input doesn't work and can't process the whole message but only the first page of projects?
I would say the separator between pages makes it fail to recognize the JSON, but then I don't understand why the JSON filter works...
What is the actual difference between the codec and the filter? They both apply to the same field here.


#11

This was too good to be true: unfortunately, it doesn't work. :frowning_face:
If I only use the JSON filter, I get a weird output in my target field. I have some of the JSON fields but not all, they are not in the same order as in the raw output, and there are actually two projects mixed in that single output. Also, I get a warning "Objects in arrays are not well supported".
Then, if I add the split filter, I think I have all the fields, but it only gives me 2 documents, so 2 projects.

I will try using gsub to modify the initial output so there is no more difference between pages separator and projects separator.


#12

I did it!
I used the following filter first:

mutate {
    gsub => [
      "message", "\]\[", ","
    ]
  }

Then your JSON filter, the split filter and it works!
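The effect of that substitution can be checked outside Logstash on a small sample. The sketch below mirrors the gsub with sed; merge_pages is a hypothetical name. It rests on two assumptions: no project field contains the literal sequence "][" in its text, and no page is an empty array (an empty "[]" page would leave a dangling comma and invalid JSON).

```shell
#!/usr/bin/env bash

# merge_pages (hypothetical name): turn concatenated page arrays
# "[...][...]" on stdin into one array "[...,...]" by replacing every
# "][" page boundary with a comma, as the mutate/gsub filter does.
merge_pages() {
  sed 's/]\[/,/g'
}

# Example:
#   printf '[{"id":1}][{"id":2}]' | merge_pages
```

Empty arrays inside a project, such as "tag_list":[], are left alone because they contain "[]" rather than "][".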

I now have another problem, though: I had to move the script, and now I think Logstash can't execute it (the output is a blank message). It doesn't seem to be a permissions problem but rather a path problem: if I use "/home/user/myscript.sh" it doesn't work, but if I use "~/myscript.sh" it works.
Those two paths clearly point to the same file, so how do I handle a path starting with a slash?

Edit: It seems to work if I escape only the first slash with a backslash. But it doesn't always work; I really don't get it...
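The thread does not establish the cause of this. One thing that may be worth ruling out, offered as an assumption rather than a diagnosis: a command containing shell-special characters such as ~ or \ may be handed to a shell, which tolerates a script without a #! line, whereas a plain absolute path may be executed directly, which requires both the execute bit and a #! first line. The sketch below checks those preconditions; check_script is a hypothetical name.

```shell
#!/usr/bin/env bash

# check_script (hypothetical name): report whether a script can be
# executed directly by its path, without going through a shell.
check_script() {
  script=$1
  if [ ! -f "$script" ]; then
    echo "missing"
    return 1
  fi
  if [ ! -x "$script" ]; then
    echo "not executable"
    return 1
  fi
  # Direct execution needs an interpreter line such as #!/bin/sh.
  first_line=$(head -n 1 "$script")
  case $first_line in
    '#!'*) echo "ok" ;;
    *) echo "no shebang"; return 1 ;;
  esac
}

# Example: check_script /home/user/myscript.sh
```

If the check reports a problem, adding an interpreter line at the top of the script and running chmod +x on it would be the first things to try.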

