Hello there,
Sorry to bother you, but I'm slowly working my way toward fully using and understanding the stack, and right now I'm blocked by what is (I guess) a simple problem with Logstash.
I parse multiple events through Logstash and everything is going fine: I do a few operations on them and so on. But right now I'm a bit lost when it comes to running a regex on the events I parse.
I receive JSON events. Each event has a 'url' field holding a complete URL, and that URL contains some information I would like to extract before the event goes to the output.
The entries I receive look like these:
{
"id": "AVERYRANDOMID",
"timeRef": "2022-10-25T10:45:05.000+02:00",
"url": "https://www.mywebsite.com/the-name-of-my-page-12345678",
"queryParams": {
"utm_medium": "email",
"utm_source": "newsletter",
"utm_campaign": "MyCampaign"
}
}
{
"id": "ANOTHERRANDOMID",
"timeRef": "2022-10-25T10:45:05.000+02:00",
"url": "https://www.mywebsite.com/",
"queryParams": {
}
}
{
"id": "INEEDANOTHERONE",
"timeRef": "2022-10-25T10:45:05.000+02:00",
"url": "https://www.mywebsite.com/4567890",
"queryParams": {
}
}
{
"id": "LASTONEIPROMISE",
"timeRef": "2022-10-25T10:45:05.000+02:00",
"url": "https://www.mywebsite.com/the-name-of-my-page-12345678&fbclid=IwAR2Kf_HojrEdNy",
"queryParams": {
"utm_medium": "email",
"utm_source": "newsletter",
"utm_campaign": "MyOtherCampaign"
}
}
In the end, for these 4 entries, I need to retrieve the article ID from the URL, if it is present. Here are all the rules I have found about that ID:
- Always 6 to 8 digits long
- Can follow the title of the article/page
- Can be directly at the root of the website (without the title)
- Can be (ok, will be) followed by extra characters (added by social network sharing, UTM campaigns, etc.)
- Can also be absent from the URL (because people can sometimes (ahah) use the tools incorrectly)
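To make the rules above concrete, here is a minimal Python sketch of the matching I am after. It is only an illustration of the regex logic, not Logstash configuration. The pattern and the helper name `extract_article_id` are my own assumptions, and it treats "follows the title" as "preceded by a hyphen" and "at the root" as "preceded by a slash".

```python
import re

# Assumed pattern: 6 to 8 digits, preceded by "/" or "-",
# and not followed by another digit (so 9+ digit runs are rejected).
ARTICLE_ID = re.compile(r"[/-](\d{6,8})(?!\d)")


def extract_article_id(url):
    """Return the first article ID found in the URL, or None if absent."""
    match = ARTICLE_ID.search(url)
    return match.group(1) if match else None


# The four sample URLs from the events above.
samples = [
    "https://www.mywebsite.com/the-name-of-my-page-12345678",
    "https://www.mywebsite.com/",
    "https://www.mywebsite.com/4567890",
    "https://www.mywebsite.com/the-name-of-my-page-12345678&fbclid=IwAR2Kf_HojrEdNy",
]

for url in samples:
    print(url, "->", extract_article_id(url))
```

One known gap in this sketch: it returns the first match anywhere in the string, so a tracking parameter that happens to contain a hyphen followed by 6 to 8 digits could be picked up if the path itself has no ID.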
I have the feeling this can be done in Logstash, which would let me avoid some processing after the output, but right now I'm kind of stuck. I'm also trying to avoid Ruby completely on this one, if possible.
Did I miss something? If someone could point me in the right direction, it would be much appreciated.