Hello there,
Sorry to bother you, but I'm slowly working my way towards fully using and understanding the stack, and right now I'm blocked by a (I guess) simple problem with Logstash.
I parse multiple events through Logstash, do a few operations on them and so forth, and everything is going fine. But right now I'm a bit lost when I want to apply a regex to the events I parse.
I receive JSON events, and each event has a field 'url' holding a complete URL that contains some information I would like to extract before going to the output.
The entries I receive look like these:
{
	"id": "AVERYRANDOMID",
	"timeRef": "2022-10-25T10:45:05.000+02:00",
	"url": "https://www.mywebsite.com/the-name-of-my-page-12345678",
	"queryParams": {
		"utm_medium": "email",
		"utm_source": "newsletter",
		"utm_campaign": "MyCampaign"
	}
}
{
	"id": "ANOTHERRANDOMID",
	"timeRef": "2022-10-25T10:45:05.000+02:00",
	"url": "https://www.mywebsite.com/",
	"queryParams": {
	}
}
{
	"id": "INEEDANOTHERONE",
	"timeRef": "2022-10-25T10:45:05.000+02:00",
	"url": "https://www.mywebsite.com/4567890",
	"queryParams": {
	}
}
{
	"id": "LASTONEIPROMISE",
	"timeRef": "2022-10-25T10:45:05.000+02:00",
	"url": "https://www.mywebsite.com/the-name-of-my-page-12345678&fbclid=IwAR2Kf_HojrEdNy",
	"queryParams": {
		"utm_medium": "email",
		"utm_source": "newsletter",
		"utm_campaign": "MyOtherCampaign"
	}
}
In the end, for those 4 entries, I need to retrieve the ID of the article from the URL, if it is present. Here are all the rules I found about that ID:
- Always 6 to 8 digits
- Can follow the title of the article/page
- Can be directly at the root of the website (without the title)
- Can be (ok, will be) followed by extra characters (shared from a social network, from a UTM campaign, etc.)
- Can also be absent from the URL (because people can sometimes (ahah) use the tools incorrectly)
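
The rules above can be sketched as a single regular expression. The following is a minimal Python sketch, used only to check a candidate pattern against the four sample URLs. The pattern itself, the `article_id` group name and the `extract_article_id` helper are all assumptions made for illustration, and it further assumes the ID is always preceded by `/` or `-` and never followed by another digit.

```python
import re

# Assumed pattern: a "/" or "-", then 6 to 8 digits, not followed by a 9th digit.
ARTICLE_ID = re.compile(r"[/-](?P<article_id>[0-9]{6,8})(?![0-9])")


def extract_article_id(url):
    """Return the article ID found in url, or None when the URL has none."""
    match = ARTICLE_ID.search(url)
    return match.group("article_id") if match else None


# The four URLs from the sample events above.
SAMPLE_URLS = [
    "https://www.mywebsite.com/the-name-of-my-page-12345678",
    "https://www.mywebsite.com/",
    "https://www.mywebsite.com/4567890",
    "https://www.mywebsite.com/the-name-of-my-page-12345678&fbclid=IwAR2Kf_HojrEdNy",
]

for url in SAMPLE_URLS:
    print(url, "->", extract_article_id(url))
```

Logstash's grok filter accepts inline named captures written as `(?<article_id>...)` instead of Python's `(?P<article_id>...)`, so a similar pattern could probably be applied to the `url` field there without Ruby, though that part is untested here.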
 
I have the feeling this can be done in Logstash, which would let me avoid some processing after the output, but right now I'm kind of stuck. I'm also trying to avoid Ruby completely on this one, if that's possible.
Did I miss something? If someone could point me in the right direction, it would be much appreciated.