Need help with Logstash parsing

Hi team,

I would like to know how Logstash will handle this data. I am collecting IOCs (Indicators of Compromise), which come in the form of domains or URLs. Using a few bash commands, I extract only the domain part of each entry and write the results to one file.

That step also sorts the entries and removes duplicates, so only unique entries go into the final file. Now I need to push those entries, along with a few others, into Elasticsearch.

So I would like to know whether Logstash can handle the deduplication. Or, since I am already using the sort command, will only new entries get added to Elasticsearch?

Here are the raw entries I am getting. After sanitizing them, I write the output to a text file:

http://albonoor.com/templates/beez3/www.bankofamerica.com%20(1)/www.bankofamerica.com/e0243071f605de9f9ae977bc93e3ccec/confirm.php?cmd=_account-details&session=bfd1df6bd17624b22d5b1348583fed31&dispatch=d471145b3dcf5fd973cf1953dc7baff2ed3d6b94
http://albonoor.com/templates/beez3/www.bankofamerica.com%20(1)/www.bankofamerica.com/e0243071f605de9f9ae977bc93e3ccec/emails.php?cmd=_account-details&session=abd16eafcdf731efe27ba3920311a25c&dispatch=9f968766289176b9afdbd38ca9caac78317faef5
http://peregrinosdequeretaroaltepeyac.org/.well-known/Stripe/verification/978770D232D74D35EM8M/index.php?country.x=-&lang.x=en
http://peregrinosdequeretaroaltepeyac.org/.well-known/Stripe/verification/978770D232D74D35EM8M/acc_ver.php
https://protiensheiks.cf/rl/newhotmail/Validation/login2.php?https://login.live.com/public/IdentifyUser.aspx?LOB=RBGLogon
https://protiensheiks.cf/rl/newhotmail/Validation/login.php?https://login.live.com/public/IdentifyUser.aspx?LOB=RBGLogon
http://wf1-wfl.site/mob/Login.php?sessionid=cSo7oWL2blq7oh6szRVBmDa80BbeA1qNUOUiLFlW1L4p3aSC1Oeorows0IGAJ6nDViWGYhDZ3Io6TgJU4XjwlQYlyEWiLkWGCSnAa1AdJYkCf3xj0RQmHPIguFyfZuWCnj&securessl=true
http://wf1-wfl.site/mob/VerifyCard.php?gBcPGsLrL16pST0caGcp4It0jI8Uh2fyDrnkU9LGaS52M6eWMrlR9OStx0oO3DmH4K2ZTNF3GL5sRkoELKvVzoo6oMVspi9u2buVZ9YFU48MoxqahW5QkuXIgSaGakbdvF&securessl=true
tr6mdtsp.biz
tradebits.top
traeaf.club;1
trimenex.us

The final entries would look like this:

albonoor.com
login.live.com
protiensheiks.cf
peregrinosdequeretaroaltepeyac.org
protiensheiks.cf
tr6mdtsp.biz
tradebits.top
traeaf.club;1
trimenex.us

TIA
Blason R

In terms of deduplication, I would be curious to see the answer as well. As for cleaning up the entries, maybe try the dissect filter?

Well, I believe the Elastic Stack does not do deduplication by default.
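
One common workaround, as far as I know, is to give each document a deterministic ID derived from its content, so that indexing the same domain twice overwrites the existing document instead of creating a second one. In Logstash this is usually done with the fingerprint filter together with the `document_id` option of the elasticsearch output. The Python sketch below only simulates the idea with a plain dict standing in for the index; `doc_id` and `index_all` are hypothetical names, and SHA-1 is just one possible hash choice.

```python
import hashlib


def doc_id(domain: str) -> str:
    """Deterministic ID: the same domain always hashes to the same value."""
    return hashlib.sha1(domain.strip().lower().encode("utf-8")).hexdigest()


def index_all(store: dict, domains):
    """Simulate indexing with an explicit ID; a repeated ID overwrites."""
    for domain in domains:
        store[doc_id(domain)] = {"domain": domain}
    return store


if __name__ == "__main__":
    index = {}  # stand-in for an Elasticsearch index (assumption for this sketch)
    index_all(index, ["albonoor.com", "protiensheiks.cf", "albonoor.com"])
    # A second run with an overlapping batch does not add duplicates.
    index_all(index, ["protiensheiks.cf", "tradebits.top"])
    print(len(index))
```

With this approach the sort step on the input file is no longer what prevents duplicates; the ID is.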

OK guys, this is solved with the grok filter.
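
The actual grok pattern was not posted. Since grok patterns are essentially named regular expressions, the idea might look something like the Python sketch below, which captures the host into a named group. The regex and the names `DOMAIN_RE` and `match_domain` are assumptions made for illustration, not the pattern that was used.

```python
import re

# Optional scheme, then everything up to the first "/", "?", "#", ":" or space.
# Hypothetical pattern; a real grok expression would be written differently.
DOMAIN_RE = re.compile(
    r"^(?:[a-zA-Z][a-zA-Z0-9+.-]*://)?(?P<domain>[^/?#:\s]+)"
)


def match_domain(line: str):
    """Return the captured domain, or None when the line does not match."""
    m = DOMAIN_RE.match(line.strip())
    return m.group("domain") if m else None


if __name__ == "__main__":
    samples = [
        "http://wf1-wfl.site/mob/Login.php?securessl=true",
        "trimenex.us",
    ]
    for line in samples:
        print(match_domain(line))
```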
