How to ignore URLs when searching with Elasticsearch?

Hi, I have a set of documents that contain text but may also have URLs inside them:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam tincidunt metus a convallis imperdiet. Praesent interdum magna ut lorem bibendum vehicula. Maecenas consectetur tortor a ex pulvinar, sit amet sollicitudin nunc maximus. Pellentesque non gravida ligula, imperdiet pharetra odio. Nunc non massa vitae mauris tempor tempus. Nulla ac laoreet tellus. Nulla consequat tortor eu eros euismod bibendum. Curabitur ante ligula, aliquet at lacus at, pretium convallis eros. Fusce id mi condimentum, tempor lorem ut, pharetra libero.

https://document.io/document/ipsum

In eget eleifend neque. Morbi ex leo, tincidunt non enim ut, rutrum suscipit metus. Cras laoreet ex ut massa consequat condimentum. Aenean finibus eu nisl ut rhoncus. Aliquam finibus nisl risus, id facilisis justo rutrum et. Aenean enim libero, commodo id mi ut, mattis sollicitudin tellus. Aliquam molestie ligula sit amet lorem malesuada, aliquet pretium dolor malesuada. Phasellus fringilla libero in sollicitudin tristique. Quisque molestie, enim et aliquam dapibus, ex erat ultrices nisi, luctus ornare lorem metus eu sapien.

I am using a match query to search for words inside the documents. However, as you can see, the URL sometimes contains words that also appear in the actual text (here, "ipsum"), and this skews the results. Does Elasticsearch have a way to ignore the URLs and search only the text?

I am currently using the english analyzer for this field.

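Take a look at the uax_url_email tokenizer, which keeps each URL as a single token, so a match query for a word will no longer hit terms that are only part of a URL. The built-in english analyzer uses the standard tokenizer, so you would need to define a custom analyzer to swap the tokenizer in.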
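As a rough sketch of what that could look like, the Python snippet below builds an index body that recreates the english analyzer's filter chain on top of the uax_url_email tokenizer. The analyzer name english_no_urls, the filter name drop_urls and the field name body are made up for illustration. The drop_urls step (a keep_types filter in exclude mode) is an optional extra beyond the tokenizer change, assuming your Elasticsearch version supports it; it discards the URL tokens entirely. The typeless mappings layout assumes Elasticsearch 7 or later.

```python
import json


def build_index_body(field_name="body"):
    """Build a create-index request body (hypothetical names throughout).

    The analyzer mirrors the documented filter chain of the built-in
    english analyzer, but tokenizes with uax_url_email so that each URL
    stays a single token of type <URL>.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Optional: remove tokens the tokenizer tagged as <URL>.
                    "drop_urls": {
                        "type": "keep_types",
                        "types": ["<URL>"],
                        "mode": "exclude",
                    },
                    "english_stop": {"type": "stop", "stopwords": "_english_"},
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                    "english_possessive_stemmer": {
                        "type": "stemmer",
                        "language": "possessive_english",
                    },
                },
                "analyzer": {
                    "english_no_urls": {
                        "type": "custom",
                        "tokenizer": "uax_url_email",
                        # drop_urls runs first, before lowercasing and stemming.
                        "filter": [
                            "drop_urls",
                            "english_possessive_stemmer",
                            "lowercase",
                            "english_stop",
                            "english_stemmer",
                        ],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field_name: {"type": "text", "analyzer": "english_no_urls"}
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON so it can be sent as the body of a create-index request.
    print(json.dumps(build_index_body(), indent=2))
```

You can send the printed JSON when creating a new index and then check the resulting tokens with the _analyze API before reindexing. Changing the analyzer of an existing field requires a reindex.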

Hope that helps!
