using a regular expression is not an efficient way to solve this. What if you decided on indexing, if there is an URL in your field and extracted that URL or its parts out of that? This way your query could become much simpler, maybe just for the existence of a field or a query against the deconstructed parts of an URL.
This however requires some preprocessing work (a classic search tradeoff, you need to decide if you spent CPU cycles on indexing and preprocessing or on search time).
You could use an self written ingest processor to extract URLs from text or just have a preprocessing python script.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.