I am trying to extract Bitcoin addresses from EML bodies that are indexed in Elasticsearch.
I am trying to use a scripted field for this, with the following script:
if (!doc["emlBody.keyword"].empty) {
    def m = /(: )([1,3]{1}[a-zA-Z0-9]{25,34})/.matcher(doc["emlBody.keyword"].value);
    if (m.matches()) { return m.group(2) }
    else { return "no match" }
}
else { return "NULL" }
However, after checking my data, it seems that I rarely enter the first if block and mostly get "NULL" results, when I should get at least "no match" or the correct Bitcoin address.
I am a bit lost and not sure why this doesn't work.
If I use the "emlBody" field after setting fielddata=true, I get closer to the expected results: "no match" instead of "NULL". That means I am entering the if statement, but my regex doesn't seem to match anything.
I believe this regex would be more appropriate for Bitcoin addresses:
(: )([13][a-km-zA-HJ-NP-Z1-9]{25,34})
Both should work, but neither does. Here is part of a sample email:
Hello,
<br><br>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent sit amet imperdiet elit. Donec sapien orci, rutrum id odio quis, dictum facilisis velit.
<br><br>
Aliquam auctor pulvinar sapien. Ut ut mi fermentum, tempor ex sed, ultricies tellus: 1K8TqsB2C1iY8qdGqhnHfgen3uE8GBU7c8
<br><br>
Aliquam nunc purus, porta non rutrum id, luctus consequat tortor. Maecenas sollicitudin mi vel nisi ultricies, eget blandit mauris feugiat. Fusce et porttitor sem.
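One possible cause of the "no match" results, sketched in plain Java on the assumption that Painless regexes behave like java.util.regex: Matcher.matches() only succeeds when the pattern covers the entire input, while Matcher.find() looks for the pattern anywhere inside it. The class and method names below are hypothetical and exist only for this illustration; the pattern and the sample text come from the post above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BitcoinAddressExtractor {
    // Second pattern from the post: ": " followed by a Base58-style address.
    private static final Pattern ADDRESS =
        Pattern.compile("(: )([13][a-km-zA-HJ-NP-Z1-9]{25,34})");

    // Mirrors the scripted field: matches() needs the WHOLE body to fit the pattern.
    static String extractWithMatches(String body) {
        Matcher m = ADDRESS.matcher(body);
        return m.matches() ? m.group(2) : "no match";
    }

    // Alternative: find() scans the body for the first substring that fits.
    static String extractWithFind(String body) {
        Matcher m = ADDRESS.matcher(body);
        return m.find() ? m.group(2) : "no match";
    }

    public static void main(String[] args) {
        String body = "Ut ut mi fermentum, tempor ex sed, ultricies tellus: "
            + "1K8TqsB2C1iY8qdGqhnHfgen3uE8GBU7c8\n<br><br>";
        System.out.println(extractWithMatches(body)); // no match
        System.out.println(extractWithFind(body));    // 1K8TqsB2C1iY8qdGqhnHfgen3uE8GBU7c8
    }
}
```

If this assumption holds, replacing m.matches() with m.find() in the scripted field may be enough to get the address back. The "NULL" results on emlBody.keyword could be a separate issue: with the default dynamic mapping, the keyword subfield is typically created with ignore_above: 256, so longer bodies would not be stored there at all. With fielddata=true on the text field, the script likely sees individual analyzed tokens rather than the full body, so a pattern that relies on ": " would not match either.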