Need to analyse email addresses


(somerandomguy) #1

Hi. I need to analyse email addresses because right now, when they come in and I create a visualisation, I get one field with john.doe and another with @domain.com.
I created a new index with this:

https://www.elastic.co/guide/en/elasticsearch/reference/2.3/analysis-pattern-capture-tokenfilter.html
Then I sent some emails in through my Logstash input/output, but Kibana is still not letting me analyse them. In fact, in the left-hand pane under message it only shows the word 'm'.

Are there any gotchas that I need to look out for?


(Christoph) #2

Hi,

Have you also tried using the email tokenizer (uax_url_email)? Also, for debugging purposes I'd suggest taking Logstash out of the picture: send some docs manually to ES (e.g. via curl) and see if search works as expected on those email fields. Maybe it's Logstash that is tokenizing the email addresses.
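
A minimal sketch of that manual test, written in Python instead of curl. It assumes a test node at http://localhost:9200 and uses hypothetical names (an email-test index and an email_analyzer analyzer); uax_url_email is the built-in tokenizer that keeps an email address as a single token. The request shapes are what I'd expect for the 2.x line, so please check them against the docs for your version.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local test node
INDEX = "email-test"              # hypothetical index name


def index_settings():
    """Index settings with an analyzer that keeps an email as one token."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "email_analyzer": {  # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "uax_url_email",
                        "filter": ["lowercase"],
                    }
                }
            }
        }
    }


def analyze_body(text):
    """Body for the _analyze API, to see which tokens the analyzer emits."""
    return {"analyzer": "email_analyzer", "text": text}


def send(method, path, body):
    """Send a JSON request to the node and return the decoded JSON reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def main():
    # Create the index, then ask ES how it would tokenize one address.
    print(send("PUT", "/" + INDEX, index_settings()))
    print(send("POST", "/" + INDEX + "/_analyze",
               analyze_body("john.doe@domain.com")))
```

Calling main() against a running node should show whether the address comes back as one token or is split at the @, with Logstash not involved at all.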


(somerandomguy) #3

Hi.
Do you have an example template that I can upload to ES for this to work?


(somerandomguy) #4

So I have uploaded this email tokenizer to my /incoming index.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/analysis-pattern-capture-tokenfilter.html

I did a dummy import of a few email addresses, but when I try to create a visualisation it still separates the user ID from the domain name.


(somerandomguy) #5

I created an index with this, and now when I create a visualisation the email addresses are kept together as userid@domain.name instead of being broken out by user and domain name:

https://github.com/imotov/elasticsearch-test-scripts/blob/master/email_default_analyzer.sh
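
For anyone who lands here later, a rough sketch of the kind of index settings that produce this behaviour. This is not copied from the linked script; it assumes the approach is to register an analyzer named default (which Elasticsearch applies to string fields that do not specify their own analyzer) built on the uax_url_email tokenizer. The function name is hypothetical.

```python
import json


def default_email_settings():
    """Settings that make an email-aware analyzer the index default."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # "default" is the analyzer used when a field names none
                    "default": {
                        "type": "custom",
                        "tokenizer": "uax_url_email",
                        "filter": ["lowercase"],
                    }
                }
            }
        }
    }


# Print the JSON body that would be sent when creating the index
print(json.dumps(default_email_settings(), indent=2))
```

The printed JSON would be used as the body when creating a new index. Analysis happens at index time, so documents already indexed with the old settings would need to be reindexed before visualisations show whole addresses.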

