Pattern tokenization to split multiple URLs


(Phrozyn) #1

I've got a field that is being tokenized on every non-alphanumeric character, and I'd like it to be split only on commas.

I've been struggling with how to use the pattern tokenizer.

The entries in this field are generally FQDNs separated by commas like: www.domain.com,blah-blah.domain.com,some.domain.com

My analyzer mapping looks like this:
{"settings":{"analysis":{"analyzer":{"comma":{"type":"pattern","pattern":"\,+"}}}}}

My field mapping looks like this:
{"type":"string", "analysis": {"analyzer":{"comma": {"tokenizer": "pattern"}}}},"os_family":{"type":"string"},

So now I get www.domain.com as a single token, but blah-blah.domain.com is still being split at the hyphen, between blah and blah.

There is no hyphen in my pattern. Any ideas?
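
For what it's worth, the pattern on its own can be sanity-checked outside Elasticsearch. The sketch below is only a stand-in: it uses Python's `re` module rather than the Java regex engine Elasticsearch uses, and `split_on_commas` is a hypothetical helper, not an Elasticsearch API. It just shows what splitting the sample value on runs of commas gives.

```python
import re

# Hypothetical helper, not part of Elasticsearch: splits a field value
# on runs of one or more commas, as the ",+" pattern is meant to.
def split_on_commas(value):
    # re.split breaks the string at each comma run; empty pieces
    # (e.g. from a leading or trailing comma) are dropped.
    return [token for token in re.split(r",+", value) if token]

tokens = split_on_commas("www.domain.com,blah-blah.domain.com,some.domain.com")
print(tokens)
```

Under these assumptions the hyphenated name stays in one piece, which suggests the split at the hyphen may come from a different analyzer being applied to the field rather than from the pattern itself.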

Thank you.

