Custom Tokenizer - preserving previous tokens


(Chris Marino) #1

Does anyone know if it's possible to tokenize the following string as such?

"myField": "123-4-5678-9012-3"

I want my tokens to be:

123
123-4
123-4-5678
123-4-5678-9012
123-4-5678-9012-3

I've been playing around with the EdgeNGram and pattern tokenizers, but they don't do quite what I want.

I guess my other option is to store the variations in the source prior to ingestion.


(Christian Dahlqvist) #2

Have you tried the path hierarchy tokenizer?
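
For a value like yours, a rough sketch might look like the following. It assumes a `path_hierarchy` tokenizer with its `delimiter` set to "-" and a typeless mapping as used in Elasticsearch 7.x or later. The names `dash_hierarchy` and `dash_hierarchy_analyzer` are made up for illustration, and `expected_tokens` is only a plain-Python imitation of the tokens that tokenizer should emit, not a call into Elasticsearch.

```python
import json

# Index settings sketch. "dash_hierarchy" and "dash_hierarchy_analyzer"
# are hypothetical names chosen for this example.
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "dash_hierarchy": {
                    "type": "path_hierarchy",
                    "delimiter": "-",  # split on "-" instead of the default "/"
                }
            },
            "analyzer": {
                "dash_hierarchy_analyzer": {
                    "type": "custom",
                    "tokenizer": "dash_hierarchy",
                }
            },
        }
    },
    # Assumption: typeless mappings (Elasticsearch 7.x or later).
    "mappings": {
        "properties": {
            "myField": {"type": "text", "analyzer": "dash_hierarchy_analyzer"}
        }
    },
}


def expected_tokens(value, delimiter="-"):
    """Imitate the tokenizer locally: emit every prefix that ends at a delimiter."""
    parts = value.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


if __name__ == "__main__":
    # JSON body you could send when creating the index.
    print(json.dumps(index_settings, indent=2))
    for token in expected_tokens("123-4-5678-9012-3"):
        print(token)
```

You can check the real output by running the `_analyze` API against an index created with these settings.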


(Chris Marino) #3

That's it!

Thanks, Christian.

