I have a question about the pattern analyzer. Example text:
It's encrypted (each letter is hashed to 20 characters followed by an == sign). I would like to create one token per letter, so the tokens should be:
S6UlZgYCJaSIQcy03OOA== Ieuwc7Ix/CQfwoDSOVJl== 2oZjflRSRkcj4/OHcp78==
If I set "pattern": "==", it creates the tokens without the == sign (e.g. S6UlZgYCJaSIQcy03OOA). Is there any way to include the separator as part of the token, or to use some other logic such as "take 20 characters [skip whitespace] and create the 1st token, then take the next 20 characters [skip whitespace] and create the 2nd token", and so on?
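To make the difference concrete, here is a minimal sketch in plain Python (not the analyzer itself) comparing a split on == with a split on a zero-width lookbehind, which leaves the == attached to the preceding token. The input string is an assumption (the three expected tokens concatenated), the function names are made up for illustration, and whether the analyzer's regex engine accepts the same lookbehind would need to be checked.

```python
import re

# Assumed input: the three expected tokens concatenated (hypothetical sample).
TEXT = "S6UlZgYCJaSIQcy03OOA==Ieuwc7Ix/CQfwoDSOVJl==2oZjflRSRkcj4/OHcp78=="


def split_dropping_separator(text):
    # Splitting on the separator itself consumes it, so "==" is lost.
    return [part for part in re.split(r"==", text) if part]


def split_keeping_separator(text):
    # The lookbehind (?<===) matches the empty position right after "==",
    # so nothing is consumed; optional whitespace after it is discarded.
    # The \s+ alternative covers whitespace that does not follow "==".
    # Splitting on empty matches requires Python 3.7 or newer.
    return [part for part in re.split(r"(?<===)\s*|\s+", text) if part]


if __name__ == "__main__":
    print(split_dropping_separator(TEXT))
    print(split_keeping_separator(TEXT))
```

The second function returns each hashed letter with its trailing ==, and it also treats whitespace between words as a separator.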
The rules are quite simple:
- 1 letter = 20 characters
- Every hashed letter ends with ==
- Whitespace is also a separator: when a new word starts, it is separated by whitespace.
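The rules above can also be read as "match the tokens" rather than "split on a separator". The following is only a sketch in plain Python under two assumptions: the 20 characters come from the Base64 alphabet (A-Z, a-z, 0-9, + and /), which is what the sample tokens suggest, and the name tokenize_hashed_letters is hypothetical.

```python
import re

# One hashed letter: exactly 20 characters (assumed Base64 alphabet) plus "==".
HASHED_LETTER = re.compile(r"[A-Za-z0-9+/]{20}==")


def tokenize_hashed_letters(text):
    # findall returns every non-overlapping match from left to right.
    # Whitespace is not in the character class, so it is skipped and
    # acts as a separator between words.
    return HASHED_LETTER.findall(text)


if __name__ == "__main__":
    sample = "S6UlZgYCJaSIQcy03OOA==Ieuwc7Ix/CQfwoDSOVJl== 2oZjflRSRkcj4/OHcp78=="
    for token in tokenize_hashed_letters(sample):
        print(token)
```

Matching fixed-length tokens silently drops input that does not follow the rules, whereas splitting keeps it as a malformed token; which behaviour is preferable depends on how strictly the data follows the format.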
It would be great if I could combine the two: a pattern whose match is kept as part of the token, together with the whitespace analyzer.