Got a question regarding the pattern analyzer. Example text:
S6UlZgYCJaSIQcy03OOA==Ieuwc7Ix/CQfwoDSOVJl== 2oZjflRSRkcj4/OHcp78==
It's encrypted: each letter is hashed to 20 characters followed by a == sign. I would like to create a tokenizer that produces one token per letter, so the tokens should be:
S6UlZgYCJaSIQcy03OOA==
Ieuwc7Ix/CQfwoDSOVJl==
2oZjflRSRkcj4/OHcp78==
If I set "pattern": "==", it creates tokens without the == sign (e.g. S6UlZgYCJaSIQcy03OOA). Is there any way to include the separator as part of the token, or to use some other logic like "take 20 characters [skip whitespace] and create the 1st token, then take the next 20 characters [skip whitespace] and create the 2nd token", etc.?
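The "take 20 characters, skip whitespace" idea can be sketched outside Elasticsearch as a small fixed-width scanner. This is a minimal Python sketch, assuming every letter is exactly 20 characters followed by == and that whitespace only ever appears between letters; the function name tokenize_hashed_letters is hypothetical and not part of any Elasticsearch API.

```python
def tokenize_hashed_letters(text: str, width: int = 20, suffix: str = "==") -> list[str]:
    """Split text into hashed letters of `width` chars plus `suffix`, keeping the suffix.

    Hypothetical helper. Assumes whitespace only appears between letters.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Whitespace separates words but never belongs to a token, so skip it.
        if text[i].isspace():
            i += 1
            continue
        end = i + width + len(suffix)
        token = text[i:end]
        # Reject input that breaks the "20 characters + ==" rule.
        if (
            len(token) != width + len(suffix)
            or not token.endswith(suffix)
            or any(ch.isspace() for ch in token)
        ):
            raise ValueError(f"malformed hashed letter at offset {i}: {token!r}")
        tokens.append(token)
        i = end
    return tokens


sample = "S6UlZgYCJaSIQcy03OOA==Ieuwc7Ix/CQfwoDSOVJl== 2oZjflRSRkcj4/OHcp78=="
print(tokenize_hashed_letters(sample))
```

The fixed-width approach is strict by design: anything that does not fit the stated rules raises an error instead of silently producing a wrong token.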
The rules are quite simple:
- 1 letter = 20 characters
- Every hashed letter ends with a == sign
- Whitespace is also a separator: when a new word starts, it is separated by whitespace.
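Taken together, these rules can be expressed as a single regular expression: 20 characters that are neither whitespace nor "=", followed by "==". The Python sketch below is only an illustration (it is not Elasticsearch itself) that contrasts splitting on "==" with matching the whole letter; the names split_tokens, HASHED_LETTER and match_tokens are illustrative.

```python
import re

sample = "S6UlZgYCJaSIQcy03OOA==Ieuwc7Ix/CQfwoDSOVJl== 2oZjflRSRkcj4/OHcp78=="

# Splitting on "==" throws the separator away, and the space before the
# third letter stays glued to that token.
split_tokens = [t for t in re.split(r"==", sample) if t]

# Matching instead of splitting keeps the separator. Whitespace is excluded
# from the character class, so it acts as a separator on its own.
HASHED_LETTER = re.compile(r"[^\s=]{20}==")
match_tokens = HASHED_LETTER.findall(sample)

print(split_tokens)
print(match_tokens)
```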
It would be great if I could combine the two: a pattern whose match is kept in the token, together with the whitespace analyzer.
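One way to get both behaviours inside Elasticsearch might be the "pattern" tokenizer's "group" setting: with "group": 0 it emits each whole regex match as a token instead of splitting on the pattern (the default, -1). The sketch below builds index settings as a Python dict and prints the JSON request body. The index, tokenizer and analyzer names are hypothetical, and it assumes the [^\s=]{20}== regex fits all of your data. It uses a custom analyzer with no filters rather than the built-in "pattern" analyzer, because the built-in one lowercases terms by default, which would alter case-sensitive hashes.

```python
import json

# Hypothetical names: "hashed_letter_tokenizer" and "hashed_letter_analyzer".
index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "hashed_letter_tokenizer": {
                    "type": "pattern",
                    # Java regex: 20 chars that are neither whitespace nor "=", then "==".
                    "pattern": r"[^\s=]{20}==",
                    # group 0: emit each whole match as a token instead of
                    # splitting on the pattern (the default, group -1).
                    "group": 0,
                }
            },
            "analyzer": {
                "hashed_letter_analyzer": {
                    # Custom analyzer with no token filters, so nothing is lowercased.
                    "type": "custom",
                    "tokenizer": "hashed_letter_tokenizer",
                }
            },
        }
    }
}

# Body for a PUT /<index> request (e.g. from Kibana Dev Tools or curl).
print(json.dumps(index_settings, indent=2))
```

The result can then be checked with the _analyze API (POST /<index>/_analyze with "analyzer": "hashed_letter_analyzer" and the example text); if the assumptions hold, it should return the three 22-character tokens with the == kept.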