Hi,
I have the following data, which contains a list of special keywords (e.g. EAST, WEST). Each special keyword is followed by a list of numbers. A regular tokenizer doesn't produce useful tokens here, because I want bigrams (keyword and value) for these special keywords only. For the rest of the words I want regular tokens.
Note: there is a semicolon after the EAST numbers and after the WEST numbers; it is the delimiter that ends the values of a given keyword.
Any advice on how this can be achieved?
Data: EAST 1,2,3,4;WEST 4,9;Queen Street north tower
Expected tokens:
EAST 1
EAST 2
EAST 3
EAST 4
WEST 4
WEST 9
Queen
Street
north
tower
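
To make the expected behaviour concrete, here is a minimal sketch in plain Python. It is not tied to any particular tokenizer or search library. It assumes the special keywords are known in advance, that each keyword segment ends at a semicolon, and that values are comma-separated. The names `SPECIAL_KEYWORDS` and `tokenize` are made up for illustration.

```python
import re

# Assumption: the special keywords are known up front.
SPECIAL_KEYWORDS = {"EAST", "WEST"}


def tokenize(text, keywords=SPECIAL_KEYWORDS):
    """Emit 'KEYWORD value' bigrams for special keywords, plain word tokens otherwise."""
    tokens = []
    # A semicolon ends the value list of a keyword, so split on it first.
    for segment in text.split(";"):
        # Separate the first word from the rest of the segment.
        parts = segment.split(None, 1)
        if len(parts) == 2 and parts[0] in keywords:
            keyword, values = parts
            # Values are comma-separated; pair each one with its keyword.
            for value in values.split(","):
                value = value.strip()
                if value:
                    tokens.append(f"{keyword} {value}")
        else:
            # Regular text: fall back to simple word tokens.
            tokens.extend(re.findall(r"\w+", segment))
    return tokens


if __name__ == "__main__":
    for token in tokenize("EAST 1,2,3,4;WEST 4,9;Queen Street north tower"):
        print(token)
```

Running this on the sample data prints the ten tokens listed above, in the same order. A special keyword that appears without any values falls through to the regular branch and comes out as a single word token.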