Bigrams for selected keywords

Hi,

I have the following data, which contains special keywords (e.g. EAST, WEST) that are followed by numbers. Because of this, the standard tokenizer doesn't produce the tokens I need: for these special keywords only, I want bigrams made of the keyword and each of its values. For the rest of the words I want regular tokens.

Note: a semicolon follows the EAST values and the WEST values. It is the delimiter that ends each keyword's list of comma-separated values.

Any advice on how this can be achieved?

Data: EAST 1,2,3,4;WEST 4,9;Queen Street north tower

Expected tokens:
EAST 1
EAST 2
EAST 3
EAST 4
WEST 4
WEST 9
Queen
Street
north
tower

If you can, I think it would be easier to do this tokenization before pushing the data into Elasticsearch.
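Here is a minimal Python sketch of that client-side preprocessing. It assumes every input looks like the example: each keyword group is the keyword, whitespace, comma-separated values, and a terminating semicolon, with free text after the last group. `SPECIAL_KEYWORDS` and `tokenize` are hypothetical names chosen for illustration, not part of any Elasticsearch API.

```python
import re

# Hypothetical set of special keywords; the post only names EAST and WEST as examples.
SPECIAL_KEYWORDS = {"EAST", "WEST"}

# Assumed shape of a keyword group: an uppercase word, whitespace, then
# one or more comma-separated numbers (the ';' is removed by split()).
GROUP_RE = re.compile(r"^([A-Z]+)\s+(\d+(?:,\d+)*)$")


def tokenize(text):
    """Emit 'KEYWORD value' bigrams for special keywords, plain words otherwise."""
    tokens = []
    for segment in text.split(";"):
        segment = segment.strip()
        if not segment:
            continue
        match = GROUP_RE.match(segment)
        if match and match.group(1) in SPECIAL_KEYWORDS:
            keyword, values = match.groups()
            # One bigram per value, e.g. "EAST 1", "EAST 2", ...
            tokens.extend(f"{keyword} {value}" for value in values.split(","))
        else:
            # Anything else falls back to simple whitespace tokens.
            tokens.extend(segment.split())
    return tokens


if __name__ == "__main__":
    print(tokenize("EAST 1,2,3,4;WEST 4,9;Queen Street north tower"))
```

For the sample data this yields the ten tokens listed in the question. One option (again an assumption about your setup) is to send the resulting list as an array in a field mapped as `keyword`, so each bigram is stored and matched exactly instead of being split again by an analyzer.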
