Bigrams for selected keywords


(Manvir Basra) #1

Hi,

I have the following data, which contains a list of special keywords (e.g. EAST, WEST). Each special keyword is followed by a list of numbers. A regular tokenizer doesn't produce useful tokens here, because I want bigrams (keyword and value) for these special keywords only. For the rest of the words I want regular tokens.

Note: a semicolon follows the EAST numbers and the WEST numbers; it is the delimiter that ends the list of values for a given keyword.

Any advice on how this can be achieved?

Data - EAST 1,2,3,4;WEST 4,9;Queen Street north tower

Desired tokens:
EAST 1
EAST 2
EAST 3
EAST 4
WEST 4
WEST 9
Queen
Street
north
tower


(Mark Walkom) #2

I think it would be easier to do this before pushing it into Elasticsearch if you can.
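
As a rough illustration of what that pre-processing could look like, here is a minimal Python sketch. It assumes the input always follows the shape in the example above (segments separated by `;`, keyword values separated by `,`). The function name `tokenize` and the constant `SPECIAL_KEYWORDS` are made up for this example and are not part of any Elasticsearch API.

```python
# Hypothetical pre-processing step, run before the document is sent to Elasticsearch.
# Assumption: segments are separated by ";" and keyword values by ",".
SPECIAL_KEYWORDS = {"EAST", "WEST"}


def tokenize(text, keywords=SPECIAL_KEYWORDS):
    """Return 'KEYWORD value' bigrams for special keywords, plain words otherwise."""
    tokens = []
    for segment in text.split(";"):
        segment = segment.strip()
        if not segment:
            continue
        # Split off the first word to see whether it is a special keyword.
        head, _, rest = segment.partition(" ")
        if head in keywords and rest:
            # Pair the keyword with each of its comma-separated values.
            for value in rest.split(","):
                value = value.strip()
                if value:
                    tokens.append(f"{head} {value}")
        else:
            # Ordinary text: fall back to whitespace tokenization.
            tokens.extend(segment.split())
    return tokens


print(tokenize("EAST 1,2,3,4;WEST 4,9;Queen Street north tower"))
```

One option would then be to index the resulting list as an array of strings in a field that is not analyzed again (for example a `keyword` field), so that a token such as "EAST 1" is kept whole. Whether that fits depends on how you need to search the remaining words.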

