I am new to text search, so a few things confuse me. As I understand it, full-text data goes through an analyzer that performs the following steps:
- Tokenizes the full text into individual words
- Throws away the stop words
- Stems each token
- Updates the lexicon and the inverted index with the data needed for ranking (term frequencies and so on)
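
To make my mental model concrete, here is a minimal Python sketch of those steps. It is only an illustration of how I picture the pipeline, not how any real engine implements it: the stop-word list, the suffix-stripping `naive_stem`, and the names `analyze` and `build_text_index` are all made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop-word list just for this example.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def naive_stem(token):
    """Very crude stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def build_text_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


docs = {
    1: "The art of computer programming",
    2: "Programming is an art",
}
text_index = build_text_index(docs)
# The keys of text_index play the role of the lexicon;
# the values play the role of the postings lists.
```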
My question is what happens if I pick the 'keyword' data type for a field whose values contain multiple tokens. For instance, say I want to index a 'full name' field using the keyword data type, so the values look like 'Donald Knuth' or 'Ada Lovelace'. What do the lexicon and the inverted index look like in this case? Do we store 'Donald Knuth' and 'Ada Lovelace' as whole entries in the lexicon (instead of single-word tokens)?
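
Here is a rough Python sketch of what I am guessing happens. It assumes that a keyword field skips the analyzer entirely and that the whole value becomes a single term. That assumption is exactly what I would like confirmed, and `build_keyword_index` is a hypothetical name I made up for the illustration.

```python
from collections import defaultdict


def build_keyword_index(docs):
    """Index each value as ONE term: no tokenizing, stop words, or stemming.

    Assumption for this sketch: a keyword field is stored verbatim.
    """
    index = defaultdict(set)
    for doc_id, value in docs.items():
        index[value].add(doc_id)  # the entire string is the lexicon entry
    return dict(index)


names = {1: "Donald Knuth", 2: "Ada Lovelace"}
keyword_index = build_keyword_index(names)

# Under this assumption only an exact match on the full value finds a document.
exact_hit = keyword_index.get("Donald Knuth")
partial_hit = keyword_index.get("Knuth")
```

If that guess is right, the lexicon would hold exactly two entries here, and a search for just 'Knuth' would find nothing.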