How are keywords indexed?

I am new to text search, so a few things still confuse me. As I understand it, full-text data goes through an analyzer that does the following processing:

  1. Tokenizes the full text into individual words
  2. Throws away the stop words
  3. Stems each token
  4. Updates the lexicon and the inverted index with data relevant for frequency/ranking etc
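To make the four steps concrete, here is a minimal Python sketch of such a pipeline. It is only an illustration of the idea, not how any particular search engine implements it: the stop-word list, the `naive_stem` suffix stripper, and the function names are all made up for this example, and a real analyzer would use a proper stemmer and store more ranking data than a per-document term count.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stop-word list; real analyzers ship much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "to", "in"}


def naive_stem(token):
    """Toy suffix stripper; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Steps 1-3: tokenize, drop stop words, stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def index_document(index, doc_id, text):
    """Step 4: record, for each term, how often it occurs in each document."""
    for term in analyze(text):
        index[term][doc_id] = index[term].get(doc_id, 0) + 1


# The keys of `index` play the role of the lexicon; the values are postings.
index = defaultdict(dict)
index_document(index, 1, "Searching the searched indexes")
```

After this runs, the lexicon contains the stemmed terms `search` and `index` rather than the original words, and the stop word `the` is absent.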

My question is what happens if I pick the 'keyword' data type for values that contain multiple tokens. For instance, say I want to index a 'full name' field as a keyword, with values like 'Donald Knuth' or 'Ada Lovelace'. What do the lexicon and the inverted index look like in this case? Do we store 'Donald Knuth' and 'Ada Lovelace' in the lexicon as whole entries (instead of single-word tokens)?

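Yes. A keyword value is not tokenized, so the full string is stored as a single term: the lexicon holds 'Donald Knuth' and 'Ada Lovelace' as whole entries, each pointing to the documents that contain that exact value.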
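A rough way to picture the difference is to build two toy inverted indexes over the same values, one analyzed and one keyword-style. This is a simplified sketch, not the engine's actual data structure: `analyze_text`, `analyze_keyword`, and `build_index` are hypothetical names, and the text analyzer here only lowercases and splits on whitespace.

```python
from collections import defaultdict


def analyze_text(value):
    """Toy stand-in for a text analyzer: lowercase and split on whitespace."""
    return value.lower().split()


def analyze_keyword(value):
    """Keyword-style handling: the whole value is emitted as a single term."""
    return [value]


def build_index(docs, analyzer):
    """Map each term produced by `analyzer` to the set of matching doc ids."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index[term].add(doc_id)
    return dict(index)


docs = {1: "Donald Knuth", 2: "Ada Lovelace"}

text_index = build_index(docs, analyze_text)
# {'donald': {1}, 'knuth': {1}, 'ada': {2}, 'lovelace': {2}}

keyword_index = build_index(docs, analyze_keyword)
# {'Donald Knuth': {1}, 'Ada Lovelace': {2}}
```

In this sketch, looking up `Knuth` in `keyword_index` finds nothing, because the only terms are the complete strings. A lookup has to supply the exact value `Donald Knuth` to match document 1.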
