Possible to query by token type?


(Ed Howe) #1

Suppose you use the UAX29URLEmailTokenizer to index a field. It generates tokens of type <URL> and <EMAIL> in addition to <ALPHANUM> and <NUM> (and some others). Is there any way to, say, query for all documents that contain an <EMAIL> token? Sure, I could use a regex query to find tokens that look like emails, but I want to consider only <EMAIL> tokens.


(Ed Howe) #2

The only thing I've been able to find is a Lucene question from ten years ago at http://www.gossamer-threads.com/lists/lucene/java-user/14475. It basically indicates that only the terms are stored in the index, not their token types, so assuming that hasn't changed, the answer to my question is "no." As a workaround, I've modified my tokenizer to prepend a special string to each term, indicating its token type. It is then trivial to run a wildcard search on that special string.
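
For anyone who finds this later, here is a rough sketch of the idea behind that workaround. It is not my actual tokenizer and it does not use Lucene at all: `TypePrefixIndex`, `MARK`, and the method names are made up for illustration, and the type strings are assumed to be whatever the tokenizer reports for each token. It only shows why a type-prefixed term makes "all documents with a token of type X" a plain prefix lookup.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy in-memory index illustrating type-prefixed terms; not Lucene code. */
final class TypePrefixIndex {
    // Hypothetical separator; choose a character that cannot occur in real terms.
    static final String MARK = "\u001F";

    // Indexed term -> ids of documents containing it. Sorted, so all terms
    // sharing a prefix sit next to each other.
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();

    /** Builds the indexed form of a term: type, then separator, then the term. */
    static String indexedTerm(String type, String term) {
        return type + MARK + term;
    }

    /** Indexes one token; the type is assumed to be supplied by the tokenizer. */
    void add(int docId, String type, String term) {
        postings.computeIfAbsent(indexedTerm(type, term), k -> new TreeSet<>()).add(docId);
    }

    /** Rough equivalent of a wildcard query on "type + MARK + *". */
    Set<Integer> docsWithType(String type) {
        String prefix = type + MARK;
        Set<Integer> hits = new TreeSet<>();
        // Start at the first term >= prefix and stop once terms no longer match.
        for (Map.Entry<String, Set<Integer>> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }
}
```

The obvious cost of this approach is that every ordinary term query must also include the prefix (or the field must be indexed a second time without it), since the stored terms are no longer the bare tokens.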

