Standard tokenizer punctuation symbols removed

Thanks for your response @munasia. My initial thought was that the definition of which punctuation gets removed would be there, but I couldn't find it. Did you? Looking more closely at the text I quoted above, it seems to me that the word boundaries are defined by the Unicode Text Segmentation algorithm, but the punctuation removal is done separately from that. At the very least, the wording leaves it unclear whether punctuation removal is part of the Unicode Text Segmentation algorithm or a separate step. My guess is that it's separate, but I'm not sure, which is why I posted this question :slight_smile:
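
To make the distinction I'm asking about concrete, here is a rough Python sketch of the two-step model I have in mind. It is only an illustration under my own assumptions: the regex is a crude stand-in for the Unicode Text Segmentation rules, the names `segment` and `keep_word_like` are made up, and none of this is taken from the tokenizer's actual implementation.

```python
import re

# Step 1: find boundaries. This regex is a crude stand-in for the
# Unicode Text Segmentation word-boundary rules, NOT the real rules.
# It yields runs of word characters, runs of whitespace, and
# single other characters (such as punctuation).
SEGMENT_RE = re.compile(r"\w+|\s+|[^\w\s]")


def segment(text):
    """Split text into consecutive segments (hypothetical helper)."""
    return SEGMENT_RE.findall(text)


def keep_word_like(segments):
    """Step 2: drop segments with no letter or digit (hypothetical helper).

    This is the "separate" removal step I am guessing at: punctuation
    and whitespace segments exist after step 1 but are not emitted.
    """
    return [s for s in segments if any(ch.isalnum() for ch in s)]


segments = segment("Hello, world!")
tokens = keep_word_like(segments)
print(segments)
print(tokens)
```

If this model is right, the punctuation still shows up as its own segments after step 1 and only disappears in step 2, so the list of removed symbols would be defined by whatever step 2 does rather than by the boundary rules.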