If I index the doc as shown in the image, neither hello nor the world is searchable.
Had it really been HTML markup, though, I could have used the html_strip char filter to remove it.
Note: with the standard tokenizer, the tokens generated for the above text are still hello and the world.
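
For reference, here is a minimal sketch of how the html_strip idea could be checked with the _analyze API. It only builds the request body and does not contact a cluster. The helper name build_analyze_request and the sample text are assumptions made for illustration; they are not the document from the image.

```python
import json


def build_analyze_request(text):
    """Build a JSON body for Elasticsearch's _analyze API.

    Combines the html_strip char filter with the standard tokenizer,
    the two pieces mentioned in the question.
    """
    return {
        "tokenizer": "standard",
        "char_filter": ["html_strip"],
        "text": text,
    }


# Hypothetical sample text (an assumption, not the text from the image).
body = build_analyze_request("<b>hello</b> the world")

# Print the body so it can be pasted into a POST /_analyze request.
print(json.dumps(body, indent=2))
```

Sending this body as POST /_analyze to a running cluster should return the tokens the analysis chain produces, which makes it easy to compare the output with and without the char filter.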