Text with &nbsp is not searchable

If I index a document as shown in the image, neither hello nor the world is searchable.
Had it really been a literal &nbsp; entity, I could have used the html_strip char filter to remove it.
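
To tell the two cases apart before indexing, a minimal Python sketch could look like the following. It assumes the text contains the actual no-break space character (U+00A0) rather than the literal entity, which is a guess based on the behaviour above. The function names and sample strings are hypothetical and only illustrate the check; they are not part of any Elasticsearch API.

```python
import html

NBSP = "\u00a0"  # the actual no-break space character, not the HTML entity


def describe_nbsp(text: str) -> str:
    """Report which form of non-breaking space, if any, appears in the text."""
    if "&nbsp;" in text:
        return "entity"      # literal HTML entity, the case html_strip targets
    if NBSP in text:
        return "character"   # raw U+00A0, which is not an HTML entity
    return "none"


def normalize_nbsp(text: str) -> str:
    """Decode HTML entities, then replace U+00A0 with a plain space."""
    return html.unescape(text).replace(NBSP, " ")
```

Running describe_nbsp on the raw field value before indexing shows which form is present; normalize_nbsp is one possible client-side cleanup, applied before the document is sent.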

Note: the tokens generated for the above text are still hello and the world (using the standard tokenizer).

