What to do with non-UTF-8 characters

It's replacing all such characters with a question mark.

That is a sign that your text is not valid UTF-8. Make sure your source
documents really are in the encoding you guessed (latin1?) and that they
get properly converted to UTF-8 before you process them. (I do not
understand your conversion code. Try to explain what you want :))
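
For illustration only, here is a minimal Python sketch of that kind of conversion. I am assuming the source bytes are Latin-1 whenever they are not already valid UTF-8; that is only a guess about your data, and the function names are made up for this example. The second function shows where the replacement characters come from: decoding non-UTF-8 bytes as UTF-8 with a lenient error handler.

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw re-encoded as UTF-8.

    Assumption: input that is not valid UTF-8 is in `fallback` (Latin-1 here).
    """
    try:
        text = raw.decode("utf-8")    # already valid UTF-8: keep as-is
    except UnicodeDecodeError:
        text = raw.decode(fallback)   # Latin-1 maps every byte, so this cannot fail
    return text.encode("utf-8")


def lossy_decode(raw: bytes) -> str:
    """Decode as UTF-8, replacing invalid bytes with U+FFFD.

    This mimics the symptom: many tools display U+FFFD as a question mark.
    """
    return raw.decode("utf-8", errors="replace")


latin1_bytes = "café!".encode("latin-1")   # b'caf\xe9!'
fixed = to_utf8(latin1_bytes)              # proper UTF-8 bytes
broken = lossy_decode(latin1_bytes)        # 'é' is lost and replaced
```

If your documents turn out to be in some other encoding (windows-1252, for example), pass that name as `fallback` instead.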

Peter.