Sorry, I must have confused myself. Offsets are UTF-16 in both cases, and they seem to be reported correctly regardless of the analysis chain. I'm not sure how I got it into my head that the offsets were UTF-8.
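
In case it helps anyone who hits the same confusion, here is a rough Python sketch of how the two units diverge once the text contains non-ASCII characters. It does not call Elasticsearch at all; the helper names `utf16_offset` and `utf8_offset` and the sample string are made up for illustration.

```python
def utf16_offset(text: str, char_index: int) -> int:
    """Count the UTF-16 code units that precede text[char_index]."""
    # Every UTF-16 code unit is 2 bytes; "utf-16-le" avoids the BOM prefix.
    return len(text[:char_index].encode("utf-16-le")) // 2


def utf8_offset(text: str, char_index: int) -> int:
    """Count the UTF-8 bytes that precede text[char_index]."""
    return len(text[:char_index].encode("utf-8"))


# Hypothetical sample: "é" is 1 UTF-16 unit but 2 UTF-8 bytes;
# the emoji is 2 UTF-16 units (a surrogate pair) but 4 UTF-8 bytes.
text = "caf\u00e9 \U0001F600 bar"
idx = text.index("bar")  # index in Unicode code points

print("code points:", idx)
print("UTF-16 units:", utf16_offset(text, idx))
print("UTF-8 bytes:", utf8_offset(text, idx))
```

For pure ASCII input all three numbers are the same, which is probably how the two get mixed up in the first place.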