Queries with large character counts in fields

Hi all,

I've run into an issue where searching a relatively small dataset is hitting some pretty slow performance. We're running queries for text matches like "Person Name" against a dataset of around 8 GB. The catch is that the data comes from ingested documents (Word, PDF, Excel, etc.), and some of the content fields contain 16+ million characters. Does anybody have advice on how to handle fields with such a large character count?

Any help or advice is much appreciated!
Thanks,
Jason

Hi Jason, thanks for posting your question! Text fields are tokenized by default, which optimizes search performance to the point where 16M characters in a text field shouldn't be a problem on its own.
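As a quick illustration, the `_analyze` API shows what actually gets indexed for a text field; with the default standard analyzer, "Person Name" is stored as the tokens `person` and `name`:

```json
GET /_analyze
{
  "analyzer": "standard",
  "text": "Person Name"
}
```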

Would you mind sharing one of the queries that's slow so we can get a better idea of what's happening? Could you please also share your mapping for the index you're searching?

Thanks,
CJ

Thanks for replying!

Here's the query that's being issued.
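It's essentially a match on the extracted content with highlighting enabled; with placeholder index and field names, it looks like this:

```json
GET /ingested-docs/_search
{
  "query": {
    "match": {
      "content": "Person Name"
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}
```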

Here are the mappings of the index. Some searches result in query times upwards of 7 to 10 minutes.
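Simplified to the relevant fields (same placeholder names as above):

```json
PUT /ingested-docs
{
  "mappings": {
    "properties": {
      "file_name": { "type": "keyword" },
      "content":   { "type": "text" }
    }
  }
}
```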

Thanks Jason! I believe the highlighter is the culprit here. Using the highlighter on large fields always carries a performance cost, because by default it has to load the entire field value, re-analyze it, and scan it to find the passages to highlight.

Here's a thread from a user with a problem similar to yours: Highlighting takes long time for large documents.

Here are some solutions to consider:

- Set `"index_options": "offsets"` on the large text fields so offsets are stored in the postings and the unified highlighter can build snippets without re-analyzing the full text (see the sketch below).
- Alternatively, set `"term_vector": "with_positions_offsets"` so the fast vector highlighter can be used; this increases index size but avoids re-analysis at query time.
- Cap or skip highlighting on the very large fields, for example via the `index.highlight.max_analyzed_offset` index setting, or highlight a smaller summary field instead.
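A minimal sketch of the first option, assuming the large field is named `content` as above:

```json
PUT /ingested-docs
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "index_options": "offsets"
      }
    }
  }
}
```

Since these are index-time options, existing documents would need to be reindexed for them to take effect.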

Thanks CJ!

So if I'm understanding this correctly, I'd need to set these options per field within the mappings?

Answered my own question there. Thanks again for the help!
