Comparing Large Text Documents -- Queries with Large Text Fields


(Shawn O'Banion) #1

Hello,

I'm interested in using ElasticSearch to compare large text documents.
Essentially, I want to index documents that contain a large-ish 'text'
field (on average, 25,000+ characters or 3,500 words). I then want to
search this index with a query against the same field, where the query
text is of a similar size.

Is this an appropriate use-case for ElasticSearch? If so, what type of
query (e.g. match, query_string) would you recommend using? At first
glance, a match query seems best since it won't "parse" the query string.

Just playing around, I tried the match query (with the default "or"
operator) and received this error:

pyes.exceptions.ElasticSearchException: TooManyClauses[maxClauseCount is set to 1024];

Is it simply a matter of increasing the maxClauseCount to support my
requirements?

Thanks for your help!
Shawn
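
For readers hitting the same error: a match query with the "or" operator is analyzed into a boolean query with, roughly, one term clause per token of the query text, so a 3,500-word query text would plausibly exceed a limit of 1024. The sketch below estimates that count before sending a query. The function names and the regex tokenizer are illustrative assumptions only; the analyzer configured on the field may split text differently (stop words, stemming, and so on), so treat the result as an approximation.

```python
import re

# The limit reported in the TooManyClauses error quoted in the post.
MAX_CLAUSE_COUNT = 1024


def estimate_clause_count(text):
    """Approximate the number of term clauses a match/"or" query would create.

    Assumption: one clause per word token, with tokens found by a simple
    regex. This only approximates what a real analyzer would emit.
    """
    return len(re.findall(r"\w+", text.lower()))


def exceeds_limit(text, limit=MAX_CLAUSE_COUNT):
    """Return True if the estimated clause count is above the limit."""
    return estimate_clause_count(text) > limit
```

Under these assumptions, a query text of about 3,500 words is well above the default limit, which is consistent with the error reported above.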

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Otis Gospodnetić) #2

Hi Shawn,

If you are OK with such queries being neither fast nor cheap, you can
raise the maxClauseCount limit and this should work.

http://search-lucene.com/?q="maxClauseCount"&fc_project=ElasticSearch&fc_type=mail+hash+user

Otis

Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
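
A note on the trade-off described in this reply: raising the limit is the suggestion made here (to my knowledge the node setting in releases of that era was index.query.bool.max_clause_count, later renamed indices.query.bool.max_clause_count; please verify against the documentation for your version). Where the cost of very large queries is a concern, one alternative that was not proposed in the thread is to shrink the query text to a fixed budget of terms before building the match query. The sketch below shows that idea only; build_match_query and max_terms are hypothetical names, and choosing terms by raw frequency is a naive assumption that will favor common words.

```python
import re
from collections import Counter


def build_match_query(field, text, max_terms=1024):
    """Build a match query body limited to at most max_terms distinct terms.

    Assumptions: tokens come from a simple regex (not the index analyzer),
    and the most frequent tokens in the text are the ones kept.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Keep the max_terms most frequent distinct tokens.
    top_terms = [term for term, _ in Counter(tokens).most_common(max_terms)]
    return {
        "query": {
            "match": {
                field: {"query": " ".join(top_terms), "operator": "or"}
            }
        }
    }
```

The returned dictionary is a plain request body for a search call; it can be serialized to JSON and sent with whichever client is in use. A selection based on term rarity rather than frequency would likely give more useful matches, but that needs index statistics and is outside this sketch.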

(In reply to Shawn O'Banion's post #1 of Thursday, September 26, 2013 4:08:45 PM UTC-4.)

