How to force queries for very high frequency words

comdiv · February 9, 2018, 7:11am

Problem:

There is a very frequent word in the index, for example "company" (6 million of 8 million total documents).
It is not a stop word and is indexed.
Users very often search for this word alone, and that is a legitimate request.
Such a query runs very slowly, about 1 minute, while usual requests take about 300-500 ms.
In a terms aggregation I get the total for "company" very quickly, so counting the total is not a problem for ES.
Every shard contains a great many documents with the word "company", so getting size:10 should not be a problem.
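
To compare the cost of counting against the cost of fetching hits, a count-only search body could look like the sketch below. The field name "name" and the word "company" come from this thread; the helper name `count_only_request` is hypothetical, and the body would still need to be sent to the index's `_search` endpoint by whatever client is in use.

```python
import json


def count_only_request(field, word):
    """Build a search body that asks only for the total hit count."""
    return {
        "size": 0,  # return no documents; the response still carries hits.total
        "query": {"match": {field: word}},
    }


count_body = count_only_request("name", "company")
print(json.dumps(count_body, indent=2))
```

If this request is fast while the same query with size:10 is slow, the time is being spent on collecting or fetching hits rather than on matching.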

I think the problem is that it tries to score and sort such a query (?)
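
One way to check this guess, rather than assume it, might be the Profile API, which adds a per-shard timing breakdown of the query and its collectors to the search response. A minimal sketch of such a request body follows; the helper name `profiled_request` is hypothetical, and the field and word are the ones from this thread.

```python
import json


def profiled_request(field, word, size=10):
    """Build the slow single-word query with profiling switched on."""
    return {
        "profile": True,  # ask Elasticsearch to include timing details in the response
        "size": size,
        "query": {"match": {field: word}},
    }


profile_body = profiled_request("name", "company")
print(json.dumps(profile_body, indent=2))
```

The "profile" section of the response should then show whether the time goes to the query itself or to collecting the top hits.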

Is there any way to set up the query to force a total count and an unscored result for such queries?

dadoonet · February 9, 2018, 7:29am

If you use a bool -> filter clause then inner queries won't be scored

comdiv · February 9, 2018, 7:36am

{
  "size": 10,
  "query": {
    "bool": {
      "filter": {
        "match": {"name": "company"}
      }
    }
  }
}

Same speed.
When the frequent word is the only word in the search query, it slows down dramatically.
The Elasticsearch version is 5.5, in the Elassandra modification, which uses Cassandra as document storage instead of Lucene.
I tried running it on regular Elasticsearch: same thing.

comdiv · February 9, 2018, 7:40am

Is it possible that, since the shard count in Elassandra equals the number of Cassandra nodes it uses, there are too few shards for optimal query execution?

dadoonet · February 9, 2018, 7:52am

"The Elasticsearch version is 5.5, in the Elassandra modification, which uses Cassandra as document storage instead of Lucene."

That's not correct. It still uses Lucene, just not the local file system.

Could you try a similar test on 6.2.1? I think there have been some improvements recently.
Then share more details, but from a test without Elassandra, as we can't support it.

"Is it possible that, since the shard count in Elassandra equals the number of Cassandra nodes it uses, there are too few shards for optimal query execution?"

I don't know, but please run your tests without Elassandra, or ask on the Elassandra forums, where you'll have a better chance of getting help with their product.

ddorian43 · February 9, 2018, 9:55am

Note that Elassandra still uses the local file system.
It has 1 shard per keyspace in the "default scenario", so each search will be single-threaded on that node. Have you tried adding "sort: _doc"? And when you run the query with the filter again, is it not cached?
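
A minimal sketch of the "sort: _doc" suggestion applied to the filter query posted earlier in the thread. The helper name `filter_request_doc_order` is hypothetical; whether this actually helps on the poster's cluster is not confirmed anywhere in this thread.

```python
import json


def filter_request_doc_order(field, word, size=10):
    """Build an unscored filter query whose hits come back in index order."""
    return {
        "size": size,
        "sort": ["_doc"],  # index order, so no relevance-based sorting is needed
        "query": {
            "bool": {
                # filter context: matching documents are not scored
                "filter": {"match": {field: word}}
            }
        },
    }


doc_order_body = filter_request_doc_order("name", "company")
print(json.dumps(doc_order_body, indent=2))
```

Running the same body twice and comparing the "took" value in the two responses is one simple way to see whether the second run benefits from caching.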

dadoonet · February 9, 2018, 2:06pm

My bad @ddorian43. I forgot about this.

ddorian43 · February 9, 2018, 2:21pm

I meant that the Lucene files are on the local filesystem, so it is nearly the same, except that each query has a token filter (since data is split into tokens in Cassandra).

