How to speed up queries for very high-frequency words


(FAGIM SADYKOV) #1

Problem:

  1. There is a very frequent word in the index, for example "company" (6 million of the 8 million documents in total).
  2. It is not a stop word, and it is indexed.
  3. Users very often search for this word alone, and it is a legitimate request.
  4. Such a query is very, very slow (~1 minute), while ordinary requests take ~300-500 ms.
  5. A terms aggregation returns the total for "company" very quickly, so counting the total is not a problem for ES.
  6. Every shard contains a great many documents with the word "company", so fetching size: 10 is not a problem.

I think the problem is that it tries to score and sort such a query (?)

Is there any way to set up the query so that, for such queries, it only computes the total count and returns unscored hits?
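If only the total count is needed, one option is a request body with "size": 0, which asks Elasticsearch for hits.total without returning any documents. The sketch below builds such a body as a plain Python dict; the helper name count_only_body is hypothetical, and the field "name" is assumed from the query posted later in this thread. It only builds the JSON and does not claim how fast 5.5 will execute it.

```python
import json


def count_only_body(field: str, word: str) -> dict:
    """Hypothetical helper: build a search body that returns only the total hit count.

    "size": 0 means no hits are fetched; the match sits in a bool "filter"
    clause, so it runs in filter context and no relevance scores are computed.
    """
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": {"match": {field: word}}
            }
        },
    }


# "name" is the field used in the query from this thread (assumption for illustration).
body = count_only_body("name", "company")
print(json.dumps(body, indent=2))
```

The same query object could also be sent to the _count endpoint, which returns only the count.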


(David Pilato) #2

If you use a bool -> filter clause, the inner queries won't be scored.
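To make the difference concrete, here is a minimal sketch that builds the same one-word match either in a bool "must" clause (scored, hits ranked by relevance) or in a bool "filter" clause (unscored, hits get a score of 0). The helper name single_word_query and the field "name" are assumptions for illustration only.

```python
import json


def single_word_query(field: str, word: str, scored: bool) -> dict:
    """Hypothetical helper: one-word match query, scored or unscored.

    scored=True  -> bool "must": every matching document gets a relevance
                    score and hits are ranked by it.
    scored=False -> bool "filter": the inner query runs in filter context,
                    is not scored, and hits come back with a score of 0.
    """
    clause = "must" if scored else "filter"
    return {
        "size": 10,
        "query": {"bool": {clause: {"match": {field: word}}}},
    }


scored = single_word_query("name", "company", scored=True)
unscored = single_word_query("name", "company", scored=False)
print(json.dumps(unscored, indent=2))
```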


(FAGIM SADYKOV) #3
{
  "size": 10,
  "query": {
    "bool": {
      "filter": {
        "match": {"name": "company"}
      }
    }
  }
}

Same speed.
When this frequent word is the only word in the search query, the query slows down dramatically.
Elasticsearch version 5.5, modified as Elassandra, which uses Cassandra as document storage instead of Lucene.
I tried running it on the standard edition of Elasticsearch: same thing.


(FAGIM SADYKOV) #4

Is it possible that, since the shard count in Elassandra equals the number of Cassandra nodes it uses, there are too few shards for optimal query execution?


(David Pilato) #5

Elasticsearch version 5.5, modified as Elassandra, which uses Cassandra as document storage instead of Lucene.

That's not correct. It still uses Lucene, just not the local file system.

Could you try a similar test on 6.2.1? I think there have been some improvements recently.
Then share more details (from a test run without Elassandra, as we can't support it).

Is it possible that, since the shard count in Elassandra equals the number of Cassandra nodes it uses, there are too few shards for optimal query execution?

I don't know, but please run your tests without Elassandra, or ask on the Elassandra forums, where you have a better chance of getting help with their product.


(ddorian43) #6

Note that Elassandra still uses the local file system.
In the "default scenario" it has 1 shard per keyspace, so each search is single-threaded on that node. Have you tried adding "sort": "_doc"? And when you run the query (the one with the filter) again, isn't it cached?
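A minimal sketch of the suggestion above: keep the match in an unscored filter and add "sort": ["_doc"], which returns hits in index order and is documented by Elasticsearch as the most efficient sort order. The helper name doc_order_query and the field "name" are hypothetical. Whether this helps on 5.5 is untested here; that version still computes the exact hits.total, so all matching documents may still be visited for counting.

```python
import json


def doc_order_query(field: str, word: str, size: int = 10) -> dict:
    """Hypothetical helper: unscored filter plus sort by index order.

    The bool "filter" clause skips relevance scoring, and "sort": ["_doc"]
    returns hits in index order instead of ranking them by score.
    """
    return {
        "size": size,
        "query": {"bool": {"filter": {"match": {field: word}}}},
        "sort": ["_doc"],
    }


body = doc_order_query("name", "company")
print(json.dumps(body, indent=2))
```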


(David Pilato) #7

My bad, @ddorian43. I forgot about this.


(ddorian43) #8

I meant that the Lucene files are on the local file system, so it's nearly the same, except that each query gets a token filter (since the data is split by tokens in Cassandra).

