Hello.
I would appreciate your advice.
I want to query an index that holds 10 million documents. Each document is 1k~5k in size.
I expect the number of matching documents to range from 10k at minimum to 1 million at maximum.
The query is quite simple, something like:
/myindex/mytype/_search
{
  "query": {
    "bool": {
      "should": {
        "query_string": {
          "query": "baby car house star giant computer"
        }
      }
    }
  }
}
If common keywords are used, the hit count increases (~1M)
(e.g. query: "man morning car house go result").
If specific keywords are used, the hit count is small (1k~)
(e.g. "offensive knife terror").
My final goal is to get the _id of every document that matches.
(I am not sure yet whether I will use min_score.)
In that case, if I set a large fetch size (about 1M) to get the results at once, ES will run into OOM trouble.
On the other hand, if I set a small fetch size and use pagination to get the results, ES will run into the deep pagination problem. https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
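To put rough numbers on the two options above, here is a minimal back-of-the-envelope sketch. It assumes "1k~5k" means roughly 1,000 to 5,000 bytes per document, and the shard count of 5 is purely an assumption, since I have not stated how the index is sharded. The class and method names are made up for illustration and are not part of any Elasticsearch API.

```java
// Rough cost model for "fetch everything at once" versus from/size paging.
// All names here are hypothetical; nothing below calls Elasticsearch.
final class FetchCostSketch {

    // Approximate bytes of document source returned when `hits` documents
    // of `bytesPerDoc` bytes each come back in a single response.
    static long payloadBytes(long hits, long bytesPerDoc) {
        return hits * bytesPerDoc;
    }

    // With from/size paging, each shard builds a sorted queue of from + size
    // entries, and the coordinating node merges one such queue per shard.
    static long entriesMergedByCoordinator(int shards, long from, long size) {
        return shards * (from + size);
    }

    public static void main(String[] args) {
        long hits = 1_000_000L;   // worst-case hit count from the question
        int shards = 5;           // assumption: shard count is not given
        long pageSize = 3_000L;   // same page size as in my SCAN code

        System.out.println("payload at 1 KB/doc: " + payloadBytes(hits, 1_000L) + " bytes");
        System.out.println("payload at 5 KB/doc: " + payloadBytes(hits, 5_000L) + " bytes");

        // Cost of the last page when walking all hits with from/size.
        long from = hits - pageSize;
        System.out.println("entries merged for last page: "
                + entriesMergedByCoordinator(shards, from, pageSize));
    }
}
```

Under these assumptions a single 1M-hit response carries on the order of 1 to 5 GB of source, while the last from/size page makes the coordinating node merge millions of entries just to return 3,000 of them.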
What would be a good way to get a large result set from a query?
When I use SCAN, it doesn't provide score values.
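import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

// client, queryString and uuidSet (a Set<String>) are defined elsewhere.
QueryBuilder queryBuilder = QueryBuilders.queryStringQuery(queryString);
SearchRequestBuilder builder = client.prepareSearch("usertext");
builder.setTypes("usertext");
builder.setQuery(queryBuilder);
builder.setSearchType(SearchType.SCAN);   // SCAN skips sorting, so no scores come back
builder.setSize(3000);                    // hits per shard per scroll round
builder.setScroll(new TimeValue(1000));   // keep the scroll context alive for 1 s
SearchResponse response = builder.execute().actionGet();
int addCount = 0;
while (true) {
    // Collect the _id of every hit in the current batch.
    for (SearchHit hit : response.getHits()) {
        addCount++;
        uuidSet.add(hit.getId());
        System.out.println(addCount + "," + uuidSet.size() + ", " + hit.getScore());
    }
    // Fetch the next batch; an empty batch means the scroll is exhausted.
    response = client.prepareSearchScroll(response.getScrollId()).setScroll(new TimeValue(10000)).execute().actionGet();
    if (response.getHits().getHits().length == 0) {
        break;
    }
}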