I used a term query to search for some entries. The search result contains 4,714 entries, but the search took 7 minutes.
My index settings and mapping look like this:
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "length_filter"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "(?<=[EW])(?!R)",
          "lowercase": false
        }
      },
      "filter": {
        "length_filter": {
          "type": "length",
          "max": 70
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "id": {
          "type": "keyword"
        },
        "sequence1": {
          "type": "text",
          "analyzer": "my_analyzer"
        },
        "sequence2": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
A document to be inserted looks like this:
{
  "id": "001",
  "sequence1": "AAAAAAAAAAAAABBBBBBBBBBBBBBBBBBDDDDDDDDDDDDDDDDDEGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGW",
  "sequence2": "AAAAAAAAAAAAABBBBBBBBBBBBBBBBBBDDDDDDDDDDDDDDDDDEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGWWJHSACSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSYBIUIBBSAHBSJB AAAAAAAAAAAAAAAAAIOJIONMBJKSSSSSSSSSSSSSSSJKNJKNJUHIOHJKJTIBBFAFOBS DHUAGABSBIGUIGBHJBKNIOHIUHHHIOHIUGVAOAPPPPPPP"
}
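To make the analysis easier to reason about, here is a minimal sketch of how the analyzer would split a sequence into terms. It assumes the pattern tokenizer splits the way Java's `Pattern.split` does and that the length filter simply drops tokens longer than 70 characters. The class name `TokenizerSketch` and the short sample inputs are made up for illustration, and real Elasticsearch analysis may differ in details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TokenizerSketch {
    // Same split pattern as "my_tokenizer": a zero-width position right
    // after an E or W that is not followed by an R.
    private static final Pattern SPLIT = Pattern.compile("(?<=[EW])(?!R)");
    // Same upper bound as "length_filter" (assumed to be inclusive).
    private static final int MAX_TOKEN_LENGTH = 70;

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : SPLIT.split(text)) {
            // Drop empty pieces and tokens longer than the length filter allows.
            if (!token.isEmpty() && token.length() <= MAX_TOKEN_LENGTH) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("AAEGGEGGW")); // prints [AAE, GGE, GGW]
        System.out.println(tokenize("AERGW"));     // prints [AERGW], no split before R
    }
}
```

Under these assumptions, a query string that ends in a single W and contains no other E or W maps to exactly one term.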
My Java search code is:
SearchResponse response = esClient.prepareSearch("my_index")
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60_000))
        .setQuery(QueryBuilders.matchPhraseQuery("sequence1", "GGGGGGGGGGGGGGGW"))
        .setFetchSource(null, new String[]{"sequence1", "sequence2"})
        .setSize(100)
        .get();
do {
    for (SearchHit hit : response.getHits().getHits()) {
        // intentionally empty: the hits are not processed
    }
    // fetch the next page of up to 100 hits
    response = esClient.prepareSearchScroll(response.getScrollId())
            .setScroll(new TimeValue(60_000))
            .get();
} while (response.getHits().getHits().length != 0);
I created a line chart to check the time consumed per **scroll**. The x-axis is the scroll number (0 is the first scroll request) and the y-axis is the time consumed.
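A per-scroll timing like this can be collected with a small wrapper around the loop. The following is only a sketch: `ScrollTimer`, `timeScrolls`, and the `fetchPage` supplier are hypothetical names, and the supplier stands in for one search or scroll request that returns the number of hits on the page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

class ScrollTimer {
    // Calls fetchPage until it reports an empty page and returns the elapsed
    // milliseconds of every call, indexed by scroll number (0 = first call).
    static List<Long> timeScrolls(IntSupplier fetchPage) {
        List<Long> millis = new ArrayList<>();
        int hits;
        do {
            long start = System.nanoTime();
            hits = fetchPage.getAsInt(); // stand-in for one search/scroll request
            millis.add((System.nanoTime() - start) / 1_000_000);
        } while (hits != 0);
        return millis;
    }

    public static void main(String[] args) {
        // Fake page sizes instead of a real cluster: three pages, then an empty one.
        int[] pages = {100, 100, 14, 0};
        int[] next = {0};
        List<Long> millis = timeScrolls(() -> pages[next[0]++]);
        for (int i = 0; i < millis.size(); i++) {
            System.out.println(i + "\t" + millis.get(i) + " ms");
        }
    }
}
```

With a real client, the supplier would issue the request and return `response.getHits().getHits().length`.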
The Elasticsearch version is 6.6.1 and it uses the default settings. I only modified jvm.options:
-Xms4g
-Xmx4g
The index is 74.03 GB and holds 66,006,373 docs (one shard, no replicas).