How to optimize the term query?

huangjs · April 24, 2019, 3:01pm

I used a term query to search for some entries. The search result contains 4714 entries, but it took 7 minutes.
My index settings and mapping look like this:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": [
            "length_filter"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "(?<=[EW])(?!R)",
          "lowercase": false
        }
      },
      "filter": {
        "length_filter": {
          "type": "length",
          "max": 70
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "id": {
          "type": "keyword"
        },
        "sequence1": {
          "type": "text",
          "analyzer": "my_analyzer"
        },
        "sequence2": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

The documents to be inserted look like this:

{
    "id": "001",
    "sequence1": "AAAAAAAAAAAAABBBBBBBBBBBBBBBBBBDDDDDDDDDDDDDDDDDEGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGW",
    "sequence2": "AAAAAAAAAAAAABBBBBBBBBBBBBBBBBBDDDDDDDDDDDDDDDDDEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGWWJHSACSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSYBIUIBBSAHBSJB AAAAAAAAAAAAAAAAAIOJIONMBJKSSSSSSSSSSSSSSSJKNJKNJUHIOHJKJTIBBFAFOBS DHUAGABSBIGUIGBHJBKNIOHIUHHHIOHIUGVAOAPPPPPPP"
}
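For anyone reading along, here is a rough sketch of how my_analyzer would be expected to tokenize sequence1, using plain java.util.regex instead of Elasticsearch itself. It assumes the pattern tokenizer splits the text wherever the pattern matches (much like String.split) and that the length filter keeps tokens of up to 70 characters inclusive. The class and method names are made up for illustration, and the real analyzer may differ in details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elasticsearch: approximates my_analyzer with plain Java regex.
public class PatternTokenizerSketch {

    // Same pattern as my_tokenizer: split after every E or W, unless the next character is R.
    private static final Pattern SPLIT = Pattern.compile("(?<=[EW])(?!R)");

    // Mirrors length_filter ("max": 70); assumed here to be an inclusive upper bound.
    private static final int MAX_LENGTH = 70;

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : SPLIT.split(text)) {
            // Skip empty pieces and drop tokens longer than the length filter allows.
            if (!token.isEmpty() && token.length() <= MAX_LENGTH) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sequence1 = "AAAAAAAAAAAAABBBBBBBBBBBBBBBBBBDDDDDDDDDDDDDDDDDEGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGW";
        // Print each surviving token with its length.
        for (String token : tokenize(sequence1)) {
            System.out.println(token.length() + "\t" + token);
        }
    }
}
```

Under those assumptions, sequence1 is cut after each E and after the final W, so it yields three tokens. Any run longer than 70 characters with no E or W in it would be dropped by the length filter and never indexed.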

My Java search code is:

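import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

// Open a scroll sorted by _doc and fetch the first page of 100 hits.
SearchResponse response = esClient.prepareSearch("my_index")
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60_000))
        .setQuery(QueryBuilders.matchPhraseQuery("sequence1", "GGGGGGGGGGGGGGGW"))
        // Exclude the two large sequence fields from the returned _source.
        .setFetchSource(null, new String[]{"sequence1", "sequence2"})
        .setSize(100)
        .get();
do {
    for (SearchHit hit : response.getHits().getHits()) {
        // nothing is done with the hits here
    }
    // Fetch the next page, keeping the scroll context alive for another 60 s.
    response = esClient.prepareSearchScroll(response.getScrollId())
            .setScroll(new TimeValue(60_000))
            .get();
} while (response.getHits().getHits().length != 0);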

I created a line chart to check the time consumed per scroll request. The x-axis is the scroll number (0 is the first scroll request). The y-axis is the time consumed.
[Image: 2019-04-24_22-59, line chart of time consumed per scroll request]

The Elasticsearch version is 6.6.1, running with the default settings. I only modified jvm.options:

-Xms4g
-Xmx4g

The index is 74.03 GB in size and holds 66,006,373 docs (one shard, no replicas).

system · May 22, 2019, 3:01pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
