Single machine, millions of documents, search is very slow

hubiao · June 9, 2018, 5:39am

Environment:

Windows Server 2008 R2, 64-bit
JDK 1.8
Elasticsearch 5.5.0
300 million documents, 20 GB index.

The elasticsearch.yml configuration is as follows:

cluster.name: bropen
node.name: node-1
transport.tcp.port: 9300
http.port: 9200
network.host: 192.168.1.150
network.publish_host: 192.168.1.150
network.bind_host: 192.168.1.150

The query DSL is as follows:

GET /data/book_page/_search
{
  "query": {
    "match": {
      "text": "important speech"
    }
  }
}

Search is slow!

While a search is running, the thread pool information is as follows:

node-1 bulk                0 0 0
node-1 fetch_shard_started 0 0 0
node-1 fetch_shard_store   0 0 0
node-1 flush               0 0 0
node-1 force_merge         0 0 0
node-1 generic             0 0 0
node-1 get                 0 0 0
node-1 index               0 0 0
node-1 listener            0 0 0
node-1 management          1 0 0
node-1 refresh             0 0 0
node-1 search              4 1 0
node-1 snapshot            0 0 0
node-1 warmer              0 0 0
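
For readers skimming the table above, here is a minimal Python sketch that pulls out the pools with non-zero counters. It assumes the three numeric columns are active, queue and rejected, which is the default column order of `_cat/thread_pool` in Elasticsearch 5.x; the post does not say which columns were requested. `CAT_OUTPUT` and `busy_pools` are illustrative names, and only three rows are copied in to keep it short.

```python
# Columns assumed: node, pool, active, queue, rejected
# (default order of _cat/thread_pool in Elasticsearch 5.x; an assumption here).
CAT_OUTPUT = """\
node-1 bulk                0 0 0
node-1 management          1 0 0
node-1 search              4 1 0
"""


def busy_pools(cat_text):
    """Return {pool: (active, queue, rejected)} for pools with any non-zero counter."""
    result = {}
    for line in cat_text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        _node, pool, active, queue, rejected = parts
        counts = (int(active), int(queue), int(rejected))
        if any(counts):
            result[pool] = counts
    return result


print(busy_pools(CAT_OUTPUT))
```

On these rows it reports `management` with one active thread and `search` with four active threads and one queued task, so under that column reading the search pool is the only one under pressure.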

My questions are as follows:
1: Searching with word segmentation (analyzed text) is very slow!
2: After a slow search, I refresh the page and search for a term that needs no segmentation. It should be fast, but the result is still slow.

If I then refresh and search again, it is even slower, as if it were waiting for the previous task to finish.

dadoonet · June 9, 2018, 6:03am

Can you share the elasticsearch response after you sent your query please?

hubiao · June 9, 2018, 7:29am

GET /data/book_page/_search
{
  "query": {
    "match": {
      "text": "important speech"
    }
  }
}

The search is slow. This is my query.

hubiao · June 9, 2018, 7:45am

[2018-06-09T15:43:39,334][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1780] overhead, spent [385ms] collecting in the last [1.2s]
[2018-06-09T15:43:40,723][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1781] overhead, spent [418ms] collecting in the last [1.3s]
[2018-06-09T15:43:42,142][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1782] overhead, spent [427ms] collecting in the last [1.4s]
[2018-06-09T15:43:43,312][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1783] overhead, spent [325ms] collecting in the last [1.1s]
[2018-06-09T15:43:50,437][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1786] overhead, spent [260ms] collecting in the last [1s]
[2018-06-09T15:43:52,652][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1788] overhead, spent [322ms] collecting in the last [1.2s]
[2018-06-09T15:43:53,684][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1789] overhead, spent [304ms] collecting in the last [1s]
[2018-06-09T15:44:02,077][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1797] overhead, spent [300ms] collecting in the last [1.1s]
[2018-06-09T15:44:03,075][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1798] overhead, spent [283ms] collecting in the last [1s]
[2018-06-09T15:44:07,147][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1802] overhead, spent [277ms] collecting in the last [1s]
[2018-06-09T15:44:08,176][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1803] overhead, spent [269ms] collecting in the last [1s]
[2018-06-09T15:44:21,142][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1815] overhead, spent [302ms] collecting in the last [1.1s]
[2018-06-09T15:44:22,468][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1816] overhead, spent [400ms] collecting in the last [1.3s]

These are the log messages from the console.
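
To put a number on those log lines, here is a small Python sketch that turns each `overhead` message into the fraction of the reporting window spent in garbage collection. The regular expression only covers the `ms` and `s` units that appear in the lines posted above, and the names `GC_OVERHEAD`, `to_millis` and `gc_fraction` are made up for this illustration.

```python
import re

# Matches the tail of a JvmGcMonitorService "overhead" line, e.g.
# "spent [385ms] collecting in the last [1.2s]".
GC_OVERHEAD = re.compile(
    r"spent \[([\d.]+)(ms|s)\] collecting in the last \[([\d.]+)(ms|s)\]"
)


def to_millis(value, unit):
    """Convert a value with unit 'ms' or 's' to milliseconds."""
    return float(value) * (1000.0 if unit == "s" else 1.0)


def gc_fraction(line):
    """Return GC time divided by window length, or None if the line does not match."""
    match = GC_OVERHEAD.search(line)
    if match is None:
        return None
    spent = to_millis(match.group(1), match.group(2))
    window = to_millis(match.group(3), match.group(4))
    return spent / window


# Two of the lines from the console output above.
SAMPLE = [
    "[2018-06-09T15:43:39,334][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] "
    "[gc][1780] overhead, spent [385ms] collecting in the last [1.2s]",
    "[2018-06-09T15:43:50,437][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] "
    "[gc][1786] overhead, spent [260ms] collecting in the last [1s]",
]

for line in SAMPLE:
    print(f"{gc_fraction(line):.0%}")
```

For the posted lines the fraction works out to roughly 26% to 32%, so during this period the JVM spent about a quarter to a third of wall-clock time collecting garbage.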

hubiao · June 9, 2018, 7:50am

Can you understand what I mean?

hubiao · June 9, 2018, 8:22am

The configuration information is as follows (part 1):

GET /_nodes
{
   "_nodes": {
  "total": 1,
  "successful": 1,
  "failed": 0
   },
   "cluster_name": "bropen",
   "nodes": {
  "gyWNXzz8Sj2VyXgw6w_pXg": {
     "name": "node-1",
     "transport_address": "192.168.1.150:9300",
     "host": "192.168.1.150",
     "ip": "192.168.1.150",
     "version": "5.5.0",
     "build_hash": "260387d",
     "total_indexing_buffer": 857250201,
     "roles": [
        "master",
        "data",
        "ingest"
     ],
     "settings": {
        "cluster": {
           "name": "bropen"
        },
        "node": {
           "name": "node-1"
        },
        "path": {
           "logs": "D:\\project\\cms\\elasticsearch-5.5.0\\logs",
           "home": "D:\\project\\cms\\elasticsearch-5.5.0"
        },
        "client": {
           "type": "node"
        },
        "http": {
           "type": {
              "default": "netty4"
           },
           "port": "9200"
        },
        "transport": {
           "tcp": {
              "port": "9300"
           },
           "type": {
              "default": "netty4"
           }
        },
        "network": {
           "host": "192.168.1.150",
           "bind_host": "192.168.1.150",
           "publish_host": "192.168.1.150"
        }
     },
     "os": {
        "refresh_interval_in_millis": 1000,
        "name": "Windows Server 2008 R2",
        "arch": "amd64",
        "version": "6.1",
        "available_processors": 2,
        "allocated_processors": 2
     },
     "process": {
        "refresh_interval_in_millis": 1000,
        "id": 6720,
        "mlockall": false
     },
     "jvm": {
        "pid": 6720,
        "version": "1.8.0_111",
        "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version": "25.111-b14",
        "vm_vendor": "Oracle Corporation",
        "start_time_in_millis": 1528531652299,
        "mem": {
           "heap_init_in_bytes": 8589934592,
           "heap_max_in_bytes": 8572502016,
           "non_heap_init_in_bytes": 2555904,
           "non_heap_max_in_bytes": 0,
           "direct_max_in_bytes": 8572502016
        },
        "gc_collectors": [
           "ParNew",
           "ConcurrentMarkSweep"
        ],
        "memory_pools": [
           "Code Cache",
           "Metaspace",
           "Compressed Class Space",
           "Par Eden Space",
           "Par Survivor Space",
           "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers": "true",
        "input_arguments": [
           "-Xms8g",
           "-Xmx8g",
           "-XX:+UseConcMarkSweepGC",
           "-XX:CMSInitiatingOccupancyFraction=75",
           "-XX:+UseCMSInitiatingOccupancyOnly",
           "-XX:+DisableExplicitGC",
           "-XX:+AlwaysPreTouch",
           "-Xss1m",
           "-Djava.awt.headless=true",
           "-Dfile.encoding=UTF-8",
           "-Djna.nosys=true",
           "-Djdk.io.permissionsUseCanonicalPath=true",
           "-Dio.netty.noUnsafe=true",
           "-Dio.netty.noKeySetOptimization=true",
           "-Dio.netty.recycler.maxCapacityPerThread=0",
           "-Dlog4j.shutdownHookEnabled=false",
           "-Dlog4j2.disable.jmx=true",
           "-Dlog4j.skipJansi=true",
           "-XX:+HeapDumpOnOutOfMemoryError",
           "-Djava.security.policy=file:///D:/java/elasticsearch-5.5.0/plugins/hanlp/plugin-security.policy",
           "-Delasticsearch",
           "-Des.path.home=D:\\project\\cms\\elasticsearch-5.5.0"
        ]
     },
     "thread_pool": {
        "force_merge": {
           "type": "fixed",
           "min": 1,
           "max": 1,
           "queue_size": -1
        },
        "fetch_shard_started": {
           "type": "scaling",
           "min": 1,
           "max": 4,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "listener": {
           "type": "fixed",
           "min": 1,
           "max": 1,
           "queue_size": -1
        },
        "index": {
           "type": "fixed",
           "min": 2,
           "max": 2,
           "queue_size": 200
        },
        "refresh": {
           "type": "scaling",
           "min": 1,
           "max": 1,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "generic": {
           "type": "scaling",
           "min": 4,
           "max": 128,
           "keep_alive": "30s",
           "queue_size": -1
        },
        "warmer": {
           "type": "scaling",
           "min": 1,
           "max": 1,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "search": {
           "type": "fixed",
           "min": 4,
           "max": 4,
           "queue_size": 1000
        },
        "flush": {
           "type": "scaling",
           "min": 1,
           "max": 1,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "fetch_shard_store": {
           "type": "scaling",
           "min": 1,
           "max": 4,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "management": {
           "type": "scaling",
           "min": 1,
           "max": 5,
           "keep_alive": "5m",
           "queue_size": -1
        },

hubiao · June 9, 2018, 8:23am

The configuration information is as follows (part 2):

        "get": {
           "type": "fixed",
           "min": 2,
           "max": 2,
           "queue_size": 1000
        },
        "bulk": {
           "type": "fixed",
           "min": 2,
           "max": 2,
           "queue_size": 200
        },
        "snapshot": {
           "type": "scaling",
           "min": 1,
           "max": 1,
           "keep_alive": "5m",
           "queue_size": -1
        }
     },
     "transport": {
        "bound_address": [
           "192.168.1.150:9300"
        ],
        "publish_address": "192.168.1.150:9300",
        "profiles": {}
     },
     "http": {
        "bound_address": [
           "192.168.1.150:9200"
        ],
        "publish_address": "192.168.1.150:9200",
        "max_content_length_in_bytes": 104857600
     },
     "plugins": [
        {
           "name": "elasticsearch-hanlp",
           "version": "1.0.0",
           "description": "HanLP for ElasticSearch",
           "classname": "org.elasticsearch.plugin.analysis.AnalysisHanLPPlugin",
           "has_native_controller": false
        }
     ],
     "modules": [
        {
           "name": "aggs-matrix-stats",
           "version": "5.5.0",
           "description": "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
           "classname": "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin",
           "has_native_controller": false
        },
        {
           "name": "ingest-common",
           "version": "5.5.0",
           "description": "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
           "classname": "org.elasticsearch.ingest.common.IngestCommonPlugin",
           "has_native_controller": false
        },
        {
           "name": "lang-expression",
           "version": "5.5.0",
           "description": "Lucene expressions integration for Elasticsearch",
           "classname": "org.elasticsearch.script.expression.ExpressionPlugin",
           "has_native_controller": false
        },
        {
           "name": "lang-groovy",
           "version": "5.5.0",
           "description": "Groovy scripting integration for Elasticsearch",
           "classname": "org.elasticsearch.script.groovy.GroovyPlugin",
           "has_native_controller": false
        },
        {
           "name": "lang-mustache",
           "version": "5.5.0",
           "description": "Mustache scripting integration for Elasticsearch",
           "classname": "org.elasticsearch.script.mustache.MustachePlugin",
           "has_native_controller": false
        },
        {
           "name": "lang-painless",
           "version": "5.5.0",
           "description": "An easy, safe and fast scripting language for Elasticsearch",
           "classname": "org.elasticsearch.painless.PainlessPlugin",
           "has_native_controller": false
        },
        {
           "name": "parent-join",
           "version": "5.5.0",
           "description": "This module adds the support parent-child queries and aggregations",
           "classname": "org.elasticsearch.join.ParentJoinPlugin",
           "has_native_controller": false
        },
        {
           "name": "percolator",
           "version": "5.5.0",
           "description": "Percolator module adds capability to index queries and query these queries by specifying documents",
           "classname": "org.elasticsearch.percolator.PercolatorPlugin",
           "has_native_controller": false
        },
        {
           "name": "reindex",
           "version": "5.5.0",
           "description": "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
           "classname": "org.elasticsearch.index.reindex.ReindexPlugin",
           "has_native_controller": false
        },
        {
           "name": "transport-netty3",
           "version": "5.5.0",
           "description": "Netty 3 based transport implementation",
           "classname": "org.elasticsearch.transport.Netty3Plugin",
           "has_native_controller": false
        },
        {
           "name": "transport-netty4",
           "version": "5.5.0",
           "description": "Netty 4 based transport implementation",
           "classname": "org.elasticsearch.transport.Netty4Plugin",
           "has_native_controller": false
        }
     ],
     "ingest": {
        "processors": [
           {
              "type": "append"
           },
           {
              "type": "convert"
           },
           {
              "type": "date"
           },
           {
              "type": "date_index_name"
           },
           {
              "type": "dot_expander"
           },
           {
              "type": "fail"
           },
           {
              "type": "foreach"
           },
           {
              "type": "grok"
           },
           {
              "type": "gsub"
           },
           {
              "type": "join"
           },
           {
              "type": "json"
           },
           {
              "type": "kv"
           },
           {
              "type": "lowercase"
           },
           {
              "type": "remove"
           },
           {
              "type": "rename"
           },
           {
              "type": "script"
           },
           {
              "type": "set"
           },
           {
              "type": "sort"
           },
           {
              "type": "split"
           },
           {
              "type": "trim"
           },
           {
              "type": "uppercase"
           }
        ]
     }
  }

}
}
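
The `_nodes` output above reports 2 allocated processors and a fixed `search` pool of 4 threads. That matches the default sizing formula documented for Elasticsearch 5.x, `int((processors * 3) / 2) + 1`. A small Python sketch of the arithmetic follows. The shard count of 5 is an assumption (it is the 5.x default for a new index, and a search response posted later in this thread reports 5 shards), as is the simplification that each shard-level query occupies one search thread. The function names are illustrative.

```python
def default_search_pool_size(processors):
    """Default size of the fixed search thread pool per the Elasticsearch 5.x docs."""
    return int((processors * 3) / 2) + 1


def shard_tasks_waiting(shards, pool_size):
    """Shard-level tasks of one search that cannot start immediately.

    Simplifying assumption: one search thread per shard, no other searches running.
    """
    return max(0, shards - pool_size)


processors = 2   # "allocated_processors" from the _nodes output
shards = 5       # assumed shard count of the index

pool_size = default_search_pool_size(processors)
print(pool_size, shard_tasks_waiting(shards, pool_size))
```

Under those assumptions a single search across 5 shards fills all 4 threads and leaves one shard task queued, which is consistent with the `search 4 1 0` row in the thread pool table posted earlier.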

hubiao · June 9, 2018, 8:25am

Queries are fast right after startup.
After several queries with different conditions they get slower, and memory usage also rises.

hubiao · June 9, 2018, 8:26am

Can you see the information I provided?
I am waiting online for your answer, please.

dadoonet · June 9, 2018, 9:55am

Can you share the json response you are getting when you are running the query?

hubiao · June 11, 2018, 3:12am

{
   "took": 137,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": null,
      "hits": [
         {
            "_index": "book_page",
            "_type": "book_page",
            "_id": "a08e2041-95b8-49e6-a9d7-2c8ed1800ec8",
            "_score": null,
            "_source": {
               "bookDate_date_sore": "2016-03-01",
               "pageNo_int_sore": "8",
               "columns": [
                  {
                     "name": "/图书栏目",
                     "siteId": 6,
                     "id": 87
                  }
               ],
               "markName_sore": "目录",
               "bookId_no_analyzer_sore": "4ce89d05-6f7a-4dd8-8320-8d9cc09ca734",
               "text_nlp_sore": "示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文",
               "bookTitle_sore": "“示例中文示例中文示例中文"
            },
            "sort": [
               "0.6931471805599453",
               1456790400000
            ]
         },
         {
            "_index": "book_page",
            "_type": "book_page",
            "_id": "15e082c1-0e8c-4949-bfc0-4b405ecb11e6",
            "_score": null,
            "_source": {
               "bookDate_date_sore": "2016-03-01",
               "pageNo_int_sore": "46",
               "columns": [
                  {
                     "name": "/图书栏目",
                     "siteId": 6,
                     "id": 87
                  }
               ],
               "markName_sore": "示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文",
               "bookId_no_analyzer_sore": "4ce89d05-6f7a-4dd8-8320-8d9cc09ca734",
               "text_nlp_sore": "示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文",
               "bookTitle_sore": "“示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文"
            },
            "sort": [
               "0.0",
               1456790400000
            ]
         }
      ]
   }
}

dadoonet · June 11, 2018, 5:24am

Was that slow on your end?
If so, check your network because this response has been generated 137ms after the request has been received by elasticsearch.
Which is not slow IMO.
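
One way to check this on the client side is to compare the `took` value in the response with the wall-clock time of the whole request. The Python sketch below is only an illustration: `split_latency` and `timed_call` are hypothetical helpers, and `fake_fetch` stands in for whatever HTTP client actually sends the search, so nothing here touches the network.

```python
import json
import time


def split_latency(response_text, wall_ms):
    """Split client-observed latency into Elasticsearch time and everything else."""
    took_ms = json.loads(response_text)["took"]
    return {"took_ms": took_ms, "outside_es_ms": max(0.0, wall_ms - took_ms)}


def timed_call(fetch):
    """Run fetch() (any callable returning the response body as text) and time it."""
    start = time.perf_counter()
    response_text = fetch()
    wall_ms = (time.perf_counter() - start) * 1000.0
    return split_latency(response_text, wall_ms)


def fake_fetch():
    # Stand-in for a real HTTP request; returns a trimmed copy of the posted response.
    return '{"took": 137, "timed_out": false}'


print(timed_call(fake_fetch))

# Made-up wall-clock figure, for illustration only: if the client had waited
# 2137 ms in total, 2000 ms of it would have been spent outside Elasticsearch.
print(split_latency('{"took": 137}', 2137.0))
```

If `outside_es_ms` dominates, the time is going to the network, the client, or response rendering rather than to query execution.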

hubiao · June 11, 2018, 5:31am

It really is slow. Oh, but this result is from my test with a small data set.

dadoonet · June 11, 2018, 6:16am

Please share a real result which is slow.

system · July 9, 2018, 6:16am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.