Single machine, millions of documents: search is very slow


(hubiao) #1

Environment:

Windows Server 2008 R2, 64-bit
JDK 1.8
Elasticsearch 5.5.0
300 million documents, 20 GB index.

The elasticsearch.yml configuration is as follows:

cluster.name: bropen
node.name: node-1
transport.tcp.port: 9300
http.port: 9200
network.host: 192.168.1.150
network.publish_host: 192.168.1.150
network.bind_host: 192.168.1.150

The query DSL is as follows:

GET /data/book_page/_search
{
  "query": {
    "match": {
      "text": "important speech"
    }
  }
}

Search is slow!

While a search is running, the thread pool information is as follows:

node-1 bulk                0 0 0
node-1 fetch_shard_started 0 0 0
node-1 fetch_shard_store   0 0 0
node-1 flush               0 0 0
node-1 force_merge         0 0 0
node-1 generic             0 0 0
node-1 get                 0 0 0
node-1 index               0 0 0
node-1 listener            0 0 0
node-1 management          1 0 0
node-1 refresh             0 0 0
node-1 search              4 1 0
node-1 snapshot            0 0 0
node-1 warmer              0 0 0

My questions are as follows:
1: Searching with word segmentation (an analyzed query) is very slow!
2: After a slow search, if I refresh the page and search for a term without segmentation, it should be fast, but the result is still slow.

If I then refresh and search again, it is even slower, as if it were waiting for the previous task to finish.
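
The thread pool listing above is easier to read once the idle pools are filtered out. Here is a minimal Python sketch, assuming the three numeric columns are active, queue, and rejected (the default column order of `_cat/thread_pool`, which the post does not state). `parse_thread_pool` and `busy_pools` are hypothetical helper names.

```python
from typing import List, NamedTuple


class PoolRow(NamedTuple):
    node: str
    name: str
    active: int
    queue: int
    rejected: int


def parse_thread_pool(text: str) -> List[PoolRow]:
    """Parse whitespace-separated rows: node, pool name, active, queue, rejected.

    The column meaning is an assumption; the original post shows no header.
    """
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        node, name, active, queue, rejected = parts
        rows.append(PoolRow(node, name, int(active), int(queue), int(rejected)))
    return rows


def busy_pools(rows: List[PoolRow]) -> List[PoolRow]:
    """Return only the pools that have active, queued, or rejected work."""
    return [r for r in rows if r.active or r.queue or r.rejected]


# A few rows copied from the listing in the post.
SAMPLE = """
node-1 bulk                0 0 0
node-1 management          1 0 0
node-1 search              4 1 0
node-1 warmer              0 0 0
"""

for row in busy_pools(parse_thread_pool(SAMPLE)):
    print(row.name, row.active, row.queue, row.rejected)
```

Under that column assumption, only `management` and `search` remain. The search pool shows 4 active threads and 1 queued request, which matches the fixed search pool size of 4 reported in the node info later in this thread.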


(David Pilato) #2

Can you share the elasticsearch response after you sent your query please?


(hubiao) #3

GET /data/book_page/_search
{
  "query": {
    "match": {
      "text": "important speech"
    }
  }
}

The search is slow; this is my query.


(hubiao) #5
[2018-06-09T15:43:39,334][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1780] overhead, spent [385ms] collecting in the last [1.2s]
[2018-06-09T15:43:40,723][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1781] overhead, spent [418ms] collecting in the last [1.3s]
[2018-06-09T15:43:42,142][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1782] overhead, spent [427ms] collecting in the last [1.4s]
[2018-06-09T15:43:43,312][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1783] overhead, spent [325ms] collecting in the last [1.1s]
[2018-06-09T15:43:50,437][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1786] overhead, spent [260ms] collecting in the last [1s]
[2018-06-09T15:43:52,652][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1788] overhead, spent [322ms] collecting in the last [1.2s]
[2018-06-09T15:43:53,684][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1789] overhead, spent [304ms] collecting in the last [1s]
[2018-06-09T15:44:02,077][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1797] overhead, spent [300ms] collecting in the last [1.1s]
[2018-06-09T15:44:03,075][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1798] overhead, spent [283ms] collecting in the last [1s]
[2018-06-09T15:44:07,147][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1802] overhead, spent [277ms] collecting in the last [1s]
[2018-06-09T15:44:08,176][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1803] overhead, spent [269ms] collecting in the last [1s]
[2018-06-09T15:44:21,142][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1815] overhead, spent [302ms] collecting in the last [1.1s]
[2018-06-09T15:44:22,468][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1816] overhead, spent [400ms] collecting in the last [1.3s]

This is the log output from the console.
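
Each log line above reports how long the JVM spent collecting within a window, so dividing the two numbers gives a rough GC overhead per window. Here is a minimal Python sketch of that arithmetic. `gc_fractions` is a hypothetical helper, and the pattern only handles windows printed in seconds, as in the lines posted here.

```python
import re

# Matches the tail of lines such as:
# ... [gc][1780] overhead, spent [385ms] collecting in the last [1.2s]
# Only windows expressed in seconds are handled (an assumption based on the posted lines).
GC_LINE = re.compile(r"spent \[(\d+)ms\] collecting in the last \[([\d.]+)s\]")


def gc_fractions(log_text):
    """Return, per matching log line, the fraction of the window spent in GC."""
    fractions = []
    for match in GC_LINE.finditer(log_text):
        spent_ms = int(match.group(1))
        window_ms = float(match.group(2)) * 1000
        fractions.append(spent_ms / window_ms)
    return fractions


# Two lines copied from the console output in the post.
SAMPLE = """
[2018-06-09T15:43:39,334][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1780] overhead, spent [385ms] collecting in the last [1.2s]
[2018-06-09T15:43:50,437][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][1786] overhead, spent [260ms] collecting in the last [1s]
"""

for fraction in gc_fractions(SAMPLE):
    print(f"{fraction:.0%} of the window spent in GC")
```

Applied to all the posted lines, this gives roughly 26% to 32% of each window spent in garbage collection.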


(hubiao) #6

Can you understand what I mean?


(hubiao) #7

The configuration information is as follows (part 1):

GET /_nodes
{
   "_nodes": {
  "total": 1,
  "successful": 1,
  "failed": 0
   },
   "cluster_name": "bropen",
   "nodes": {
  "gyWNXzz8Sj2VyXgw6w_pXg": {
     "name": "node-1",
     "transport_address": "192.168.1.150:9300",
     "host": "192.168.1.150",
     "ip": "192.168.1.150",
     "version": "5.5.0",
     "build_hash": "260387d",
     "total_indexing_buffer": 857250201,
     "roles": [
        "master",
        "data",
        "ingest"
     ],
     "settings": {
        "cluster": {
           "name": "bropen"
        },
        "node": {
           "name": "node-1"
        },
        "path": {
           "logs": "D:\\project\\cms\\elasticsearch-5.5.0\\logs",
           "home": "D:\\project\\cms\\elasticsearch-5.5.0"
        },
        "client": {
           "type": "node"
        },
        "http": {
           "type": {
              "default": "netty4"
           },
           "port": "9200"
        },
        "transport": {
           "tcp": {
              "port": "9300"
           },
           "type": {
              "default": "netty4"
           }
        },
        "network": {
           "host": "192.168.1.150",
           "bind_host": "192.168.1.150",
           "publish_host": "192.168.1.150"
        }
     },
     "os": {
        "refresh_interval_in_millis": 1000,
        "name": "Windows Server 2008 R2",
        "arch": "amd64",
        "version": "6.1",
        "available_processors": 2,
        "allocated_processors": 2
     },
     "process": {
        "refresh_interval_in_millis": 1000,
        "id": 6720,
        "mlockall": false
     },
     "jvm": {
        "pid": 6720,
        "version": "1.8.0_111",
        "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
        "vm_version": "25.111-b14",
        "vm_vendor": "Oracle Corporation",
        "start_time_in_millis": 1528531652299,
        "mem": {
           "heap_init_in_bytes": 8589934592,
           "heap_max_in_bytes": 8572502016,
           "non_heap_init_in_bytes": 2555904,
           "non_heap_max_in_bytes": 0,
           "direct_max_in_bytes": 8572502016
        },
        "gc_collectors": [
           "ParNew",
           "ConcurrentMarkSweep"
        ],
        "memory_pools": [
           "Code Cache",
           "Metaspace",
           "Compressed Class Space",
           "Par Eden Space",
           "Par Survivor Space",
           "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers": "true",
        "input_arguments": [
           "-Xms8g",
           "-Xmx8g",
           "-XX:+UseConcMarkSweepGC",
           "-XX:CMSInitiatingOccupancyFraction=75",
           "-XX:+UseCMSInitiatingOccupancyOnly",
           "-XX:+DisableExplicitGC",
           "-XX:+AlwaysPreTouch",
           "-Xss1m",
           "-Djava.awt.headless=true",
           "-Dfile.encoding=UTF-8",
           "-Djna.nosys=true",
           "-Djdk.io.permissionsUseCanonicalPath=true",
           "-Dio.netty.noUnsafe=true",
           "-Dio.netty.noKeySetOptimization=true",
           "-Dio.netty.recycler.maxCapacityPerThread=0",
           "-Dlog4j.shutdownHookEnabled=false",
           "-Dlog4j2.disable.jmx=true",
           "-Dlog4j.skipJansi=true",
           "-XX:+HeapDumpOnOutOfMemoryError",
           "-Djava.security.policy=file:///D:/java/elasticsearch-5.5.0/plugins/hanlp/plugin-security.policy",
           "-Delasticsearch",
           "-Des.path.home=D:\\project\\cms\\elasticsearch-5.5.0"
        ]
     },
     "thread_pool": {
        "force_merge": {
           "type": "fixed",
           "min": 1,
           "max": 1,
           "queue_size": -1
        },
        "fetch_shard_started": {
           "type": "scaling",
           "min": 1,
           "max": 4,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "listener": {
           "type": "fixed",
           "min": 1,
           "max": 1,
           "queue_size": -1
        },
        "index": {
           "type": "fixed",
           "min": 2,
           "max": 2,
           "queue_size": 200
        },
        "refresh": {
           "type": "scaling",
           "min": 1,
           "max": 1,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "generic": {
           "type": "scaling",
           "min": 4,
           "max": 128,
           "keep_alive": "30s",
           "queue_size": -1
        },
        "warmer": {
           "type": "scaling",
           "min": 1,
           "max": 1,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "search": {
           "type": "fixed",
           "min": 4,
           "max": 4,
           "queue_size": 1000
        },
        "flush": {
           "type": "scaling",
           "min": 1,
           "max": 1,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "fetch_shard_store": {
           "type": "scaling",
           "min": 1,
           "max": 4,
           "keep_alive": "5m",
           "queue_size": -1
        },
        "management": {
           "type": "scaling",
           "min": 1,
           "max": 5,
           "keep_alive": "5m",
           "queue_size": -1
        },

(hubiao) #8

The configuration information is as follows (part 2):

        "get": {
           "type": "fixed",
           "min": 2,
           "max": 2,
           "queue_size": 1000
        },
        "bulk": {
           "type": "fixed",
           "min": 2,
           "max": 2,
           "queue_size": 200
        },
        "snapshot": {
           "type": "scaling",
           "min": 1,
           "max": 1,
           "keep_alive": "5m",
           "queue_size": -1
        }
     },
     "transport": {
        "bound_address": [
           "192.168.1.150:9300"
        ],
        "publish_address": "192.168.1.150:9300",
        "profiles": {}
     },
     "http": {
        "bound_address": [
           "192.168.1.150:9200"
        ],
        "publish_address": "192.168.1.150:9200",
        "max_content_length_in_bytes": 104857600
     },
     "plugins": [
        {
           "name": "elasticsearch-hanlp",
           "version": "1.0.0",
           "description": "HanLP for ElasticSearch",
           "classname": "org.elasticsearch.plugin.analysis.AnalysisHanLPPlugin",
           "has_native_controller": false
        }
     ],
     "modules": [
        {
           "name": "aggs-matrix-stats",
           "version": "5.5.0",
           "description": "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
           "classname": "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin",
           "has_native_controller": false
        },
        {
           "name": "ingest-common",
           "version": "5.5.0",
           "description": "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
           "classname": "org.elasticsearch.ingest.common.IngestCommonPlugin",
           "has_native_controller": false
        },
        {
           "name": "lang-expression",
           "version": "5.5.0",
           "description": "Lucene expressions integration for Elasticsearch",
           "classname": "org.elasticsearch.script.expression.ExpressionPlugin",
           "has_native_controller": false
        },
        {
           "name": "lang-groovy",
           "version": "5.5.0",
           "description": "Groovy scripting integration for Elasticsearch",
           "classname": "org.elasticsearch.script.groovy.GroovyPlugin",
           "has_native_controller": false
        },
        {
           "name": "lang-mustache",
           "version": "5.5.0",
           "description": "Mustache scripting integration for Elasticsearch",
           "classname": "org.elasticsearch.script.mustache.MustachePlugin",
           "has_native_controller": false
        },
        {
           "name": "lang-painless",
           "version": "5.5.0",
           "description": "An easy, safe and fast scripting language for Elasticsearch",
           "classname": "org.elasticsearch.painless.PainlessPlugin",
           "has_native_controller": false
        },
        {
           "name": "parent-join",
           "version": "5.5.0",
           "description": "This module adds the support parent-child queries and aggregations",
           "classname": "org.elasticsearch.join.ParentJoinPlugin",
           "has_native_controller": false
        },
        {
           "name": "percolator",
           "version": "5.5.0",
           "description": "Percolator module adds capability to index queries and query these queries by specifying documents",
           "classname": "org.elasticsearch.percolator.PercolatorPlugin",
           "has_native_controller": false
        },
        {
           "name": "reindex",
           "version": "5.5.0",
           "description": "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
           "classname": "org.elasticsearch.index.reindex.ReindexPlugin",
           "has_native_controller": false
        },
        {
           "name": "transport-netty3",
           "version": "5.5.0",
           "description": "Netty 3 based transport implementation",
           "classname": "org.elasticsearch.transport.Netty3Plugin",
           "has_native_controller": false
        },
        {
           "name": "transport-netty4",
           "version": "5.5.0",
           "description": "Netty 4 based transport implementation",
           "classname": "org.elasticsearch.transport.Netty4Plugin",
           "has_native_controller": false
        }
     ],
     "ingest": {
        "processors": [
           {
              "type": "append"
           },
           {
              "type": "convert"
           },
           {
              "type": "date"
           },
           {
              "type": "date_index_name"
           },
           {
              "type": "dot_expander"
           },
           {
              "type": "fail"
           },
           {
              "type": "foreach"
           },
           {
              "type": "grok"
           },
           {
              "type": "gsub"
           },
           {
              "type": "join"
           },
           {
              "type": "json"
           },
           {
              "type": "kv"
           },
           {
              "type": "lowercase"
           },
           {
              "type": "remove"
           },
           {
              "type": "rename"
           },
           {
              "type": "script"
           },
           {
              "type": "set"
           },
           {
              "type": "sort"
           },
           {
              "type": "split"
           },
           {
              "type": "trim"
           },
           {
              "type": "uppercase"
           }
        ]
     }
  }

}
}


(hubiao) #9

Queries are fast right after startup.
After I run several queries with different conditions, they get slower, and memory usage also keeps rising.


(hubiao) #10

Can you see the information I provided?
I am waiting online for your answer, please.


(David Pilato) #11

Can you share the json response you are getting when you are running the query?


(hubiao) #12

{
   "took": 137,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": null,
      "hits": [
         {
            "_index": "book_page",
            "_type": "book_page",
            "_id": "a08e2041-95b8-49e6-a9d7-2c8ed1800ec8",
            "_score": null,
            "_source": {
               "bookDate_date_sore": "2016-03-01",
               "pageNo_int_sore": "8",
               "columns": [
                  {
                     "name": "/图书栏目",
                     "siteId": 6,
                     "id": 87
                  }
               ],
               "markName_sore": "目录",
               "bookId_no_analyzer_sore": "4ce89d05-6f7a-4dd8-8320-8d9cc09ca734",
               "text_nlp_sore": "示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文",
               "bookTitle_sore": "“示例中文示例中文示例中文"
            },
            "sort": [
               "0.6931471805599453",
               1456790400000
            ]
         },
         {
            "_index": "book_page",
            "_type": "book_page",
            "_id": "15e082c1-0e8c-4949-bfc0-4b405ecb11e6",
            "_score": null,
            "_source": {
               "bookDate_date_sore": "2016-03-01",
               "pageNo_int_sore": "46",
               "columns": [
                  {
                     "name": "/图书栏目",
                     "siteId": 6,
                     "id": 87
                  }
               ],
               "markName_sore": "示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文",
               "bookId_no_analyzer_sore": "4ce89d05-6f7a-4dd8-8320-8d9cc09ca734",
               "text_nlp_sore": "示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文",
               "bookTitle_sore": "“示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文示例中文"
            },
            "sort": [
               "0.0",
               1456790400000
            ]
         }
      ]
   }
}

(David Pilato) #13

Was that slow on your end?
If so, check your network because this response has been generated 137ms after the request has been received by elasticsearch.
Which is not slow IMO.
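
To separate time spent inside Elasticsearch from time spent elsewhere, one can compare the `took` field of the response with the wall-clock time measured by the client. Here is a minimal Python sketch. `compare_took` and `send_query` are hypothetical names, and `fake_query` is a stand-in for a real HTTP call that returns a trimmed response.

```python
import json
import time
from typing import Callable


def compare_took(send_query: Callable[[], str]) -> dict:
    """Run a search once and compare Elasticsearch's `took` with wall-clock time.

    `send_query` is a hypothetical callable that performs the HTTP request
    and returns the raw JSON response body as a string.
    """
    start = time.perf_counter()
    body = send_query()
    wall_ms = (time.perf_counter() - start) * 1000
    took_ms = json.loads(body)["took"]  # server-side time reported by Elasticsearch
    return {
        "took_ms": took_ms,
        "wall_ms": wall_ms,
        # Time not accounted for by `took`: network, serialization, client work.
        "outside_es_ms": wall_ms - took_ms,
    }


def fake_query() -> str:
    # Stand-in for a real request; sleeps to simulate latency, then
    # returns a trimmed body with the same `took` as the posted response.
    time.sleep(0.2)
    return '{"took": 137, "timed_out": false}'


result = compare_took(fake_query)
print(result["took_ms"], round(result["wall_ms"]), round(result["outside_es_ms"]))
```

A large `outside_es_ms` on a real query would point at the network or the client rather than at query execution.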


(hubiao) #14

It really is slow. Oh, this result is from my test with a small data set.


(David Pilato) #15

Please share a real result which is slow.

