Query takes too much time in Elasticsearch compared to Sphinx

Hi Support,

We moved from Sphinx to Elasticsearch.
The Elasticsearch and Sphinx servers have the same configuration.

OS: Ubuntu 18.04
RAM: 60GB (50% used by Elasticsearch)
Disk: 2TB
Processor: 4 cores

Elastic Configuration: ELK - 7.7.X

We have 450 million records in production, and the index was created with the mapping and settings below.

Index shards: 25
Index size : 890GB

Mapping:
{
  "mappings": {
    "_doc": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "@version": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "group_id": {
          "type": "long"
        },
        "member_id": {
          "type": "integer"
        },
        "foldername": {
          "type": "keyword",
          "ignore_above": 256
        },
        "f_length": {
          "type": "long"
        },
        "fol_id": {
          "type": "long"
        },
        "y_m": {
          "type": "integer"
        },
        "com_id": {
          "type": "integer"
        },
        "doc_type": {
          "properties": {
            "d": {
              "type": "short"
            }
          }
        },
        "doc_order": {
          "type": "long"
        },
        "f_data": {
          "type": "text"
        },
        "id": {
          "type": "long"
        },
        "document_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "analyzer": "splchar_analyzer"
        },
        "flg": {
          "type": "short"
        },
        "isdisplay": {
          "type": "short"
        },
        "deleted": {
          "type": "short"
        },
        "doc_section": {
          "properties": {
            "s": {
              "type": "short"
            }
          }
        },
        "type": {
          "type": "integer"
        }
      }
    }
  }
}

Setting:
{
  "index.blocks.read_only_allow_delete": "false",
  "index.priority": "1",
  "index.query.default_field": [
    "*"
  ],
  "index.write.wait_for_active_shards": "1",
  "index.highlight.max_analyzed_offset": "60000000",
  "index.refresh_interval": "300s",
  "index.analysis.analyzer.splchar_analyzer.filter": [
    "lowercase"
  ],
  "index.analysis.analyzer.splchar_analyzer.char_filter": [
    "spl_char_filter"
  ],
  "index.analysis.analyzer.splchar_analyzer.tokenizer": "standard",
  "index.analysis.char_filter.spl_char_filter.pattern": "\\.",
  "index.analysis.char_filter.spl_char_filter.type": "pattern_replace",
  "index.analysis.char_filter.spl_char_filter.replacement": " ",
  "index.number_of_replicas": "0"
}

When we run the query below for the first time, it takes too much time (5-6 sec). After running it 2-3 times, we get a response in 1 sec.

GET _sql?format=txt
{
  "query":"""SELECT id FROM indexname WHERE QUERY('(f_data:("testing") OR document_name:("testing"))','default_operator=AND')  AND member_id = 5002 AND deleted in(0) AND type NOT IN (0,10,13,15,16) and fol_id > 0 and isdisplay = 0"""
}

When the same query is run against Sphinx, it responds in 0.5 sec.

I have implemented this on the Production server and I am facing this slowness issue.

Can anyone help me with this? Is anything missing in the configuration? Or do we need to increase the heap size?

Thanks

Try keyword fields in place of numbers if you're doing exact-match lookups only.
Numeric fields are optimised for range queries.
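
As a rough illustration of that suggestion, the sketch below builds a mapping fragment in which the fields that the posted query only matches exactly (`=`, `IN`, `NOT IN`) are typed as `keyword`. Which fields qualify is an assumption drawn from that one query; `fol_id` is left out because the query uses it in a range (`fol_id > 0`). The helper name `keyword_properties` is hypothetical. An existing field's type cannot be changed in place, so a fragment like this would go into a new index that the data is then reindexed into.

```python
import json

# Fields the posted query only matches exactly (=, IN, NOT IN).
# Assumption: no other query needs range lookups on these fields.
EXACT_MATCH_FIELDS = ["member_id", "deleted", "isdisplay", "type"]


def keyword_properties(fields):
    """Return a mapping "properties" dict with every given field typed as keyword."""
    return {name: {"type": "keyword"} for name in fields}


# Mapping fragment for a new index; the remaining fields would keep
# the types shown in the original mapping.
mapping = {"properties": keyword_properties(EXACT_MATCH_FIELDS)}
print(json.dumps(mapping, indent=2))
```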

Hi Mark_Harwood,

Thank you for your reply.

Is there any issue with this configuration? Please let me know if anything in the configuration needs to change.

Can we optimize the query?

Changing the mapping on a production server is quite a long process.

FYI - We have two separate Elasticsearch nodes, not in a cluster. We indexed each one separately, without replicas.

Node 1: Shards 25, Replica 0
Node 2: Shards 25, Replica 0

Thanks
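
On optimizing the query without a mapping change, one thing that could be tried is writing the SQL statement directly as a Query DSL `bool` query, with the exact-match and range conditions in `filter` and `must_not` (filter context skips scoring and is eligible for caching). The sketch below only builds the request body; the function name `build_search_body` is hypothetical, and the body would be sent to `GET indexname/_search`. Elasticsearch SQL already translates statements to Query DSL internally (the `_sql/translate` endpoint shows the result), so whether this is any faster is an open question that would have to be measured.

```python
import json


def build_search_body(text, member_id):
    """Build a Query DSL body mirroring the posted SQL query (a sketch, not a tuned query)."""
    return {
        "_source": ["id"],  # SELECT id
        "query": {
            "bool": {
                # Full-text part, same query string as in QUERY(...).
                # Assumes text contains no double quotes.
                "must": [
                    {
                        "query_string": {
                            "query": f'f_data:("{text}") OR document_name:("{text}")',
                            "default_operator": "AND",
                        }
                    }
                ],
                # Non-scoring conditions from the WHERE clause.
                "filter": [
                    {"term": {"member_id": member_id}},
                    {"term": {"deleted": 0}},
                    {"range": {"fol_id": {"gt": 0}}},
                    {"term": {"isdisplay": 0}},
                ],
                # type NOT IN (0,10,13,15,16)
                "must_not": [{"terms": {"type": [0, 10, 13, 15, 16]}}],
            }
        },
    }


body = build_search_body("testing", 5002)
print(json.dumps(body, indent=2))
```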

As with any production change, it would make sense to benchmark in another environment to prove the cost/benefits first.
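
A minimal sketch of such a benchmark, assuming the goal is to compare the first (cold) run against later (warm) runs, since that is the gap described in the original post. The names `benchmark` and `fake_query` are hypothetical; `fake_query` is a placeholder that only sleeps and would be replaced by a function that sends the real search request to the test environment.

```python
import statistics
import time


def benchmark(run_query, repeats=5):
    """Time run_query() several times; report the first run and the median of the rest."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2 to separate cold and warm runs")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {
        "cold_s": timings[0],
        "warm_median_s": statistics.median(timings[1:]),
    }


def fake_query():
    # Placeholder for a real search request against the test index.
    time.sleep(0.01)


result = benchmark(fake_query)
print(result)
```

Running the same harness before and after a mapping or query change, on the same data, gives numbers that can be compared directly.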

Thank you.

Yes, we will make the changes in another environment.

Can you please suggest the best configuration and mapping to overcome this issue?

Do we need to use a cluster instead of indexing on separate nodes?

Thanks
