Why does this query take so long? How can I make it faster?


(Michael Penkov) #1

I have a query that looks like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "Profound Networks",
              "boost": 2
            }
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "Profound Networks",
              "boost": 20
            }
          }
        },
        {
          "match": {
            "whois.registrant.country": {
              "query": "United States",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "whois.admin.country": {
              "query": "United States",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "tld": {
              "query": ".us",
              "boost": 10
            }
          }
        }
      ]
    }
  }
}

It takes over 30s. How can I speed this up?

While trying to answer the question, I've found that the country clauses slow it down the most. Removing them gives the query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "Profound Networks",
              "boost": 2
            }
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "Profound Networks",
              "boost": 20
            }
          }
        },
        {
          "match": {
            "tld": {
              "query": ".us",
              "boost": 10
            }
          }
        }
      ]
    }
  }
}

and this query takes less than a second. Why do the country match clauses slow the query down so much?

Is there any way I can speed up the original query without removing parts of it?


(Xavier Facq) #2

How much data do you have? How many servers and nodes?

Maybe you could change the mapping of your "whois.admin.country" field to not_analyzed and use a term query?
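
As a rough sketch of that suggestion, a mapping fragment along these lines would store the country as a single untokenized value. It assumes a pre-5.x cluster (where "string" and "not_analyzed" are valid) and that "whois.admin" is a plain object field; neither is confirmed in this thread. On newer versions the equivalent would be a "keyword" field.

```json
{
  "properties": {
    "whois": {
      "properties": {
        "admin": {
          "properties": {
            "country": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
```

With such a mapping, a clause like `{ "term": { "whois.admin.country": "United States" } }` would match only documents whose stored value is exactly "United States", including case. Note that the index setting of an existing field cannot be changed in place, so this would need a new field or a reindex.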


(Michael Penkov) #3

I'm searching 240 million documents. They're all on a one-node Elastic Cloud cluster.

Thank you for your suggestion about not_analyzed. I will try it.

One of the problems is that the country is part of the query, and there are many documents that contain that particular country. As a consequence, including the country dramatically increases the number of results. Ideally, I'd like to do this:

  1. Search for documents that match the company name
  2. Rank the results by whether they match the country

Do you know of a way to achieve this?
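
For reference, the two steps above map fairly naturally onto a rescore request: the main query selects documents by title only, and the rescore query then adjusts the ranking of the top hits by country. This is only a sketch; the window_size of 100 and the weights of 1.0 are illustrative values, not tuned ones, and the simplified country clauses drop the explicit boost of 1.

```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "Profound Networks", "boost": 2 } } },
        { "match_phrase": { "title": { "query": "Profound Networks", "boost": 20 } } }
      ]
    }
  },
  "rescore": {
    "window_size": 100,
    "query": {
      "rescore_query": {
        "bool": {
          "should": [
            { "match": { "whois.registrant.country": "United States" } },
            { "match": { "whois.admin.country": "United States" } },
            { "match": { "tld": { "query": ".us", "boost": 10 } } }
          ]
        }
      },
      "query_weight": 1.0,
      "rescore_query_weight": 1.0
    }
  }
}
```

Because the country clauses run only against the top window_size hits on each shard, they no longer have to be evaluated against every document that mentions the country.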


(Jos) #4

Doing this for all 3 language related fields should give you what you need. Or if you don't want the extra results from the country query matches, you should look into compound queries like the boosting query.

Also, be aware that in your last clause the dot in ".us" will be removed at analysis time (assuming the field uses the default standard analyzer), so Elasticsearch is actually querying for "us", not ".us".
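
One way to check this is the _analyze API. The body below is a minimal sketch that assumes the "tld" field uses the standard analyzer, which this thread does not confirm, and a version that accepts a JSON request body (2.0 or later).

```json
{
  "analyzer": "standard",
  "text": ".us"
}
```

Sent to the _analyze endpoint, this returns the tokens that would actually be searched. To test with whatever analyzer the field really uses, the same request can be sent to the index's own _analyze endpoint with "field": "tld" in place of "analyzer".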


(Michael Penkov) #5

Thank you. I will try your suggestions.


(Nik Everett) #6

Are you saying you don't want all the results with United States or us in them? If that is the case then you should structure your query a bit differently:


{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": {
                    "query": "Profound Networks",
                    "boost": 2
                  }
                }
              },
              {
                "match_phrase": {
                  "title": {
                    "query": "Profound Networks",
                    "boost": 20
                  }
                }
              }
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "whois.registrant.country": {
              "query": "United States",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "whois.admin.country": {
              "query": "United States",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "tld": {
              "query": ".us",
              "boost": 10
            }
          }
        }
      ]
    }
  }
}

I think that'd work better if that is your goal.

