It takes a long time to query the keyword field

Hi,

I have a question regarding query performance. What is the difference between querying the normal field and querying the keyword field? I did a test. The data types of the fields I am querying are as follows.

                     "primaryIdentificationNumber" : {
                        "type" : "keyword",
                        "fields" : {
                          "keyword" : {
                            "type" : "keyword",
                            "ignore_above" : 256
                          }
                        }
                      },
                      "primaryIdentificationType" : {
                        "type" : "keyword",
                        "fields" : {
                          "keyword" : {
                            "type" : "keyword",
                            "ignore_above" : 256
                          }
                        }
                      },
                      "emailAddress" : {
                        "type" : "keyword",
                        "fields" : {
                          "keyword" : {
                            "type" : "keyword",
                            "ignore_above" : 256
                          }
                        }
                      },
                      "emailType" : {
                        "type" : "keyword",
                        "fields" : {
                          "keyword" : {
                            "type" : "keyword",
                            "ignore_above" : 256
                          }
                        }
                      }

I ran my first query with normal fields like below. This query took 233ms.

GET /bsts-application-*/_search?pretty
{
"query":{
	"bool":{
		"must": [{
			"nested":{
				"path": "application.applicant",
				"inner_hits":{},
				"query":{
					"bool":{
						"must":[{
							"bool":{
								"must":[{
									"match": { 
										"application.applicant.primaryIdentificationNumber": "48749018911"
									}
								},	{
									"match": {
										"application.applicant.primaryIdentificationType": "NATIONAL_ID"
									}
								}]
							}
						},	{
							"bool":{
								"must_not":[],
								"must":[{
									"nested": {
										"path": "application.applicant.email",
										"query": {
											"bool":{
												"must":[{
													"match":{
														"application.applicant.email.emailAddress": "xxx@gmail.com"
													}
												}],
												"must_not": [{
													"match": {
														"application.applicant.email.emailType": "sdf"
													}
												}]
											}
										}
									}
								}]
							}
						}]
					}
				}
			}
		}]
	}
}
}

I ran my second query with keyword fields like below. This query took 423ms.

GET /bsts-application-*/_search?pretty
{
"query":{
	"bool":{
		"must": [{
			"nested":{
				"path": "application.applicant",
				"inner_hits":{},
				"query":{
					"bool":{
						"must":[{
							"bool":{
								"must":[{
									"match": { 
										"application.applicant.primaryIdentificationNumber.keyword": "48749018911"
									}
								},	{
									"match": {
										"application.applicant.primaryIdentificationType.keyword": "NATIONAL_ID"
									}
								}]
							}
						},	{
							"bool":{
								"must_not":[],
								"must":[{
									"nested": {
										"path": "application.applicant.email",
										"query": {
											"bool":{
												"must":[{
													"match":{
														"application.applicant.email.emailAddress.keyword": "xxx@gmail.com"
													}
												}],
												"must_not": [{
													"match": {
														"application.applicant.email.emailType.keyword": "sdf"
													}
												}]
											}
										}
									}
								}]
							}
						}]
					}
				}
			}
		}]
	}
}
}

Actually, what I want to do is to understand the logic of the event.
Why did the query take longer when I used the keyword field in the query? What is the working logic?

Regars

Does going to a subfield of a field affect performance while querying?

Hi @Ruwi ,

Using a subfield does not affect performance. However I see a couple of ways to improve things in your setup:

  • The main reason for a subfield is to support different types of matching. The most typical use case is when we have a top-level text field (that does analysis for full text matching), but we also want to support keyword matching, e.g. for aggregations, sorting or exact, prefix etc. matching. In your config primaryIdentificationNumber is already a keyword field so you can already perform those operations. Adding a primaryIdentificationNumber.keyword doesn't have any benefit, it just duplicates data, so it's safe to remove it.
  • In your sample query you're using a match query on a keyword field, which is sub-optimal. Depending on how you want to search data, you can either
    1. Use a match query on a text field, or
    2. Use a term query on a keyword field

Some more info on subfields: fields | Elasticsearch Guide [8.8] | Elastic

Please try modifying your mapping and your queries accordingly, and see if the performance improves.

Thank you for the information.

I created the query and data types I ran above manually in the Kibana ui to go to the keyword field and measure the performance of the query.

In the real scenario the problem is:

The emailAddress field is a text as seen below.

"email" : {
                    "type" : "nested",     
                    "properties" : {
                      "emailAddress" : {
                        "type" : "text",
                        "fields" : {
                          "keyword" : {
                            "type" : "keyword",
                            "ignore_above" : 256
                          }
                        }
                      },
                      "emailType" : {
                        "type" : "keyword",
                        "fields" : {
                          "keyword" : {
                            "type" : "keyword",
                            "ignore_above" : 256
                          }
                        }
                      }
                    }
                  }

Note: Except for the emailAdress I'm querying below, the other 3 fields are keyword.

When we run the query like below, it does not return any results. We were able to fix it by simply making a change to the query by querying the emailAddress field like this: application.applicant.email.emailAddress.keyword

GET /bsts-application-*/_search?pretty
{
"query":{
	"bool":{
		"must": [{
			"nested":{
				"path": "application.applicant",
				"inner_hits":{},
				"query":{
					"bool":{
						"must":[{
							"bool":{
								"must":[{
									"match": { 
										"application.applicant.primaryIdentificationNumber": "48749018911"
									}
								},	{
									"match": {
										"application.applicant.primaryIdentificationType": "NATIONAL_ID"
									}
								}]
							}
						},	{
							"bool":{
								"must_not":[],
								"must":[{
									"nested": {
										"path": "application.applicant.email",
										"query": {
											"bool":{
												"must":[{
													"match":{
														"application.applicant.email.emailAddress": "xxx@gmail.com"
													}
												}],
												"must_not": [{
													"match": {
														"application.applicant.email.emailType": "sdf"
													}
												}]
											}
										}
									}
								}]
							}
						}]
					}
				}
			}
		}]
	}
}
}

Before querying our Java application (the app querying the elastic), we made sure that each field goes to the keyword field as above. But after this improvement, we observed very high increases in executed query times.

But when I tested Kibana ui as static as I wrote above, I couldn't see much difference in query times.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.