Elasticsearch Common Documents Query

I have the following documents in Elasticsearch:

[{
	"_version": 1,
	"uid": "blt275162870bd76359",
	"title": "Entry1",
	"publish_details": [{
			"environment": "blt603fe91adbdcff66",
			"locale": "en-us",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		},
		{
			"environment": "blt603fe91adbdcff66",
			"locale": "hi-in",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		}
	]
},
{
	"_version": 1,
	"title": "Entry12",
	"uid": "blt275162870bd76359",
	"publish_details": [{
		"environment": "blt603fe91adbdcff66",
		"locale": "mr-in",
		"time": "2020-06-27T11:58:17.699Z",
		"user": "bltaadab2f531206e9d"
	}, {
		"environment": "blt603fe91adbdcff67",
		"locale": "mr-in",
		"time": "2020-06-27T11:58:17.699Z",
		"user": "bltaadab2f531206e9d"
	}]
},
{
	"_version": 1,
	"uid": "blt275162870bd763523",
	"title": "Entry100",
	"publish_details": [{
			"environment": "blt603fe91adbdcff66",
			"locale": "en-us",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		},
		{
			"environment": "blt603fe91adbdcff66",
			"locale": "hi-in",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		}, {
			"environment": "blt603fe91adbdcff66",
			"locale": "mr-in",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		}
	]
},
{
	"_version": 1,
	"title": "Entry18",
	"uid": "blt275162870bd76355",
	"publish_details": [{
		"environment": "blt603fe91adbdcff66",
		"locale": "en-us",
		"time": "2020-06-27T11:58:17.699Z",
		"user": "bltaadab2f531206e9d"
	}]
},
{
	"_version": 1,
	"title": "Entry16",
	"uid": "blt275162870bd76354",
	"publish_details": []
},
{
	"_version": 1,
	"title": "Entry20",
	"uid": "blt275162870bd76353",
	"publish_details": [{
		"environment": "blt603fe91adbdcff66",
		"locale": "en-us",
		"time": "2020-06-27T11:58:17.699Z",
		"user": "bltaadab2f531206e9d"
	}]
}]

and I want the following response:

[{
	"uid": "blt275162870bd76359"
}, {
	"uid": "blt275162870bd763523"
}]

Could you help me write a query for this? publish_details is a nested field. I want the uids whose publish_details contain the locales hi-in, mr-in, and en-us, all for the environment blt603fe91adbdcff66. The check has to be done per uid, because uid blt275162870bd76359 appears in two documents.

Also, I have millions of documents in the index, so the result may consist of thousands or millions of uids. I want the whole list of uids from a single query, without batching. Could you help me with this?

Two answers.

You can filter the response using the filter_path option, so that you get only the uid.

You can use:

  • the size and from parameters to display up to 10000 records (the default limit) to your users. If you want to change this limit, you can change the index.max_result_window setting, but be aware of the consequences (i.e. memory usage).
  • the search_after feature to do deep pagination.
  • the Scroll API if you want to extract a result set to be consumed by another tool later.
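
As a rough illustration of the search_after option, here is a minimal Python sketch of the paging loop. The `search` argument is a hypothetical callable (not part of any client library) that sends a search body and returns the parsed JSON response; the sort on uid plus _id is an assumption chosen to give a stable order, so adapt it to your mapping.

```python
from typing import Callable, Iterator, Optional

# Hypothetical: any function that sends a search body to the index and
# returns the parsed JSON response as a dict.
SearchFn = Callable[[dict], dict]


def iter_uids(search: SearchFn, query: dict, page_size: int = 1000) -> Iterator[str]:
    """Yield the uid of every matching document, page by page."""
    last_sort: Optional[list] = None
    while True:
        body = {
            "size": page_size,
            "_source": ["uid"],
            "query": query,
            # Assumption: uid plus _id gives a total order to page over.
            "sort": [{"uid": "asc"}, {"_id": "asc"}],
        }
        if last_sort is not None:
            # Resume right after the last hit of the previous page.
            body["search_after"] = last_sort
        hits = search(body)["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit["_source"]["uid"]
        # Each hit carries the sort values needed to fetch the next page.
        last_sort = hits[-1]["sort"]
```

This still issues one request per page; it only hides the batching behind an iterator. A uid that appears in several documents is yielded once per document.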

@dadoonet, thanks for the reply. But can you please help me with the query itself? I mean, how can I get the expected result?

This is exactly what I answered to. Check the links I shared.

Hi @dadoonet, the links that you shared with me relate to pagination, but first I need a query that returns the result. Can you please help me create that query? I mean, how can I get at least the two uids above with a query?

Use a bool query with a should array which contains 2 term queries.

Thanks for the reply @dadoonet, but if you look at my documents, the first two have the same uid. Taken together, their publish_details cover the environment blt603fe91adbdcff66 with the locales en-us, hi-in, and mr-in, and the third document contains all three in its own publish_details. I want only those documents. If I use the following query:

{
	"query": {
		"bool": {
			"should": [{
				"nested": {
					"path": "publish_details",
					"query": {
						"term": {
							"publish_details.environment": "blt69759a50c5ca0f29"

						}
					}
				}
			}, {
				"nested": {
					"path": "publish_details",
					"query": {
						"terms": {
							"publish_details.environment": ["hi-in", "mr-in", "en-us"]

						}
					}
				}
			}]
		}
	}
}

then it does not give me the expected output. Sorry, but can you help me write the exact query, taking into account the common uid, the environment, and the locale?

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste in Kibana dev console, click on the run button to reproduce your use case. It will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Hi @dadoonet, here are my index and document details:

PUT mytestindex
{
	"mappings": {
		"documents": {
			"properties": {
				"type_uid": {
					"type": "keyword"
				},
				"key": {
					"type": "keyword"
				},
				"publish_details": {
					"type": "nested",
					"properties": {
						"environment": {
							"type": "keyword"
						},
						"locale": {
							"type": "keyword"
						},
						"time": {
							"type": "date"
						},
						"user": {
							"type": "keyword"
						}
					}
				},
				"title": {
					"type": "text",
					"fields": {
						"raw": {
							"type": "keyword"
						}
					}
				},
				"uid": {
					"type": "keyword"
				},
					"version": {
					"type": "integer"
				}
			}
		}
	}
}

PUT mytestindex/documents/1
{
    "version": 1,
		"uid": "blt275162870bd76359",
		"title": "Entry1",
		"publish_details": [{
				"environment": "blt603fe91adbdcff66",
				"locale": "en-us",
				"time": "2020-06-27T11:58:17.699Z",
				"user": "bltaadab2f531206e9d"
			},
			{
				"environment": "blt603fe91adbdcff66",
				"locale": "hi-in",
				"time": "2020-06-27T11:58:17.699Z",
				"user": "bltaadab2f531206e9d"
			}
		],
		"type_uid":"java",
		"key":"blt9a73af140d639159"
}
PUT mytestindex/documents/2
{
    "version": 1,
		"title": "Entry12",
		"uid": "blt275162870bd76359",
		"publish_details": [{
			"environment": "blt603fe91adbdcff66",
			"locale": "mr-in",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		}, {
			"environment": "blt603fe91adbdcff67",
			"locale": "mr-in",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		}],
		"type_uid":"java",
		"key":"blt9a73af140d639159"
}

PUT mytestindex/documents/3
{
    "version": 1,
		"uid": "blt275162870bd763523",
		"title": "Entry100",
		"publish_details": [{
				"environment": "blt603fe91adbdcff66",
				"locale": "en-us",
				"time": "2020-06-27T11:58:17.699Z",
				"user": "bltaadab2f531206e9d"
			},
			{
				"environment": "blt603fe91adbdcff66",
				"locale": "hi-in",
				"time": "2020-06-27T11:58:17.699Z",
				"user": "bltaadab2f531206e9d"
			}, {
				"environment": "blt603fe91adbdcff66",
				"locale": "mr-in",
				"time": "2020-06-27T11:58:17.699Z",
				"user": "bltaadab2f531206e9d"
			}
		],
		"type_uid":"java",
		"key":"blt9a73af140d639159"
}

PUT mytestindex/documents/4
{
    "version": 1,
	"title": "Entry18",
		"uid": "blt275162870bd76355",
		"publish_details": [{
			"environment": "blt603fe91adbdcff66",
			"locale": "en-us",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		}],
		"type_uid":"java",
		"key":"blt9a73af140d639159"
}

PUT mytestindex/documents/5
{
    "version": 1,
	"title": "Entry16",
		"uid": "blt275162870bd76354",
		"publish_details": [],
		"type_uid":"java",
		"key":"blt9a73af140d639159"
}

PUT mytestindex/documents/6
{
    "version": 1,
	"title": "Entry20",
		"uid": "blt275162870bd76353",
		"publish_details": [{
			"environment": "blt603fe91adbdcff66",
			"locale": "en-us",
			"time": "2020-06-27T11:58:17.699Z",
			"user": "bltaadab2f531206e9d"
		}],
		"type_uid":"java",
		"key":"blt9a73af140d639159"
}

So I want the following result (whose "type_uid":"java" and "key":"blt9a73af140d639159" and publish_details.environment is "blt603fe91adbdcff66" and publish_details.locale in mr-in, en-us, hi-in )

[{
    "uid": "blt275162870bd76359"
}, {
    "uid": "blt275162870bd763523"
}]

Could you help me with the query?

Maybe you can run:

GET /mytestindex/_search?filter_path=hits.hits._source.uid
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "type_uid": "java"
          }
        },
        {
          "term": {
            "key": "blt9a73af140d639159"
          }
        },
        {
          "nested": {
            "path": "publish_details",
            "query": {
              "term": {
                "publish_details.environment": "blt603fe91adbdcff66"
              }
            }
          }
        },
        {
          "nested": {
            "path": "publish_details",
            "query": {
              "terms": {
                "publish_details.locale": [
                  "mr-in",
                  "en-us",
                  "hi-in"
                ]
              }
            }
          }
        }
      ]
    }
  }  
}

Or if you want to group the results:

GET /mytestindex/_search?filter_path=aggregations.uid.buckets.key
{
  "size": 0,
  "aggs": {
    "uid": {
      "terms": {
        "field": "uid"
      }
    }
  }, 
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "type_uid": "java"
          }
        },
        {
          "term": {
            "key": "blt9a73af140d639159"
          }
        },
        {
          "nested": {
            "path": "publish_details",
            "query": {
              "term": {
                "publish_details.environment": "blt603fe91adbdcff66"
              }
            }
          }
        },
        {
          "nested": {
            "path": "publish_details",
            "query": {
              "terms": {
                "publish_details.locale": [
                  "mr-in",
                  "en-us",
                  "hi-in"
                ]
              }
            }
          }
        }
      ]
    }
  }  
}

Which gives:

{
  "aggregations" : {
    "uid" : {
      "buckets" : [
        {
          "key" : "blt275162870bd76359"
        },
        {
          "key" : "blt275162870bd763523"
        },
        {
          "key" : "blt275162870bd76353"
        },
        {
          "key" : "blt275162870bd76355"
        }
      ]
    }
  }
}

HTH
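
A side note on the two queries above: the environment and locale conditions sit in separate nested clauses, so each one may be satisfied by a different object inside publish_details. If the intent is that the same object carries both the environment and one of the locales, the two conditions can go into a single nested clause. Below is a minimal Python sketch that only builds such a request body; the helper names `published_in` and `build_query` are made up for illustration. For the sample documents the hits should be the same either way, and this does not merge documents that share a uid.

```python
import json
from typing import List


def published_in(environment: str, locales: List[str]) -> dict:
    """One nested clause: both conditions must hold on the same object."""
    return {
        "nested": {
            "path": "publish_details",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"publish_details.environment": environment}},
                        {"terms": {"publish_details.locale": locales}},
                    ]
                }
            },
        }
    }


def build_query(type_uid: str, key: str, environment: str, locales: List[str]) -> dict:
    """Combine the top-level term filters with the single nested clause."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"type_uid": type_uid}},
                    {"term": {"key": key}},
                    published_in(environment, locales),
                ]
            }
        }
    }


if __name__ == "__main__":
    body = build_query(
        "java",
        "blt9a73af140d639159",
        "blt603fe91adbdcff66",
        ["mr-in", "en-us", "hi-in"],
    )
    # Paste the printed JSON as the body of a _search request.
    print(json.dumps(body, indent=2))
```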

Hi @dadoonet, thanks for the reply, but the above query does not give me the expected result. I need only:

[{
    "uid": "blt275162870bd76359"
}, {
    "uid": "blt275162870bd763523"
}]

These two uids have "publish_details.environment": "blt603fe91adbdcff66" and "publish_details.locale": ["mr-in","en-us","hi-in"].

Is there any way to get only this result? I would like a query for that. Please help me here.

Hi @dadoonet, can you please help me create a query that gives the expected result?

Please be patient in waiting for responses to your question and refrain from pinging multiple times asking for a response or opening multiple topics for the same question. This is a community forum, it may take time for someone to reply to your question. For more information please refer to the Community Code of Conduct specifically the section "Be patient". Also, please refrain from pinging folks directly, this is a forum and anyone that participates might be able to assist you.

If you are in need of a service with an SLA that covers response times for questions then you may want to consider talking to us about a subscription.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

If I correctly read the specification you gave:

So I want the following result (whose "type_uid":"java" and "key":"blt9a73af140d639159" and publish_details.environment is "blt603fe91adbdcff66" and publish_details.locale in mr-in, en-us, hi-in )

This document is matching:

  {
    "_index" : "mytestindex",
    "_type" : "_doc",
    "_id" : "4",
    "_score" : 1.3107349,
    "_source" : {
      "version" : 1,
      "title" : "Entry18",
      "uid" : "blt275162870bd76355",
      "publish_details" : [
        {
          "environment" : "blt603fe91adbdcff66",
          "locale" : "en-us",
          "time" : "2020-06-27T11:58:17.699Z",
          "user" : "bltaadab2f531206e9d"
        }
      ],
      "type_uid" : "java",
      "key" : "blt9a73af140d639159"
    }
  }

Why should it not be part of the result?

I'm expecting to see blt275162870bd76355 in the output.

I want the uids whose publish_details match the following case:

"publish_details":[{
 "environment": "blt603fe91adbdcff66",
 "locale": "en-us"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "hi-in"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "mr-in"
 }]

Please ignore the other publish_details fields (time, user).

According to this condition, only blt275162870bd76359 and blt275162870bd763523 satisfy the case. The uid blt275162870bd76359 appears twice here, but if we merge its publish_details we get the following result:

"publish_details":[{
 "environment": "blt603fe91adbdcff66",
 "locale": "en-us"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "hi-in"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "mr-in"
 },{
 "environment": "blt603fe91adbdcff67",
 "locale": "mr-in"
 }]

This means blt275162870bd76359 matches the condition. The uid blt275162870bd763523 has these publish_details:

"publish_details":[{
 "environment": "blt603fe91adbdcff66",
 "locale": "en-us"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "hi-in"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "mr-in"
 }]

So only blt275162870bd76359 and blt275162870bd763523 match the condition. The uid blt275162870bd76355 has only "environment": "blt603fe91adbdcff66" with "locale": "en-us"; it does not have the other two locales for the same environment.

So I want a query that returns only blt275162870bd76359 and blt275162870bd763523 in the response (my Elasticsearch version is 6.5.4).
Please help me with this.

I don't think I understand.

blt275162870bd76355 is matching on:

  • "type_uid":"java"
  • "key":"blt9a73af140d639159"
  • publish_details.environment is "blt603fe91adbdcff66"
  • publish_details.locale in mr-in, en-us, hi-in

That's exactly what you asked here:

So I want the following result (whose "type_uid":"java" and "key":"blt9a73af140d639159" and publish_details.environment is "blt603fe91adbdcff66" and publish_details.locale in mr-in, en-us, hi-in )

Could you specify the query or pseudocode that should be applied?

If you are thinking of comparing documents with each other, then that's not possible, as Elasticsearch does not support joins.

Hi, I want to merge the publish_details of documents that share the same uid and then query the merged data. Here blt275162870bd76359 appears twice, and if we merge its publish_details the data looks like:

"publish_details":[{
 "environment": "blt603fe91adbdcff66",
 "locale": "en-us"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "hi-in"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "mr-in"
 },{
 "environment": "blt603fe91adbdcff67",
 "locale": "mr-in"
 }]

Also, for blt275162870bd763523 the publish_details are:

"publish_details":[{
 "environment": "blt603fe91adbdcff66",
 "locale": "en-us"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "hi-in"
 },{
 "environment": "blt603fe91adbdcff66",
 "locale": "mr-in"
 }]

The other documents have their own publish_details. So in this case, if I use the following query:

{
	"query": {
		"bool": {
			"must": [{
					"term": {
						"type_uid": "java"
					}
				}, {
					"term": {
						"key": "blt9a73af140d639159"
					}
				}, {
					"nested": {
						"path": "publish_details",
						"query": {
							"term": {
								"publish_details.environment": "blt603fe91adbdcff66"

							}
						}
					}
				},
				{
					"nested": {
						"path": "publish_details",
						"query": {
							"term": {
								"publish_details.locale": "en-us"

							}
						}
					}
				},
				{
					"nested": {
						"path": "publish_details",
						"query": {
							"term": {
								"publish_details.locale": "hi-in"

							}
						}
					}
				}, {
					"nested": {
						"path": "publish_details",
						"query": {
							"term": {
								"publish_details.locale": "mr-in"

							}
						}
					}
				}
			]
		}
	}
}

Then it gives me only blt275162870bd763523 and blt275162870bd76359 (repeated, with its publish_details merged). The other documents do not match this query, so they are excluded. I want only these two documents in the response.

So is there any way in Elasticsearch to combine documents (when a uid is repeated) and query them to get the desired result? I would like one single query for this. Kindly help me here.

I don't think it's possible at query time. Maybe do it in your application layer.

Or maybe change the way you're modeling your data if it makes sense.
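
To make the application-layer idea concrete, here is a minimal Python sketch, assuming the matching documents have already been fetched (for example with one of the pagination options mentioned earlier). The function name `uids_published_everywhere` is made up for illustration. It merges publish_details per uid and keeps the uids that have every required locale for the given environment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def uids_published_everywhere(
    docs: Iterable[dict], environment: str, required_locales: Set[str]
) -> List[str]:
    """Return uids whose merged publish_details cover all required locales."""
    locales_by_uid: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for detail in doc.get("publish_details", []):
            # Only count locales published to the requested environment.
            if detail.get("environment") == environment:
                locales_by_uid[doc["uid"]].add(detail.get("locale"))
    return [
        uid
        for uid, locales in locales_by_uid.items()
        if required_locales <= locales
    ]


def detail(environment: str, locale: str) -> dict:
    """Build a trimmed publish_details entry (time and user are ignored)."""
    return {"environment": environment, "locale": locale}


ENV = "blt603fe91adbdcff66"
OTHER_ENV = "blt603fe91adbdcff67"

# The six sample documents from this thread, reduced to the relevant fields.
SAMPLE_DOCS = [
    {"uid": "blt275162870bd76359",
     "publish_details": [detail(ENV, "en-us"), detail(ENV, "hi-in")]},
    {"uid": "blt275162870bd76359",
     "publish_details": [detail(ENV, "mr-in"), detail(OTHER_ENV, "mr-in")]},
    {"uid": "blt275162870bd763523",
     "publish_details": [detail(ENV, "en-us"), detail(ENV, "hi-in"),
                         detail(ENV, "mr-in")]},
    {"uid": "blt275162870bd76355", "publish_details": [detail(ENV, "en-us")]},
    {"uid": "blt275162870bd76354", "publish_details": []},
    {"uid": "blt275162870bd76353", "publish_details": [detail(ENV, "en-us")]},
]

if __name__ == "__main__":
    print(uids_published_everywhere(SAMPLE_DOCS, ENV, {"en-us", "hi-in", "mr-in"}))
    # prints ['blt275162870bd76359', 'blt275162870bd763523']
```

The same merge could instead be done once at index time, storing a single document per uid with the combined publish_details, which is one way to read the remodeling suggestion.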

Does that mean there is no direct query to get the expected result?