Hi everyone,
I have a working query that I would like to discuss to see if there are ways to make it both smaller and more efficient.
I have a mapping that has a similar structure to this one:
{
  "documents": {
    "properties": {
      "content": {
        "type": "string"
      },
      "id": {
        "type": "string",
        "index": "not_analyzed"
      },
      "user": {
        "type": "object",
        "properties": {
          "id": {
            "index": "not_analyzed",
            "type": "string"
          },
          "fields": {
            "include_in_parent": true,
            "type": "nested",
            "properties": {
              "key": {
                "type": "string",
                "index": "not_analyzed"
              },
              "value": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    }
  }
}
As you can see, user is an object that itself contains a nested object: fields.
fields also has include_in_parent set, but I only added it here in case some solution can actually leverage it; my current query uses nested.
So what I am trying to get is a document whose user has both key: "name", value: "john" and key: "segment", value: "active".
My first approach was to use a nested bool:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "id": "foo"
              }
            },
            {
              "nested": {
                "path": "user.fields",
                "query": {
                  "filtered": {
                    "filter": {
                      "bool": {
                        "must": [
                          {
                            "bool": {
                              "must": [
                                {
                                  "term": {
                                    "user.fields.key": "name"
                                  }
                                },
                                {
                                  "term": {
                                    "user.fields.value": "john"
                                  }
                                }
                              ]
                            }
                          },
                          {
                            "bool": {
                              "must": [
                                {
                                  "term": {
                                    "user.fields.key": "segment"
                                  }
                                },
                                {
                                  "term": {
                                    "user.fields.value": "active"
                                  }
                                }
                              ]
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}
But it didn't work.
My hope was that it would perform something like:
(user.fields.key:"name" AND user.fields.value:"john") AND (user.fields.key:"segment" AND user.fields.value:"active")
but since it does not work, I assume it is actually matching something like:
user.fields.key:"name" AND user.fields.value:"john" AND user.fields.key:"segment" AND user.fields.value:"active"
which cannot match any document, since each object in fields holds only one key/value pair (hope this makes sense).
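To spell out that assumption, here is a toy model in plain Python (not Elasticsearch itself). It assumes a nested filter evaluates its inner query against each object in fields separately, which is how I understand nested documents to behave. The names nested_match and has_pair are made up for illustration.

```python
# Toy model of nested matching: each object under user.fields is checked on its own.
# This is NOT Elasticsearch code, only a sketch of the assumed semantics.
doc = {
    "id": "foo",
    "user": {
        "fields": [
            {"key": "name", "value": "john"},
            {"key": "segment", "value": "active"},
        ]
    },
}


def nested_match(document, predicate):
    """True if at least one object under user.fields satisfies the predicate."""
    return any(predicate(obj) for obj in document["user"]["fields"])


def has_pair(key, value):
    """Build a predicate: a single nested object holds exactly this key/value pair."""
    return lambda obj: obj["key"] == key and obj["value"] == value


# First approach: one nested clause, so both pairs are required of the SAME object.
single_clause = nested_match(
    doc,
    lambda obj: has_pair("name", "john")(obj) and has_pair("segment", "active")(obj),
)

# Alternative: one nested clause per pair, combined at the parent document level.
separate_clauses = (
    nested_match(doc, has_pair("name", "john"))
    and nested_match(doc, has_pair("segment", "active"))
)
```

Under this model the single clause can never match, because no single object carries two keys, while the per-pair clauses match the document as intended.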
So currently we have this working query:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "id": "foo"
              }
            },
            {
              "nested": {
                "path": "user.fields",
                "query": {
                  "filtered": {
                    "filter": {
                      "bool": {
                        "must": [
                          {
                            "term": {
                              "user.fields.key": "name"
                            }
                          },
                          {
                            "term": {
                              "user.fields.value": "john"
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              }
            },
            {
              "nested": {
                "path": "user.fields",
                "query": {
                  "filtered": {
                    "filter": {
                      "bool": {
                        "must": [
                          {
                            "term": {
                              "user.fields.key": "segment"
                            }
                          },
                          {
                            "term": {
                              "user.fields.value": "active"
                            }
                          }
                        ]
                      }
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}
But we are concerned about the size of the query because:
- Our users will be able to query as many user.fields as they want, so the query will grow without bound
- There's an overhead on the REST side (not a huge problem, but still a point)
- It's the first time a query we wrote really doesn't feel right: it looks like too much boilerplate and is non-intuitive in a way.
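For reference, the repetitive part can at least be generated rather than written by hand. Below is a minimal sketch in Python that emits one nested clause per key/value pair, mirroring the working query above. The helper names nested_pair_filter and build_query are hypothetical, and this only hides the boilerplate in our own code; the request sent over REST is just as large.

```python
import json


def nested_pair_filter(key, value, path="user.fields"):
    """Return one nested filter clause requiring a single key/value pair."""
    return {
        "nested": {
            "path": path,
            "query": {
                "filtered": {
                    "filter": {
                        "bool": {
                            "must": [
                                {"term": {path + ".key": key}},
                                {"term": {path + ".value": value}},
                            ]
                        }
                    }
                }
            },
        }
    }


def build_query(doc_id, pairs):
    """Combine the id term with one nested clause per (key, value) pair."""
    must = [{"term": {"id": doc_id}}]
    must.extend(nested_pair_filter(key, value) for key, value in pairs)
    return {"query": {"filtered": {"filter": {"bool": {"must": must}}}}}


query = build_query("foo", [("name", "john"), ("segment", "active")])
print(json.dumps(query, indent=2))
```

The must list grows by one nested clause for every pair the user adds, so the size concern above still applies.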
Is there another approach to this problem that we are totally missing here?
(Sorry for the long post)
Thanks