relevant part of my mapping is here
{
"mappings": {
"message": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"sourceId": {
"type": "string",
"index": "not_analyzed"
},
"sourceName": {
"type": "string",
"index": "analyzed"
},
"text": {
"type": "string",
"store": true
},
"creationTimestamp": {
"type": "integer",
"index": "not_analyzed"
},
"histories": {
"type": "nested",
"properties": {
"views": {
"type": "integer",
"index": "not_analyzed"
},
"timestamp": {
"type": "integer",
"index": "not_analyzed"
}
}
}
}
},
"source": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"store": true
},
"membersCount": {
"type": "long",
"index": "not_analyzed"
}
}
}
}
}
about mapping
sourceId
and sourceName
in message
is id
and name
in source
.
each source can have a lots of messages and most fields of data (my focus is membersCount
field in here) has been updated periodically. so I think it's best mapping for my case for implementing relation between data (rather than parent-child approach)
what I want to achieve is sorting sources based on custom scoring by this formula:
scoreOfSource = (sumOfViewsOfMessagesBelongToCurrentSourceInAPeriod) / membersCount
to achieve this, I created below query on message
type
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"range": {
"creationTimestamp": {
"gte": 1475325259,
"lte": 1476102877
}
}
},
{
"match": {
"text": {
"query": "car",
"fuzziness": 0
}
}
}
]
}
},
"script_score": {
"script": "_source.histories[_source.histories.values.size()-1].views"
},
"boost_mode": "replace"
}
},
"aggs": {
"sources": {
"terms": {
"field": "sourceId",
"order": {
"myscore": "desc"
},
"size": 100
},
"aggs": {
"myscore": {
"sum": {
"script": "_score"
}
}
}
}
},
"size": 0
}
by the aggregation I can get sum of total views of each source's messages in myscore
but I don't know how can I access to membersCount
of each source because for Division. it is a field of source
type and I searching on messages. as I said, membersCount has been updated periodically. so it's not efficient to store the field in message
type.
is that a way to access the field in aggregation? for example using a subquery on another type in within of aggregation or any other solution.