Hi
I am trying to use custom similarities.
I want a very simple similarity where the score is just doc.freq. As I understand it, this value is stored in the statistics, so it should be very fast.
With ~30 million docs, of which ~3 million match, the request takes 1-3 seconds.
I don't understand why this is so slow, since in my mind this similarity function is as simple as, if not simpler than, e.g. the default built-in BM25 similarity.
I hope someone can help me understand this problem better, as I feel I am missing something about how this works.
Thanks in advance
Jens
MAPPINGS
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "field_1": {
        "similarity": "custom_similarity",
        "type": "text"
      },
      "field_2": {
        "similarity": "custom_similarity",
        "type": "text"
      },
      "field_3": {
        "similarity": "custom_similarity",
        "type": "text"
      }
    }
  },
  "settings": {
    "index": {
      "number_of_replicas": "0",
      "number_of_shards": "12",
      "refresh_interval": "30s",
      "similarity": {
        "custom_similarity": {
          "script": {
            "source": "return doc.freq;"
          },
          "type": "scripted"
        }
      }
    }
  }
}
QUERY
{
  "from": 0,
  "size": 15,
  "query": {
    "bool": {
      "minimum_should_match": 2,
      "should": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "field_1": {
                    "value": "592705521550"
                  }
                }
              },
              {
                "term": {
                  "field_2": {
                    "value": "592705521550"
                  }
                }
              },
              {
                "term": {
                  "field_3": {
                    "value": "592705521550"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "field_1": {
                    "value": "618475336552"
                  }
                }
              },
              {
                "term": {
                  "field_2": {
                    "value": "618475336552"
                  }
                }
              },
              {
                "term": {
                  "field_3": {
                    "value": "618475336552"
                  }
                }
              }
            ]
          }
        },
        ... in total 15 of these should clauses with 3 term clauses in them
            ]
          }
        }
      ]
    }
  },
  "_source": false,
  "track_total_hits": 2147483647
}
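Since the query above is abbreviated, here is a minimal Python sketch of how the full body could be generated: one outer should clause per value, each containing one term clause per field. The names build_query and FIELDS are hypothetical and chosen only for illustration, and only the two values shown above are used, since the other 13 are not listed here.

```python
import json

# The three text fields from the mapping above.
FIELDS = ["field_1", "field_2", "field_3"]


def build_query(values, fields=FIELDS, minimum_should_match=2, size=15):
    """Build the request body: one should clause per value,
    each holding one term clause per field (hypothetical helper)."""
    groups = [
        {
            "bool": {
                "should": [
                    {"term": {field: {"value": value}}} for field in fields
                ]
            }
        }
        for value in values
    ]
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "minimum_should_match": minimum_should_match,
                "should": groups,
            }
        },
        "_source": False,
        "track_total_hits": 2147483647,
    }


if __name__ == "__main__":
    # Only the two values visible in the query above; the real request has 15.
    body = build_query(["592705521550", "618475336552"])
    print(json.dumps(body, indent=2))
```

With 15 values and 3 fields this yields 45 term clauses in total, which matches the shape of the query described above.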
RESPONSE
{
"took": 1665,
"timed_out": false,
...