I have 3 big indices (logs_2016, logs_2017 and logs_2018). They are about 200gb size and 200000000 docs count. I'm doing 99% of my queries by parameter user_id. Here's the mapping for logs_2017 e.g.:
{
"logs_2017": {
"mappings": {
"log_item": {
"_routing": {
"required": true
},
"properties": {
"@timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"app_id": {
"type": "string",
"index": "not_analyzed"
},
"user_id": {
"type": "string",
"index": "not_analyzed"
},
"description": {
"type": "string"
},
"type_app": {
"type": "long"
},
"os": {
"type": "string",
"index": "not_analyzed"
},
"session_id": {
"type": "string",
"index": "not_analyzed"
.. .
}
}
}
}
}
}
How I query (getting all logs for specific event_id and device_type, ordering by timestamp and session_id):
GET logs_2016,logs_2017,logs_2018/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"user_id": "32232699237"
}
},
{
"match": {
"device_type": "31"
}
}
]
}
},
"size":100,
"from":0,
"sort":[{"timestamp":{"order":"desc"}}, {"session_id":{"order":"desc"}}]
}
My results are very slow, 2-4 seconds. How can I improve query results, should I put more indices with the same last event_id number e.g. or maybe monthly indices (12 per year, but then I would do searching over 36 indices for 3 years)?
Running ES on 2 client nodes (4 CPU, 6GB RAM, 2GB HDD), 2 data nodes (8 CPU, 16GB RAM, 400GB HDD).
While doing logs template i put 16 number of shards and 1 number of replicas.
Ty