Percentage after aggregation

Hello,
I'm running an SSH server and I would like to keep an eye on users who are trying to connect to my server using wrong passwords.

The available logs are (OK/KO):

Oct 16 23:34:40 xxxxxxx xxxxxxx[26557]: User toto@tata.fr from 127.0.0.7 authentified
Oct 17 01:53:17 xxxxxxx xxxxxxx[info] 29731#0: *322809 client login failed127.0.0.0.8, login: "titi@tutu.fr"

I would like to know the percentage of failures per login.

I tried this aggregation:

GET result-2016.10.16/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1473334483178,
              "lte": 1476698400000,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "status_failed": {
      "filter": {
        "term": {
          "status": "failed"
        }
      },
      "aggs": {
        "nb_docs_per_account": {
          "terms": {
            "field": "login",
            "min_doc_count": 20,
            "size": 5,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    },
    "status_ok": {
      "filter": {
        "term": {
          "status": "authentified"
        }
      },
      "aggs": {
        "nb_docs_par_login": {
          "terms": {
            "field": "login",
            "min_doc_count": 700,
            "size": 5,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    }
  }
}

I am therefore able to get the number of failures AND successes per login:

  "took": 91,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 180838,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "status_ok": {
      "doc_count": 174779,
      "nb_docs_par_login": {
        "doc_count_error_upper_bound": 264,
        "sum_other_doc_count": 165737,
        "buckets": [
          {
            "key": "blabbla@blabla",
            "doc_count": 1248
          },
          {
            "key": "bibi@bobo",
            "doc_count": 1002
          }
        ]
      }
    },
    "status_failed": {
      "doc_count": 6059,
      "nb_docs_par_login": {
        "doc_count_error_upper_bound": 27,
        "sum_other_doc_count": 5402,
        "buckets": [
          {
            "key": "coucou@toto",
            "doc_count": 162
          }
        ]
      }
    }
  }
} 

However, is there a way to "join" the two aggregations using the key "login" and compute the percentage of failures (i.e. number of failed / (number of successes + number of failed)) per login, and sort on it?

Thanks

Christophe

It is possible to get the blend of success vs failure using the significant_terms agg to find users skewed towards failure and report on the mix:

DELETE test
// Repeat lots of times with mostly false..
POST test/event
{
	"user": 1,
	"success":false
}
// Repeat lots of times with mostly true..
POST test/event
{
	"user": 2,
	"success":true
}
// Run this to contrast
GET test/event/_search
{
	"query":{
		"match":{
			"success":false
		}
	},
	"aggs":{
		"percent":{
			"significant_terms":{
				"field":"user",
				"shard_min_doc_count":1,
				"min_doc_count":1,
				"percentage":{}
			}
		}
	}
}

However, this form of analysis is hampered if the data is distributed (multiple shards or time-based indices) and can be slow to run. A more effective general strategy for examining user behaviours is to adopt an entity-centric indexing approach - see https://www.youtube.com/watch?v=yBf7oeJKH2Y
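
Not from this thread, but to make the entity-centric idea concrete, here is a minimal sketch of folding each log event into one document per user, assuming Elasticsearch 5.x and a hypothetical user_profiles index (the index, type and field names are made up). Each event becomes an _update with an upsert, and the per-user failure ratio can then be read straight off the document:

// One update per log event; the login itself is used as the document ID.
// If the document does not exist yet, the upsert body is indexed as-is and the script is skipped.
POST user_profiles/profile/toto@tata.fr/_update
{
  "script": {
    "lang": "painless",
    "inline": "if (params.failed) { ctx._source.failed += 1 } else { ctx._source.ok += 1 }",
    "params": { "failed": true }
  },
  "upsert": { "login": "toto@tata.fr", "ok": 0, "failed": 1 }
}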

Thanks for your help. Indeed, significant_terms seems to work.
However, I also need to filter on this percentage. That is why I changed the way I build the aggregation:

GET result-2016.10.16/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1473334483178,
              "lte": 1476698400000,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "login": {
      "terms": {
        "field": "login",
        "min_doc_count": 20,
        "size": 4,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "total_hits": {
          "sum": {
            "field": "count_value"
          }
        },
        "status_failed": {
          "filter": {
            "term": {
              "status": "failed"
            }
          },
          "aggs": {
            "total_hits_failed": {
              "sum": {
                "field": "count_value"
              }
            }
          }
        },
        "ComputePercentage": {
          "bucket_script": {
            "buckets_path": {
              "nb_total": "total_hits",
              "nb_failed": "status_failed>total_hits_failed"
            },
            "script": "100 * nb_failed / nb_total "
          }
        }
      }
    }
  }
}

I created:

1st aggregation:

  • A terms aggregation on the login field.

2nd level of aggregations:

  • total_hits: I added a field count_value, always equal to 1 for each document, since I cannot use doc_count directly, so this agg just sums it.
  • status_failed: the same sum, restricted to documents with status "failed".

The last bucket_script is there to calculate the percentage, but the answer is:

{
      "error": {
        "root_cause": [],
        "type": "reduce_search_phase_exception",
        "reason": "[reduce] ",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
          "type": "script_exception",
          "reason": "compile error",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Variable [nb_failed] is not defined."
          },
          "script_stack": [
            "100 * nb_failed / nb_total ",
            "      ^---- HERE"
          ],
          "script": "100 * nb_failed / nb_total ",
          "lang": "painless"
        }
      },
      "status": 503
    }

What's wrong with this bucket_script? The variable nb_failed is clearly defined. Thanks.
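
One thing that stands out, going by the Painless examples for bucket_script in the Elasticsearch documentation: the buckets_path variables are exposed to the script through params, so the script would most likely need to read them as params.nb_failed and params.nb_total. A sketch of the same aggregation with only the script changed (untested here):

"ComputePercentage": {
  "bucket_script": {
    "buckets_path": {
      "nb_total": "total_hits",
      "nb_failed": "status_failed>total_hits_failed"
    },
    "script": "100 * params.nb_failed / params.nb_total"
  }
}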

Update:

I'm trying this kind of aggregation:

GET proxy_ssl-2016.10.16/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1473334483178,
              "lte": 1476698400000,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "login": {
      "terms": {
        "field": "login",
        "min_doc_count": 20,
        "size": 4,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "failed_hits": {
          "filter": {
            "term": {
              "status": "failed"
            }
          }
        },
        "successful_hits": {
          "filter": {
            "term": {
              "status": "authentified"
            }
          }
        },
        "ComputePercentage": {
          "bucket_script": {
            "buckets_path": {
              "NB_OK": "successful_hits.doc_count",
              "NB_KO": "failed_hits_hits.doc_count"
            },
            "script": "100 * NB_OK / ( NB_OK + NB_KO )"
          }
        }
      }
    }
  }
} 

But the result is always an error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No aggregation found for path [failed_hits_hits.doc_count]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "proxy_ssl-2016.10.16",
        "node": "Fb9p4UCRSI-xhV_u6EcQTQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "No aggregation found for path [failed_hits_hits.doc_count]"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "No aggregation found for path [failed_hits_hits.doc_count]"
    }
  },
  "status": 400
}

Is there a proper way to reference these doc_count values in a bucket_script?
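
A possible fix, sketched from the buckets_path syntax (the special _count key for a bucket's document count) combined with the Painless params prefix mentioned above; the aggregation names are the ones from the request, but this fragment is untested here:

"ComputePercentage": {
  "bucket_script": {
    "buckets_path": {
      "NB_OK": "successful_hits._count",
      "NB_KO": "failed_hits._count"
    },
    "script": "100 * params.NB_OK / ( params.NB_OK + params.NB_KO )"
  }
}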

Thanks

Before we tackle the question of making your script work, can we confirm that you are happy only doing this analysis at a small scale?
If you have many IDs, many events, and time-based indices or multiple shards, you should be concerned. This style of aggregation does not scale, for the reasons I outlined in the video link I sent (joining multiple records on a high-cardinality key in a distributed system).
The entity-centric index I advocated would be a way of maintaining a fast and scalable answer to questions of this sort. If you want a slow but scalable way of answering this question, consider using the scan/scroll API and sorting on user ID. The example entity-centric scripts I provide do exactly this, but they remember the risk ratings in a user-profile document so that the next run only has to query for the new events rather than re-evaluating all of the history again.
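
A rough sketch of that scroll-plus-sort idea, reusing the index and login field from earlier in the thread (the page size and scroll timeout are arbitrary):

GET result-2016.10.16/_search?scroll=1m
{
  "size": 1000,
  "sort": [ { "login": "asc" } ],
  "query": { "match_all": {} }
}

// Keep pulling pages, grouping consecutive hits by login on the client side:
POST _search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}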

Cheers
Mark

Hello,
Indeed, for this POC I only have to deal with a small amount of data. That is why this agg works as expected.

However, after watching your YouTube video, I did some tests with ~1,000,000 users and ~2,500 connections (failed or not) per second, and the aggregation is now very slow.

The entity-centric approach you advised seems to be the solution.

I need to be able to compute these kinds of statistics:

  • Number of unique IPs per user for the last X minutes
  • Number of failed logins (and therefore wrong passwords...) per login for the last X minutes
  • etc.

Of course, all these statistics involve high-cardinality fields...
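
For illustration, the first of those statistics could be written as a plain aggregation along these lines (the client_ip field name is hypothetical, and "the last X minutes" is written here as now-10m):

GET result-2016.10.16/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-10m" } }
  },
  "aggs": {
    "per_login": {
      "terms": { "field": "login", "size": 10 },
      "aggs": {
        "unique_ips": {
          "cardinality": { "field": "client_ip" }
        }
      }
    }
  }
}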

With the entity-centric model, I will have to create an index per login, and each new log line will be a new document in that index. This way, I do not need to use cardinality on the login field.

However, I do not want to create ~1 million indices for all the different customers I'm dealing with. What are the best practices for storing this data?

The final aim of this project is to use ElastAlert to blacklist logins or IPs that look suspicious...