Percentage after aggregation

Hello,
I'm running an SSH server and I would like to keep an eye on users who are trying to connect to my server using wrong passwords.

The available logs are (OK/KO):

Oct 16 23:34:40 xxxxxxx xxxxxxx[26557]: User toto@tata.fr from 127.0.0.7 authentified
Oct 17 01:53:17 xxxxxxx xxxxxxx[info] 29731#0: *322809 client login failed127.0.0.0.8, login: "titi@tutu.fr"

I would like to know the percentage of failures per login.

I tried this aggregation:

GET result-2016.10.16/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1473334483178,
              "lte": 1476698400000,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "status_failed": {
      "filter": {
        "term": {
          "status": "failed"
        }
      },
      "aggs": {
        "nb_docs_per_account": {
          "terms": {
            "field": "login",
            "min_doc_count": 20,
            "size": 5,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    },
    "status_ok": {
      "filter": {
        "term": {
          "status": "authentified"
        }
      },
      "aggs": {
        "nb_docs_par_login": {
          "terms": {
            "field": "login",
            "min_doc_count": 700,
            "size": 5,
            "order": {
              "_count": "desc"
            }
          }
        }
      }
    }
  }
}

I am therefore able to get the number of failures AND successes per login:

  "took": 91,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 180838,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "status_ok": {
      "doc_count": 174779,
      "nb_docs_par_login": {
        "doc_count_error_upper_bound": 264,
        "sum_other_doc_count": 165737,
        "buckets": [
          {
            "key": "blabbla@blabla",
            "doc_count": 1248
          },
          {
            "key": "bibi@bobo",
            "doc_count": 1002
          }
        ]
      }
    },
    "status_failed": {
      "doc_count": 6059,
      "nb_docs_par_login": {
        "doc_count_error_upper_bound": 27,
        "sum_other_doc_count": 5402,
        "buckets": [
          {
            "key": "coucou@toto",
            "doc_count": 162
          }
        ]
      }
    }
  }
} 

However, is there a way to "join" the two aggregations using the key "login" and compute the percentage of failures (i.e. number of failed / (number of successes + number of failed)) per login, and sort on it?

Thanks

Christophe

It is possible to get the blend of success vs failure using the significant_terms agg to find users skewed towards failure and report on the mix:

DELETE test
// Repeat lots of times with mostly false..
POST test/event
{
	"user": 1,
	"success":false
}
// Repeat lots of times with mostly true..
POST test/event
{
	"user": 2,
	"success":true
}
// Run this to contrast
GET test/event/_search
{
	"query":{
		"match":{
			"success":false
		}
	},
	"aggs":{
		"percent":{
			"significant_terms":{
				"field":"user",
				"shard_min_doc_count":1,
				"min_doc_count":1,
				"percentage":{}
			}
		}
	}
}

However, this form of analysis is hampered if the data is distributed (multiple shards or time-based indices) and can be slow to run. A more effective general strategy for examining user behaviours is to adopt an entity-centric indexing approach - see https://www.youtube.com/watch?v=yBf7oeJKH2Y
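
Not from this thread, but to make the entity-centric idea concrete, here is a minimal sketch of folding each log event into one document per user, assuming Elasticsearch 5.x and a hypothetical user_profiles index (the index, type and field names are made up). Each event becomes an _update with an upsert, and the per-user failure ratio can then be read straight off the document:

// One update per log event; the login itself is used as the document ID.
// If the document does not exist yet, the upsert body is indexed as-is and the script is skipped.
POST user_profiles/profile/toto@tata.fr/_update
{
  "script": {
    "lang": "painless",
    "inline": "if (params.failed) { ctx._source.failed += 1 } else { ctx._source.ok += 1 }",
    "params": { "failed": true }
  },
  "upsert": { "login": "toto@tata.fr", "ok": 0, "failed": 1 }
}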

Thanks for your help. Indeed, significant_terms seems to work.
However, I also need to filter on this percentage. That is why I changed the way I build the aggregation:

GET result-2016.10.16/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1473334483178,
              "lte": 1476698400000,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "login": {
      "terms": {
        "field": "login",
        "min_doc_count": 20,
        "size": 4,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "total_hits": {
          "sum": {
            "field": "count_value"
          }
        },
        "status_failed": {
          "filter": {
            "term": {
              "status": "failed"
            }
          },
          "aggs": {
            "total_hits_failed": {
              "sum": {
                "field": "count_value"
              }
            }
          }
        },
        "ComputePercentage": {
          "bucket_script": {
            "buckets_path": {
              "nb_total": "total_hits",
              "nb_failed": "status_failed>total_hits_failed"
            },
            "script": "100 * nb_failed / nb_total "
          }
        }
      }
    }
  }
}

I created:

1st aggregation:

  • A terms aggregation on the login field.

2nd level of aggregations:

  • total_hits: I added a field count_value, always equal to 1 for each document, since I cannot use doc_count directly, so this agg just sums it.
  • status_failed: the same sum, restricted to documents with status "failed".

The last bucket_script is there to calculate the percentage, but the answer is:

{
      "error": {
        "root_cause": [],
        "type": "reduce_search_phase_exception",
        "reason": "[reduce] ",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
          "type": "script_exception",
          "reason": "compile error",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Variable [nb_failed] is not defined."
          },
          "script_stack": [
            "100 * nb_failed / nb_total ",
            "      ^---- HERE"
          ],
          "script": "100 * nb_failed / nb_total ",
          "lang": "painless"
        }
      },
      "status": 503
    }

What's wrong with this bucket_script? The variable nb_failed is clearly defined. Thanks.
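
One thing that stands out, going by the Painless examples for bucket_script in the Elasticsearch documentation: the buckets_path variables are exposed to the script through params, so the script would most likely need to read them as params.nb_failed and params.nb_total. A sketch of the same aggregation with only the script changed (untested here):

"ComputePercentage": {
  "bucket_script": {
    "buckets_path": {
      "nb_total": "total_hits",
      "nb_failed": "status_failed>total_hits_failed"
    },
    "script": "100 * params.nb_failed / params.nb_total"
  }
}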

Update:

I'm trying this kind of aggregation:

GET proxy_ssl-2016.10.16/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "analyze_wildcard": true,
            "query": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1473334483178,
              "lte": 1476698400000,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "login": {
      "terms": {
        "field": "login",
        "min_doc_count": 20,
        "size": 4,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "failed_hits": {
          "filter": {
            "term": {
              "status": "failed"
            }
          }
        },
        "successful_hits": {
          "filter": {
            "term": {
              "status": "authentified"
            }
          }
        },
        "ComputePercentage": {
          "bucket_script": {
            "buckets_path": {
              "NB_OK": "successful_hits.doc_count",
              "NB_KO": "failed_hits_hits.doc_count"
            },
            "script": "100 * NB_OK / ( NB_OK + NB_KO )"
          }
        }
      }
    }
  }
} 

But the result is always an error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No aggregation found for path [failed_hits_hits.doc_count]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "proxy_ssl-2016.10.16",
        "node": "Fb9p4UCRSI-xhV_u6EcQTQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "No aggregation found for path [failed_hits_hits.doc_count]"
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "No aggregation found for path [failed_hits_hits.doc_count]"
    }
  },
  "status": 400
}

Is there a proper way to reference these doc_count values in a bucket_script?
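
A possible fix, sketched from the buckets_path syntax (the special _count key for a bucket's document count) combined with the Painless params prefix mentioned above; the aggregation names are the ones from the request, but this fragment is untested here:

"ComputePercentage": {
  "bucket_script": {
    "buckets_path": {
      "NB_OK": "successful_hits._count",
      "NB_KO": "failed_hits._count"
    },
    "script": "100 * params.NB_OK / ( params.NB_OK + params.NB_KO )"
  }
}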

Thanks

Before we tackle the question of making your script work, can we confirm that you are happy only doing this analysis at a small scale?
If you have many IDs, many events, and time-based indices or multiple shards, you should be concerned. This style of aggregation does not scale, for the reasons I outlined in the video link I sent (joining multiple records on a high-cardinality key in a distributed system).
The entity-centric index I advocated would be a way of maintaining a fast and scalable answer to questions of this sort. If you want a slow but scalable way of answering this question, consider using the scan/scroll API and sorting on user ID. The example entity-centric scripts I provide do exactly this, but they remember the risk ratings in a user-profile document so that the next run only has to query for the new events rather than re-evaluating all of the history again.
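
A rough sketch of that scroll-plus-sort idea, reusing the index and login field from earlier in the thread (the page size and scroll timeout are arbitrary):

GET result-2016.10.16/_search?scroll=1m
{
  "size": 1000,
  "sort": [ { "login": "asc" } ],
  "query": { "match_all": {} }
}

// Keep pulling pages, grouping consecutive hits by login on the client side:
POST _search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}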

Cheers
Mark

Hello,
Indeed, for this POC I only have to deal with a small amount of data. That is why this agg works as expected.

However, after watching your YouTube video, I did some tests with ~1,000,000 users and ~2,500 connections (failed or not) per second, and the aggregation is now very slow.

The entity-centric approach you advised seems to be the solution.

I need to be able to compute these kinds of statistics:

  • Number of unique IPs per user for the last X minutes
  • Number of failed logins (and therefore wrong passwords...) per login for the last X minutes
  • etc.

Of course, all these statistics involve high-cardinality fields...
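
For illustration, the first of those statistics could be written as a plain aggregation along these lines (the client_ip field name is hypothetical, and "the last X minutes" is written here as now-10m):

GET result-2016.10.16/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-10m" } }
  },
  "aggs": {
    "per_login": {
      "terms": { "field": "login", "size": 10 },
      "aggs": {
        "unique_ips": {
          "cardinality": { "field": "client_ip" }
        }
      }
    }
  }
}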

With the entity-centric model, I will have to create an index per login, and each new log line will be a new document in that index. This way, I do not need to use cardinality on the login field.

However, I do not want to create ~1 million indices for all the different customers I'm dealing with. What are the best practices for storing this data?

The final aim of this project is to use ElastAlert to blacklist logins or IPs that look suspicious...