Aggregation of nearby dense vectors

JulGor · July 22, 2020, 9:37am

Hello,

Is there any aggregation that groups documents by the closeness of their dense vectors?

My documents have the following structure:

{
  "label": "value",
  "vector": [1, 0, 3, 0, 0, 0, 18, 0, 0, 0, ...]  # N dimensions
}

What I want is to find clusters of documents whose vectors are "close" in an N-dimensional space.

I've searched the forum and the documentation and found the "Variable Width Histogram Aggregation" (https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-aggregations-bucket-variablewidthhistogram-aggregation.html), which looks like it does something similar to what I want, but it is not available in versions 7.7-7.8.

I should also point out that I structured the information this way because I think it makes clustering easier from an ML perspective, but I could use any other data schema. My final goal is to group IP addresses into clusters according to the attack they are performing.

What I actually have is a bunch of events from my firewalls, each with the origin IP address that is performing an attack and the kind of alert it triggered. I grouped all the events triggered by each IP address into a single document (using a transform) and counted the number of events that IP address generated per type of alert, summarizing them into a dense vector.

So, each dimension in the final vector is the number of events of one particular alert type that a single IP address generated. The same position in different vectors always represents the same type of alert.
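To make the vector layout concrete, here is a minimal Python sketch of that summarization, done outside Elasticsearch rather than with a transform. The alert type names, the IP addresses, and the `build_vectors` helper are hypothetical and only serve as an illustration.

```python
from collections import Counter, defaultdict

# Hypothetical alert types; the position in this list fixes the vector dimension.
ALERT_TYPES = ["port_scan", "sql_injection", "brute_force"]

def build_vectors(events):
    """Group (source_ip, alert_type) events by IP and count them per alert type."""
    counts = defaultdict(Counter)
    for ip, alert in events:
        counts[ip][alert] += 1
    # A Counter returns 0 for alert types an IP never triggered.
    return {ip: [c[a] for a in ALERT_TYPES] for ip, c in counts.items()}

# Made-up sample events, one tuple per firewall alert.
events = [
    ("10.0.0.1", "port_scan"),
    ("10.0.0.1", "port_scan"),
    ("10.0.0.1", "brute_force"),
    ("10.0.0.2", "sql_injection"),
]
vectors = build_vectors(events)
```

Because every vector is built from the same `ALERT_TYPES` order, position k means the same alert type for every IP address.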

Then I applied clustering algorithms, such as DBSCAN, to these vectors outside Elasticsearch, and I got groups of IP addresses that were performing the same kinds of attacks. My next step would be to do exactly the same, but inside Elasticsearch.
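For reference, the clustering step outside Elasticsearch could look roughly like the sketch below. It is a plain-Python DBSCAN with Euclidean distance, not necessarily the implementation or the settings I used; the sample vectors and the `eps` and `min_pts` values are made up for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Made-up per-IP count vectors (one row per IP address, one column per alert type).
sample = [[10, 0, 0], [11, 0, 0], [9, 1, 0], [0, 0, 20], [0, 1, 21], [5, 5, 5]]
labels = dbscan(sample, eps=2.0, min_pts=2)
```

With these values the first three vectors end up in one cluster, the next two in another, and the last one is labeled as noise.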

Is that possible? Is there another (and better) way to do it?

Thanks in advance!!

system · August 19, 2020, 9:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
