Hi!
My proxy logs provide "URL" and "user" fields.
I would like to find all the URLs that two users have in common.
Is that possible?
Thank you,
Florent
If you mean any pair of users, this could be done with a combination of the terms and cardinality aggregations, but it would need some tricks to scale if you have millions of unique URLs and distributed indices/shards. Is that the case?
If you mean a specific pair of users, then a query for those two users, with a terms aggregation on the url field and a cardinality aggregation on the user field beneath it, should suffice.
Yes, I am talking about a specific pair of users.
For instance, which URLs are common to "source_login":user1 and "source_login":user2?
The problem is that the OR query ("source_login":user1 OR "source_login":user2) returns the UNION of the accessed URLs.
What I don't know is how to get the INTERSECTION of the URLs accessed by these two users.
Thank you for your help,
Regards,
Florent
Try this:
DELETE test

PUT test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "url": {
          "type": "keyword"
        },
        "user": {
          "type": "keyword"
        }
      }
    }
  }
}

POST test/_doc/_bulk
{"index":{}}
{"user":"user1", "url":"url1"}
{"index":{}}
{"user":"user1", "url":"url2"}
{"index":{}}
{"user":"user2", "url":"url2"}
{"index":{}}
{"user":"user2", "url":"url2"}
{"index":{}}
{"user":"user2", "url":"url3"}
{"index":{}}
{"user":"user3", "url":"url3"}
{"index":{}}
{"user":"user3", "url":"url4"}

GET test/_search
{
  "query": {
    "terms": {
      "user": ["user1", "user2"]
    }
  },
  "size": 0,
  "aggs": {
    "urls": {
      "terms": {
        "field": "url",
        "min_doc_count": 2,
        "order": {
          "numUsers": "desc"
        }
      },
      "aggs": {
        "numUsers": {
          "cardinality": {
            "field": "user"
          }
        }
      }
    }
  }
}
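One thing to watch when reading the result: min_doc_count counts documents, not users, so a URL that a single user requested twice can still come back as a bucket. The numUsers value is what tells you both users hit the URL. A minimal client-side sketch of that final check, in Python, might look like the following. The common_urls helper and the sample_response dict are hypothetical (the url3 bucket is made up to show the false-positive case, it is not what the test data above returns), and the response shape is assumed to be the usual aggregations/urls/buckets layout. The same filtering could probably also be done server-side with a bucket_selector pipeline aggregation on numUsers.

```python
def common_urls(response, num_users=2):
    """Return the URLs whose bucket saw at least `num_users` distinct users.

    `response` is assumed to be the parsed JSON body of the search above.
    """
    buckets = response["aggregations"]["urls"]["buckets"]
    return [b["key"] for b in buckets if b["numUsers"]["value"] >= num_users]


# Hypothetical response, trimmed to the fields the helper reads.
# url2 was seen by both users; url3 has two documents but only one user,
# which is the case min_doc_count alone cannot rule out.
sample_response = {
    "aggregations": {
        "urls": {
            "buckets": [
                {"key": "url2", "doc_count": 3, "numUsers": {"value": 2}},
                {"key": "url3", "doc_count": 2, "numUsers": {"value": 1}},
            ]
        }
    }
}

print(common_urls(sample_response))
```

Because the buckets are ordered by numUsers descending, the common URLs always appear first, so on small result sets you can also just read them off the top of the list.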