Hello @Aurel_Drejta
First of all, you are going to search on the content
field.
Thanks to your query, you will obtain N (size
) hits
from the index or indices you're querying on.
Now, you are requesting to get the documents having distinct url
values.
First of all, the drawback of your mapping is that url
is analyzed as text
.
If you can reindex the data using the following mapping (or use keyword
instead of text
for the field url
):
PUT discussunique?include_type_name=false
{
"mappings": {
"properties": {
"author": {
"type": "text"
},
"content": {
"type": "text"
},
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
# Demo data
PUT discussunique/_doc/1
{
"author": "author1",
"content": "Some text i want to search",
"url": "https://elastic.co"
}
PUT discussunique/_doc/2
{
"author": "author2",
"content": "Some text i want to search",
"url": "https://elastic.co"
}
PUT discussunique/_doc/3
{
"author": "author3",
"content": "Some text i want to search",
"url": "https://discuss.elastic.co"
}
You will be able to get unique values using collapse
:
GET discussunique/_search
{
"query": {
"match": {
"content": "Some text i want to search"
}
},
"collapse": {
"field": "url.keyword"
}
}
# Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "discussunique",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.80118835,
"_source" : {
"author" : "author1",
"content" : "Some text i want to search",
"url" : "https://elastic.co"
},
"fields" : {
"url.keyword" : [
"https://elastic.co"
]
}
},
{
"_index" : "discussunique",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.80118835,
"_source" : {
"author" : "author3",
"content" : "Some text i want to search",
"url" : "https://discuss.elastic.co"
},
"fields" : {
"url.keyword" : [
"https://discuss.elastic.co"
]
}
}
]
}
}
Or you can use a terms
aggregation. If you expect a lot of results, you should use a composite aggregation
with a terms
sub aggregation.
If you want to get the most occurring unique values, you can use terms
# All unique values for `url`
GET discussunique/_search
{
"query": {
"match": {
"content": "Some text i want to search"
}
},
"size": 0,
"aggs": {
"urls": {
"composite": {
"sources": [
{
"urls": {
"terms": {
"field": "url.keyword"
}
}
}
]
}
}
}
}
# Top 100 unique values for `url`
GET discussunique/_search
{
"query": {
"match": {
"content": "Some text i want to search"
}
},
"size": 0,
"aggs": {
"urls": {
"terms": {
"field": "url.keyword",
"size": 100
}
}
}
}
If you're on a Basic
license, you can try out the SQL API:
POST _sql?format=txt
{
"query": """SELECT url FROM discussunique WHERE QUERY('content:"Some text i want to search"') GROUP by url"""
}
# If you want to see the derived Elasticsearch DSL
POST _sql/translate
{
"query": """SELECT url FROM discussunique WHERE QUERY('content:"Some text i want to search"') GROUP by url"""
}
If you cannot modify the mappings, there are no options except using a scripted field, but it will be REALLY slow and memory consuming because you will be accessing the _source
of the documents each time. The following ones are not recommended.
GET discussunique/_search
{
"query": {
"match": {
"content": "Some text i want to search"
}
},
"size": 0,
"aggs": {
"urls": {
"composite": {
"sources": [
{
"urls": {
"terms": {
"script": {
"source": "params['_source'].url",
"lang": "painless"
}
}
}
}
]
}
}
}
}
# Or...
GET discussunique/_search
{
"query": {
"match": {
"content": "Some text i want to search"
}
},
"size": 0,
"aggs": {
"urls": {
"terms": {
"script": {
"source": "params['_source'].url",
"lang": "painless"
},
"size": 100
}
}
}
}