Hi all. I'm working on a university project in which I use Elasticsearch to find similar texts with a "match" query.
Here is an example of one document from my dataset:
{
  "_ident": "5/425/2020",
  "versions": [
    {
      "version": "20250506",
      "headlines": [
        {},
        {
          "paragraf_id": "predpis.nadpis",
          "title": "o štátnom rozpočte na rok 2021"
        }
      ],
      "text": " Podnikateľ Horskej záchrannej služby, policajtov, príslušníkov finančnej správy a profesionálnych vojakov.14)"
    }
  ]
}
This is the mapping I use:
PUT /test_index
{
  "mappings": {
    "properties": {
      "_ident": {
        "type": "keyword"
      },
      "versions": {
        "type": "nested",
        "properties": {
          "version": {
            "type": "integer"
          },
          "headlines": {
            "type": "nested",
            "properties": {
              "paragraf_id": {
                "type": "keyword"
              },
              "title": {
                "type": "text"
              }
            }
          },
          "text": {
            "type": "text"
          }
        }
      }
    }
  }
}
I need a query that finds all documents whose "text" field is similar to my input and returns only those whose "version" field is less than a number I specify. In addition, if two or more of the matching documents have the same value in the "_ident" field, the response should contain only one of them: the one with the largest "version" value that is still less than the specified number. (Note: the number in the "version" field is the date the law was adopted, stored as an integer, for example 20210506.)
For greater clarity, I will give a practical example.
Here I have four documents:
{
  "_ident": "5/425/2020",
  "versions": [
    {
      "version": "20210506",
      "headlines": [
        {},
        {
          "paragraf_id": "predpis.nadpis",
          "title": "o štátnom rozpočte na rok 2021"
        }
      ],
      "text": "Sixty-Four comes asking for bread."
    }
  ]
}
{
  "_ident": "5/425/2020",
  "versions": [
    {
      "version": "20190506",
      "headlines": [
        {},
        {
          "paragraf_id": "predpis.nadpis",
          "title": "o štátnom rozpočte na rok 2021"
        }
      ],
      "text": "Sixty-Four comes asking for bread."
    }
  ]
}
{
  "_ident": "5/425/2020",
  "versions": [
    {
      "version": "20170406",
      "headlines": [
        {},
        {
          "paragraf_id": "predpis.nadpis",
          "title": "o štátnom rozpočte na rok 2021"
        }
      ],
      "text": "Sixty-Four comes asking for bread."
    }
  ]
}
{
  "_ident": "5/7/2023",
  "versions": [
    {
      "version": "20170406",
      "headlines": [
        {},
        {
          "paragraf_id": "predpis.nadpis",
          "title": "o štátnom rozpočte na rok 2021"
        }
      ],
      "text": "Sixty-Four comes asking for bread."
    }
  ]
}
I want to get the documents that match the word "comes" and whose version is less than 20210506, so the response should contain the document with "_ident": "5/425/2020", "version": "20190506" and the document with "_ident": "5/7/2023", "version": "20170406".
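To make the expected behaviour unambiguous, the selection rule can be sketched in plain Python, outside Elasticsearch. This is only a reference model of the desired result: the helper name select_latest and the tuple layout are made up for illustration, and the word lookup is a crude stand-in for a real "match" query.

```python
# Reference logic only: plain Python, no Elasticsearch involved.
# Each tuple is (_ident, version, text), taken from the four example documents.
docs = [
    ("5/425/2020", 20210506, "Sixty-Four comes asking for bread."),
    ("5/425/2020", 20190506, "Sixty-Four comes asking for bread."),
    ("5/425/2020", 20170406, "Sixty-Four comes asking for bread."),
    ("5/7/2023", 20170406, "Sixty-Four comes asking for bread."),
]


def select_latest(docs, word, threshold):
    """Keep, per _ident, the matching document with the largest version < threshold."""
    best = {}
    for ident, version, text in docs:
        # Crude stand-in for the "match" query: case-insensitive word lookup.
        if word.lower() not in text.lower().split():
            continue
        # Strictly "less than": a version equal to the threshold is excluded.
        if version >= threshold:
            continue
        if ident not in best or version > best[ident]:
            best[ident] = version
    return sorted(best.items())


result = select_latest(docs, "comes", 20210506)
print(result)
# [('5/425/2020', 20190506), ('5/7/2023', 20170406)]
```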
I made this query, but it doesn't work:
GET /test_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "versions",
            "query": {
              "match": {
                "versions.text": "comes"
              }
            }
          }
        },
        {
          "script": {
            "script": {
              "source": """
                def threshold = params.threshold;
                def maxVersion = doc['versions.version'].size() > 0 ? doc['versions.version'].value : 0;
                return maxVersion < threshold;
              """,
              "params": {
                "threshold": 20210101
              }
            }
          }
        }
      ]
    }
  }
}
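A probable reason this fails: the script runs against the top-level document, where nested fields such as versions.version are not visible as doc values, so the size() check falls back to 0 and the condition is always true. A possible direction instead, as an unverified sketch that has not been run against a cluster: move the version limit into the nested query as a range filter, collapse on "_ident", and sort by the largest qualifying version. The request body is built in Python only so that it can be printed as JSON; build_search_body is a hypothetical helper name, not part of any client library.

```python
import json


def build_search_body(word, threshold):
    """Build a request body for GET /test_index/_search (untested sketch)."""
    # Range filter reused by the query and by the nested sort.
    version_filter = {"range": {"versions.version": {"lt": threshold}}}
    return {
        "query": {
            "nested": {
                "path": "versions",
                "query": {
                    "bool": {
                        # Text match and version limit must hold for the same nested object.
                        "must": [{"match": {"versions.text": word}}],
                        "filter": [version_filter],
                    }
                },
            }
        },
        # Assumption: collapsing on the keyword field keeps one hit per _ident,
        # chosen by the sort order below (largest qualifying version first).
        "collapse": {"field": "_ident"},
        "sort": [
            {
                "versions.version": {
                    "order": "desc",
                    "mode": "max",
                    "nested": {"path": "versions", "filter": version_filter},
                }
            }
        ],
    }


body = build_search_body("comes", 20210506)
print(json.dumps(body, indent=2))
```

Sorting by version replaces the default relevance ordering, and the sort filter repeats only the version limit, so documents holding several versions may also need the text match inside that filter.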
Could you please help me write a query that works?