Match query with synonym token filter: match only when the synonym appears in the first or second paragraph

Hi everyone! I'm new to Elasticsearch.

I have an index with settings like this:

{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "article_analyzer": {
                        "filter": [
                            "synonym",
                            "stop"
                        ],
                        "tokenizer": "whitespace"
                    }
                },
                "filter": {
                    "synonym": {
                        "type": "synonym",
                        "synonyms_path": "/etc/elasticsearch/synonyms.txt"
                    },
                    "stop": {
                        "ignore_case": "true",
                        "type": "stop",
                        "stopwords_path": "/etc/elasticsearch/stopwords.txt"
                    }
                }
            }
        }
    }
}

The synonym mapping in /etc/elasticsearch/synonyms.txt looks something like this:

hiv => Human immunodeficiency virus

My current query looks like this:

{
    "query": {
        "multi_match": {
            "query": "hiv",
            "analyzer": "article_analyzer",
            "fields": [
                "content",
                "title",
                "slug"
            ]
        }
    }
}

The result looks something like this:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 14.4982815,
        "hits": [
            {
                "_index": "article",
                "_type": "article",
                "_id": "732",
                "_score": 14.4982815,
                "_source": {
                    "slug": "some-slug",
                    "title": "Some title",
                    "content": "<p>Human immunodeficiency virus is lorem ipsum dolor sit amet</p>\n\n<p>some lorem ipsum dolor sit amet Human immunodeficiency virus</p>\n",
                    "url": "https://example.tld/some-slug"
                }
            },
            {
                "_index": "article",
                "_type": "article",
                "_id": "704",
                "_score": 13.077797,
                "_source": {
                    "slug": "some-some-slug",
                    "title": "Some some slug",
                    "content": "<p>Lorem ipsum dolor sit amet.</p>\n\n<p>Aliquam id purus mi. Suspendisse vitae aliquet velit.</p>\n\n<p>Human immunodeficiency virus is something.</p>",
                    "url": "https://example.tld/some-some-slug"
                }
            }
        ]
    }
}

The result I want is only the document with _id 732, because the synonym matches in its first and second <p> tags. The document with _id 704 also matches the synonym, but only from the third <p> tag onward, so it shouldn't appear in the results.

I know there are many ways to solve this problem, but I'd like to know whether it can be solved within Elasticsearch itself, without changing the document structure.

If the paragraphs in your input document have some specific meaning (e.g. you want some to match a search but not others), you need to split the document and index the parts into different fields. That way you can query only the fields that you want to match.
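
As a rough illustration of the split described above, here is a minimal Python sketch that could run at indexing time. The field names content_lead and content_rest, the helper names, and the regex-based paragraph extraction are all assumptions made for this example; none of them come from the original index, and real HTML may need a proper parser. Existing documents would have to be reindexed for the new fields to be populated.

```python
import re

# Hypothetical field names, chosen for this sketch only.
LEAD_FIELD = "content_lead"   # first N paragraphs, the only part we search
REST_FIELD = "content_rest"   # everything after that, left out of the query

# Matches whole <p>...</p> blocks; assumes simple, non-nested markup.
PARAGRAPH_RE = re.compile(r"<p\b[^>]*>.*?</p>", re.IGNORECASE | re.DOTALL)


def split_content(html, lead_count=2):
    """Split HTML content into a 'lead' part and a 'rest' part by <p> blocks."""
    paragraphs = PARAGRAPH_RE.findall(html)
    return {
        LEAD_FIELD: "\n\n".join(paragraphs[:lead_count]),
        REST_FIELD: "\n\n".join(paragraphs[lead_count:]),
    }


def build_query(text):
    """Same multi_match as in the question, but against the lead field only."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "analyzer": "article_analyzer",
                "fields": [LEAD_FIELD, "title", "slug"],
            }
        }
    }
```

With this split, the content of document 704 would carry "Human immunodeficiency virus" only in content_rest, so a query built by build_query("hiv") would no longer match it through the content, while document 732 would still match through content_lead.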

Exactly, I had already thought about that.

So the only way to solve this problem is to split the content into different fields.

OK, thanks for your reply.