Determine Top Keywords from the fetched list of documents

Mark_Harwood · October 2, 2019, 8:53am

If you're echoing back the popularity of what users typed, I'd argue that's of limited value.
It's often more useful to give them things other than what they typed, to open new lines of enquiry or refine the query.

Here's a real example of your query on news data using significant_text

Note useful phrases like "border agents" are indexed because I used an Analyzer that produces single-word and two-word "shingles".
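
For readers who want to try this, here is a minimal sketch of the setup described above, written as PHP arrays that encode to Elasticsearch request bodies. It assumes Elasticsearch 7.x (typeless mappings) and a text field called content; the names shingle_2, text_with_shingles, sample, keywords and the $userQuery placeholder are hypothetical and not taken from the original example.

```php
<?php
// Hypothetical placeholder for whatever the user typed.
$userQuery = 'example search terms';

// Index settings: a custom analyzer that emits single words plus
// two-word shingles, so phrases such as "border agents" become terms.
$settings = [
    'settings' => [
        'analysis' => [
            'filter' => [
                'shingle_2' => [
                    'type' => 'shingle',
                    'min_shingle_size' => 2,
                    'max_shingle_size' => 2,
                    'output_unigrams' => true, // keep the single words too
                ],
            ],
            'analyzer' => [
                'text_with_shingles' => [
                    'type' => 'custom',
                    'tokenizer' => 'standard',
                    'filter' => ['lowercase', 'shingle_2'],
                ],
            ],
        ],
    ],
    'mappings' => [
        'properties' => [
            'content' => ['type' => 'text', 'analyzer' => 'text_with_shingles'],
        ],
    ],
];

// Search body: significant_text over a sample of the best-matching documents.
$searchBody = [
    'size' => 0,
    'query' => ['match' => ['content' => $userQuery]],
    'aggs' => [
        'sample' => [
            'sampler' => ['shard_size' => 200],
            'aggs' => [
                'keywords' => [
                    'significant_text' => [
                        'field' => 'content',
                        'filter_duplicate_text' => true,
                    ],
                ],
            ],
        ],
    ],
];

echo json_encode($settings, JSON_PRETTY_PRINT), PHP_EOL;
echo json_encode($searchBody, JSON_PRETTY_PRINT), PHP_EOL;
```

Wrapping significant_text in a sampler aggregation is a common way to keep the analysis focused on the top-matching documents and to limit its cost; the shard_size of 200 is only an illustrative value.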

Another useful technique is to extract names into a structured field using something like rosette or spacy and then use the annotated_text field type to allow drill-downs into the text to show where they were mentioned. Here for example we use significant_terms on the people field (to focus on the significant rather than the popular) and discover the CBP commissioner:
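
A request along these lines might look like the following sketch. It assumes the extracted names are stored in a keyword field called people (the field named above), that the text lives in an annotated_text field called content (this field type requires the mapper-annotated-text plugin), and that the mapping uses the Elasticsearch 7.x typeless form. The aggregation name significant_people and the $userQuery placeholder are hypothetical.

```php
<?php
// Hypothetical placeholder for whatever the user typed.
$userQuery = 'example search terms';

// Mapping fragment: names in a structured keyword field, text as annotated_text.
$mappings = [
    'properties' => [
        'people'  => ['type' => 'keyword'],
        'content' => ['type' => 'annotated_text'],
    ],
];

// Search body: significant_terms surfaces people who are unusually frequent
// in the matching documents, rather than simply the most popular names.
$searchBody = [
    'size' => 0,
    'query' => ['match' => ['content' => $userQuery]],
    'aggs' => [
        'significant_people' => [
            'significant_terms' => ['field' => 'people'],
        ],
    ],
];

echo json_encode(['mappings' => $mappings], JSON_PRETTY_PRINT), PHP_EOL;
echo json_encode($searchBody, JSON_PRETTY_PRINT), PHP_EOL;
```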

These people names, or significant text like "wall" or "barrier", can then typically be added to the query.

Noctis17 · October 3, 2019, 2:01am

Copy! Got it sir.

I'm now leaning more toward doc_count than the keyword frequency per document.

My problem now is with the adjacency_matrix aggs: my keywords are stored as a string data type. Since we're using Laravel (a PHP-based framework) with MySQL and Sphinx as the legacy database and search server, the stored keywords don't follow a consistent pattern, so I can't find a way to separate the main keywords from the negated ones, that is, the ones in the NOT part of the query_string query (we already have roughly 25,000+ records).

Going back to the adjacency_matrix aggs, here's my scenario:

I have this string as the user's keyword:

"(("Mayor Isko Moreno" OR "Mayor Vico Sotto") AND ("Manila" OR "Pasig"))"

Using the significant_text approach from the earlier replies, I found a way to explode the string and convert it into an array

Now I have added it to the adjacency_matrix aggs

And the result:

The problem I'm seeing:
The adjacency_matrix looks like it is joining all 4 significant words from my keywords into one, considering only documents in which all 4 of them appear. Logically speaking, though, the keyword only requires one of "Mayor Isko Moreno" OR "Mayor Vico Sotto" AND one of "Manila" OR "Pasig".
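
For reference, here is a minimal sketch of an adjacency_matrix request for these four keywords. It assumes the text is in a field called content and uses match_phrase for each filter; the filter names isko, vico, manila and pasig are hypothetical. According to the Elasticsearch documentation, the response contains one bucket per filter plus one bucket per pair of intersecting filters (keyed like isko&manila by default), so it does not produce a single bucket combining all four.

```php
<?php
// Hypothetical filter names mapped to the phrases from the keyword string.
$phrases = [
    'isko'   => 'Mayor Isko Moreno',
    'vico'   => 'Mayor Vico Sotto',
    'manila' => 'Manila',
    'pasig'  => 'Pasig',
];

// One named match_phrase filter per phrase.
$filters = [];
foreach ($phrases as $name => $phrase) {
    $filters[$name] = ['match_phrase' => ['content' => $phrase]];
}

$aggs = [
    'KEYWORDS' => [
        'adjacency_matrix' => ['filters' => $filters],
    ],
];

echo json_encode(['aggs' => $aggs], JSON_PRETTY_PRINT), PHP_EOL;
```

With this shape, the doc_count for "Mayor Isko Moreno" together with "Manila" would be read from the isko&manila bucket, and the count for each phrase alone from its own bucket.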

We were able to reproduce this concept with the explain feature of the Elasticsearch search API

a sample screenshot of the search API response

then here's a sample screenshot of the contents of the explain object

And that's how we were able to display the keyword frequency PER HIT / DOCUMENT

The problem with explain is that it appears only at the hit level, meaning once per hit.

Is there a feature similar to explain that appears at the top level of the response body, so I can get a summarized result?

P.S. All of this happens within the search API alone; we didn't need any other calls or endpoints.

Noctis17 · October 3, 2019, 7:09am

Latest update from my R&D: my aggregation query now looks like this:

"aggs" => [
    "KEYWORDS" => [
        "filters" => [
            "filters" => [
                "term1" => [
                    "term" => [
                        'content' => "isko"
                    ]
                ],
                "term2" => [
                    "term" => [
                        'content' => "manila"
                    ]
                ]
            ]
        ]
    ]
]

and here's the response:

I'm close to my expected output. However, when I use the full phrases / keywords, Elasticsearch does not recognize them:

"aggs" => [
    "KEYWORDS" => [
        "filters" => [
            "filters" => [
                "term1" => [
                    "term" => [
                        'content' => "mayor isko moreno"
                    ]
                ],
                "term2" => [
                    "term" => [
                        'content' => "mayor vico sotto"
                    ]
                ]
            ]
        ]
    ]
]

response

Does this mean Elasticsearch can't recognize phrases?
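
A likely explanation, assuming content is a text field using the default analyzer: a term query is not analyzed and looks for one exact term in the index, while the indexed text has been split into separate lowercase words ("mayor", "isko", "moreno"). No single indexed term equals "mayor isko moreno", so the bucket stays empty. A match_phrase query analyzes the input and matches the words in order. The following sketch swaps term for match_phrase in the same filters aggregation; everything else is kept from the snippet above.

```php
<?php
// Same filters aggregation as above, but with match_phrase instead of term,
// so each multi-word keyword is analyzed and matched as a phrase.
$aggs = [
    'KEYWORDS' => [
        'filters' => [
            'filters' => [
                'term1' => [
                    'match_phrase' => [
                        'content' => 'mayor isko moreno',
                    ],
                ],
                'term2' => [
                    'match_phrase' => [
                        'content' => 'mayor vico sotto',
                    ],
                ],
            ],
        ],
    ],
];

echo json_encode(['aggs' => $aggs], JSON_PRETTY_PRINT), PHP_EOL;
```

Each bucket's doc_count then gives the number of matching documents that contain the phrase, summarized at the top level of the response rather than per hit.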

system · October 31, 2019, 7:09am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
