Searching sentence/words with different order/position

Kevin_Juliano · March 12, 2020, 12:25pm

Hi!

Is there a way to search for a sentence or a set of words and get results that contain the same number of words, but in any order? I know my question is confusing, so I will just give an example:

If I search for the phrase flowing water, I need the results to be water flowing or flowing water. If there are three words in the phrase, like a wonder boy, the results I need are boy wonder a, a boy wonder, boy a wonder, and wonder a boy.

How can I do this in Elasticsearch?

Any help will be appreciated!

dadoonet · March 12, 2020, 12:55pm

A match query, a query string query, or a simple query string query will do that, but without taking the number of words into account.
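
As a minimal sketch of the first option, assuming a hypothetical index `test` with a text field `text`: a `match` query analyzes the search string into terms and ignores their order, so both flowing water and water flowing match. Hits come back sorted by `_score`, so documents that contain more of the searched words tend to rank first.

```console
GET test/_search
{
  "query": {
    "match": {
      "text": "flowing water"
    }
  }
}
```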

Kevin_Juliano · March 12, 2020, 1:06pm

Hi @dadoonet!

Thank you for your response.

So there is no way to get results with exactly the same number of jumbled words?

If not, is there a way to sort them by the number of matching words or by relevance? E.g., documents with most of the words come first, and the last one contains only one matching word.

dadoonet · March 12, 2020, 1:20pm

I was trying to think about it.
Let's say you know the word count at index time (and search time); then you can do this:

DELETE test 
PUT test/_doc/1
{
  "text": "one two three",
  "wc": 3
}
PUT test/_doc/2
{
  "text": "one two three four",
  "wc": 4
}
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "wc": {
              "value": 3
            }
          }
        },
        {
          "match": {
            "text": "one three two"
          }
        }
      ]
    }
  }
}
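
One caveat: `match` combines the analyzed terms with OR by default, so the query above can also return a three-word document that shares only one word with the search. A sketch of a stricter variant, assuming the same `test` index: `"operator": "and"` requires every searched word, and the count check moves to `filter` because it does not need to influence scoring.

```console
GET test/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "wc": 3 } }
      ],
      "must": [
        {
          "match": {
            "text": {
              "query": "one three two",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
```

This checks the set of words and the total count, not how often each word occurs, so searches or documents with repeated words can still slip through.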

The question is how to get that count. One way is to parse the text before it gets indexed:

GET _analyze
{
  "text" : "one two three"
}

This gives:

{
  "tokens" : [
    {
      "token" : "one",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "two",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "three",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

So if you take the highest position and add 1, that gives you the number of tokens.
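
Elasticsearch can also store this count itself. As a sketch, not something tested against this data set: the `token_count` field type indexes the number of tokens an analyzer produces for a string, so the client does not have to compute it. The sub-field name `wc` is an assumption chosen to mirror the example above, and the first request assumes the `test` index does not exist yet.

```console
PUT test
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "wc": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

PUT test/_doc/1
{
  "text": "one two three"
}

GET test/_search
{
  "query": {
    "term": {
      "text.wc": 3
    }
  }
}
```

With this mapping, the `wc` clause of the earlier query would target `text.wc` instead of a separately supplied `wc` field.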

Would that work for you?

Maybe at index time you can run an ingest script processor to count this?
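
A rough sketch of that idea, assuming a hypothetical pipeline name `word_count` and the `text` and `wc` fields from the example above. The Painless script splits on whitespace only, so its count can differ from the standard analyzer's when the text contains punctuation or hyphenated words.

```console
PUT _ingest/pipeline/word_count
{
  "description": "Store a whitespace-based word count in wc",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.text != null) { ctx.wc = new StringTokenizer(ctx.text).countTokens(); }"
      }
    }
  ]
}

PUT test/_doc/1?pipeline=word_count
{
  "text": "one two three"
}
```

The document is then stored with `wc` set by the pipeline, without the client sending it.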

Kevin_Juliano · March 12, 2020, 1:32pm

@dadoonet

Thank you for your time. I will try your suggestion and see how far I can get.

By the way, I have millions of records to index, and what I need to achieve is to get the exact words that were searched for, even when they are jumbled, just like the example in my question. My other concern now is that, if I proceed with what you suggest, indexing will get a bit slow, since I am using Logstash to index the records (PostgreSQL via JDBC).

system · April 9, 2020, 1:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
