Searching sentence/words with different order/position

Hi!

Is there a way to search for a sentence or a set of words so that the results contain the same number of words, but in any order? I know my question is confusing, so I will just give an example:

If I search for the phrase "flowing water", I need the results to be "water flowing" or "flowing water". If the phrase has three words, like "a wonder boy", the results I need are "boy wonder a", "a boy wonder", "boy a wonder", and "wonder a boy".

How can I do this in Elasticsearch?

Any help will be appreciated!

A match query, a query string query, or a simple query string query will do that, but without taking the number of words into account.

Hi @dadoonet!

Thank you for your response :slight_smile:

So there is no way to get results that have exactly the same number of words, in any order?

If not, is there a way to sort the results by the number of matching words or by relevance? For example, documents containing most of the words would come first, and the last ones would contain only one matching word.

I was trying to think about it.
Let's say you know the word count at index time (and at search time). Then you can do this:

DELETE test 
PUT test/_doc/1
{
  "text": "one two three",
  "wc": 3
}
PUT test/_doc/2
{
  "text": "one two three four",
  "wc": 4
}
GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "wc": {
              "value": 3
            }
          }
        },
        {
          "match": {
            "text": "one three two"
          }
        }
      ]
    }
  }
}
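
One caveat about the query above: `match` combines its terms with OR by default, so a three-word document that shares only one word with the search (for example "one five six") would still be returned. A possible variant that requires every searched word is sketched below, against the same `test` index. Moving the `wc` check into `filter` is optional; it only keeps that clause out of the score.

```
# Sketch only: same index and fields as above, with "operator": "and" added
# so that all searched words must be present in the document.
GET test/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "wc": 3 } }
      ],
      "must": [
        {
          "match": {
            "text": {
              "query": "one three two",
              "operator": "and"
            }
          }
        }
      ]
    }
  }
}
```

This still does not handle repeated words: a search for "one one two" has three tokens and would match "one two three", so it may be worth deduplicating or validating the search terms on the client side.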

The question is how to get that count. One way is to parse the text before it gets indexed:

GET _analyze
{
  "text" : "one two three"
}

This gives:

{
  "tokens" : [
    {
      "token" : "one",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "two",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "three",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

So if you take the highest position and add 1, that gives you the number of tokens.
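
If computing the count yourself is inconvenient, Elasticsearch also has a `token_count` field type that stores the number of tokens an analyzer produces for a string. A minimal sketch follows, assuming the `standard` analyzer and a sub-field named `text.wc` (both are assumptions for illustration, not something from the posts above):

```
# Assumption: "wc" becomes a sub-field of "text" (text.wc) instead of a
# separate field sent with each document.
DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "wc": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}
PUT test/_doc/1
{
  "text": "one two three"
}
PUT test/_doc/2
{
  "text": "one two three four"
}
GET test/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "text.wc": 3 } }
      ],
      "must": [
        { "match": { "text": "one three two" } }
      ]
    }
  }
}
```

With this mapping the count is derived by Elasticsearch at index time, so the indexing client does not need to send it. The search side still has to know how many tokens the query contains, for example by running the query text through `_analyze` first.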

Would that work for you?

Maybe at index time you could run an ingest script processor to count this?
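
A rough sketch of what such a pipeline could look like is below. The pipeline name `word_count` is made up, and splitting on single spaces with Painless's `splitOnToken` (available in recent 7.x versions) only approximates what the analyzer does: punctuation or repeated spaces can give a different count than `_analyze` would.

```
# Hypothetical pipeline name; the count is a plain split on single spaces.
PUT _ingest/pipeline/word_count
{
  "description": "Store an approximate word count of the text field in wc",
  "processors": [
    {
      "script": {
        "if": "ctx.text != null",
        "lang": "painless",
        "source": "ctx.wc = ctx.text.splitOnToken(' ').length"
      }
    }
  ]
}
# Index a document through the pipeline explicitly...
PUT test/_doc/1?pipeline=word_count
{
  "text": "one two three"
}
# ...or make it the default for the index, so clients need no change.
PUT test/_settings
{
  "index.default_pipeline": "word_count"
}
```

Setting `index.default_pipeline` means a client such as Logstash can keep sending documents unchanged, while the count is added on the Elasticsearch side.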


@dadoonet

Thank you for your time. I will try your suggestion and see how far I can get.

By the way, I have millions of records to index, and what I need to achieve is to match exactly the words that were searched, even when they are jumbled, just like the example in my question. My other concern now is that if I proceed with what you suggest, indexing will get a bit slower, since I am using Logstash to index the records (from PostgreSQL via JDBC).
