Elasticsearch indexed content returns nothing with a should query on title


(SatyaRaj) #1

Hi Team,

I've indexed some content into my local Elasticsearch v5.6.0 instance from JSON.
I can query it using POST http://localhost:9200/content/_search with an empty body, and the search returns all documents indexed locally.

{
  "took": 87,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "content",
        "_type": "content",
        "_id": "5a9d1992b2067a8a52da23d2",
        "_score": 1,
        "_source": {
          "IndexableContent": {
            "id": "5a9d1992b2067a8a52da23d2",
            "globalServiceId": "null",
            "contentType": "movie",
            "title": "Kathikeya",
            "description": "The story forms the rest of the story.",
            "language": "[tamil]",
            "timeAdded": "null",
            "views": "0",
            "releaseDate": "2014-09-15T00:00:00.000Z",
            "persons": "{Producer=[5a9d197bb2067a8a52da23d1], Actor=[5a9d197bb2067a8a52da23ce, 5a9d197bb2067a8a52da23cf], Director=[5a9d197bb2067a8a52da23d0]}",
            "personsFullNames": "[Swathi, Nikhil, Chandu M, VenkatSrinivas]",
            "genres": "[5a9d18b5b2067a8a52da23cd]",
            "genresFullNames": "[thriller]",
            "contentTags": "[]",
            "siblingOrder": "0.0",
            "summary": "A medical student, Karthikeya, visits the temple of Kumara Swami in Subramanyapuram to unveil the mystery behind its closure.",
            "searchcontent": "[]",
            "keyword": "null",
            "weight": "null"
          }
        }
      }
    ]
  }
}

Now I use the following bool query:
POST http://localhost:9200/content/_search

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "title": "ka"
          }
        }
      ]
    }
  }
}

It returns no results. I'd like to know what the query should be.

{"took":4,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

Thank you,
Regards,
SatyaRaj


(David Pilato) #2

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.


(SatyaRaj) #3

As instructed, I did explain in detail. Please help.


(David Pilato) #4

Your title field never indexed a term named ka. That's why it does not match.

Maybe use an edge n-gram based analyzer if your goal is to do partial matching.
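
To illustrate why `ka` finds nothing, here is a minimal Python sketch that approximates which terms end up in the index for this title. The helpers `standard_tokens` and `edge_ngrams` are hypothetical, written only for this illustration; they are not Elasticsearch APIs, and the real standard analyzer follows Unicode segmentation rules that the regex below only approximates. The gram sizes are assumptions as well.

```python
import re


def standard_tokens(text):
    """Rough approximation of the standard analyzer: split on
    non-alphanumeric characters and lowercase each token."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Prefixes of a token, as an edge n-gram tokenizer would emit them."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


title = "Kathikeya"

# With the standard analyzer the whole word is the only indexed term.
indexed_standard = set(standard_tokens(title))

# With edge n-grams every prefix of the word is indexed as its own term.
indexed_edge = {g for t in standard_tokens(title) for g in edge_ngrams(t)}

# A term query looks the value up exactly as given, with no analysis.
print("ka" in indexed_standard)  # False
print("ka" in indexed_edge)      # True
```

With the standard analyzer the only term available is `kathikeya`, so a term query for `ka` has nothing to hit; with edge n-grams `ka` is stored as a term of its own.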


(SatyaRaj) #5

I indexed another document with the following details and tried the same should query with 'ka'.
Still no luck: the should query still returns an empty result.

{
  "took": 30,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "content",
        "_type": "content",
        "_id": "5a9d1992b2067a8a52da23d2",
        "_score": 1,
        "_source": {
          "IndexableContent": {
            "id": "5a9d1992b2067a8a52da23d2",
            "globalServiceId": "null",
            "contentType": "movie",
            "title": "Kathikeya",
            "description": "The story revolves around the rest of the story.",
            "language": "[telugu]",
            "timeAdded": "null",
            "views": "0",
            "releaseDate": "2014-09-15T00:00:00.000Z",
            "persons": "{Producer=[5a9d197bb2067a8a52da23d1], Actor=[5a9d197bb2067a8a52da23ce, 5a9d197bb2067a8a52da23cf], Director=[5a9d197bb2067a8a52da23d0]}",
            "personsFullNames": "[Swathi, Nikhil, Chandu M, VenkatSrinivas]",
            "genres": "[5a9d18b5b2067a8a52da23cd]",
            "genresFullNames": "[thriller]",
            "contentTags": "[]",
            "siblingOrder": "0.0",
            "summary": "A medical student, Karthikeya, visits the temple of Kumara Swami in Subramanyapuram to unveil the mystery behind its closure.",
            "searchcontent": "[]",
            "keyword": "null",
            "weight": "null"
          }
        }
      },
      {
        "_index": "content",
        "_type": "content",
        "_id": "5aa3b771ae68fe321e783a35",
        "_score": 1,
        "_source": {
          "IndexableContent": {
            "id": "5aa3b771ae68fe321e783a35",
            "globalServiceId": "null",
            "contentType": "movie",
            "title": "Kathikeya2",
            "description": "The story revolves around  the rest of the story.",
            "language": "[telugu]",
            "timeAdded": "null",
            "views": "0",
            "releaseDate": "2014-09-15T00:00:00.000Z",
            "persons": "{Producer=[5a9d197bb2067a8a52da23d1], Actor=[5a9d197bb2067a8a52da23ce, 5a9d197bb2067a8a52da23cf], Director=[5a9d197bb2067a8a52da23d0]}",
            "personsFullNames": "[Swathi, Nikhil, Chandu M, VenkatSrinivas]",
            "genres": "[5a9d18b5b2067a8a52da23cd]",
            "genresFullNames": "[thriller]",
            "contentTags": "[]",
            "siblingOrder": "0.0",
            "summary": "A medical student, Karthikeya, visits the temple of Kumara Swami in Subramanyapuram to unveil the mystery behind its closure.",
            "searchcontent": "[]",
            "keyword": "null",
            "weight": "null"
          }
        }
      }
    ]
  }
}

(David Pilato) #6

Your title field never indexed a term named ka. That's why it does not match.
Maybe use an edge n-gram based analyzer if your goal is to do partial matching.


(SatyaRaj) #7

My first question while going through the docs: is an analyzer the same as a tokenizer?

I understand there is some issue with indexing.

I'm indexing the 'title' field in my JSON like this. You want me to try ngram instead; I will check on that and post an update.

  "title": {
    "type": "text",
    "analyzer": "standard"
  }

I used the URL below to confirm that indexing was successful.
Now I need to know how to check whether title is indexed. Some help here, please.

http://localhost:9200/_cat/indices?v

> health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
> yellow open   genre   JAayUs8oSbuxG5-cjdtirQ   5   1          4            0     17.3kb         17.3kb
> yellow open   content 6V-X6FiZSyOCeU24v6IIcQ   5   1          2            0     36.2kb         36.2kb

Thank you,
SR


(David Pilato) #8

Did you read the documentation about NGrams?

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.
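
On the earlier question of how to check whether title is indexed: two endpoints that exist in 5.x can help. `GET /content/_mapping` shows the type and analyzer of every field, and `POST /content/_analyze` returns the tokens a field's analyzer produces for a given text. The sketch below only builds the requests with Python's standard library; the helper names are hypothetical, and the field path `IndexableContent.title` is an assumption based on the nesting visible in the `_source` posted above.

```python
import json
import urllib.request

BASE = "http://localhost:9200"  # local node, as used in this thread


def mapping_request(index):
    """GET /<index>/_mapping shows each field's type and analyzer."""
    return urllib.request.Request(f"{BASE}/{index}/_mapping", method="GET")


def analyze_request(index, field, text):
    """POST /<index>/_analyze returns the tokens the field's analyzer
    would produce for the given text."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Building the requests needs no running node; call send(...) on them
    # once Elasticsearch is reachable.
    for req in (
        mapping_request("content"),
        analyze_request("content", "IndexableContent.title", "Kathikeya"),
    ):
        print(req.get_method(), req.full_url)
```

If the tokens returned by `_analyze` do not include the exact value used in a term query, that query cannot match.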


(SatyaRaj) #9

Is an analyzer different from a tokenizer?

 "title": {
    "type": "text",
    "analyzer": "standard"
  }

Is the below snippet correct?

 "title": {
    "type": "text",
    "analyzer": "ngram"
  }

(David Pilato) #10

Yes, if you defined an analyzer named ngram when you created the index.

Read the doc for more information: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
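
As a rough sketch of what "defining an analyzer when you create the index" can look like on 5.x, the Python snippet below builds an index-creation body and prints it as JSON. An analyzer is a pipeline of optional character filters, exactly one tokenizer, and optional token filters, so the tokenizer is one part of it. The names `autocomplete` and `autocomplete_tokenizer` and the gram sizes are assumptions chosen for illustration; the mapping type `content` and the `IndexableContent` nesting are taken from the documents posted in this thread.

```python
import json

# Index-creation body for Elasticsearch 5.x (mapping types still exist there).
# "autocomplete" and "autocomplete_tokenizer" are illustrative names, and the
# gram sizes are assumptions to tune for your data.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "autocomplete_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "autocomplete_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "content": {
            "properties": {
                "IndexableContent": {
                    "properties": {
                        "title": {
                            "type": "text",
                            # Prefixes are produced at index time only.
                            "analyzer": "autocomplete",
                            # The query text is kept whole and lowercased.
                            "search_analyzer": "standard",
                        }
                    }
                }
            }
        }
    },
}

# Send this as the body of: PUT http://localhost:9200/content
print(json.dumps(index_body, indent=2))
```

The analyzer of an existing field cannot be changed in place, so an index that already exists would have to be deleted (or a new one created) and the documents indexed again.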


(SatyaRaj) #11

I have now created the index with ngram; for my search, 'standard' should suffice,
considering that I need to search numbers and that lowercase word search gives better matches.
I understand that analyzers break my text into tokens, which are then matched.

Now, coming back to where I started:
I indexed title with the standard analyzer in my JSON mapping.
How do I find out whether title is indexed, and how do I run a should query on it?

Since I used the standard analyzer, a query for the complete title should at least match. But this also returns an empty result for the content I posted in my first question.

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "title": "Kathikeya"
          }
        }
      ]
    }
  }
}

Please help

Thank you,
Regards,
SR


(David Pilato) #12

As I already said:

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

But anyway, try with kathikeya instead of Kathikeya.
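
The reason the lowercase form matters can be sketched in a few lines of Python. This is an emulation under simplifying assumptions, not Elasticsearch code: `analyze` is a hypothetical stand-in for the standard analyzer, and the set `index_terms` plays the role of the inverted index for one title.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: split and lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


# Terms stored in the inverted index for the title "Kathikeya".
index_terms = set(analyze("Kathikeya"))


def term_matches(value):
    """term query: the value is compared to indexed terms as-is."""
    return value in index_terms


def match_matches(text):
    """match query: the text is analyzed first, then terms are looked up."""
    return any(token in index_terms for token in analyze(text))


print(term_matches("Kathikeya"))   # False: the capital K was never indexed
print(term_matches("kathikeya"))   # True
print(match_matches("Kathikeya"))  # True: analysis lowercases the input
```

In query terms, a `match` query on the field analyzes the input the same way the field was analyzed, so it is usually the safer choice for text typed by a user, while `term` suits exact values such as IDs or keywords.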


(SatyaRaj) #13
  1. Found that my content index JSON had an issue. I tried IndexableContent.title instead of title, and the query returned results.

  2. Changed the simple analyzer used in my code to standard.
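
For readers hitting the same problem, a minimal sketch of a should query that uses the full field path might look like the following. The function `title_prefix_query` is a hypothetical helper for illustration; it only builds the request body, and the choice of `match` plus `match_phrase_prefix` is one possible combination, not necessarily what the code in this thread uses.

```python
import json


def title_prefix_query(text, field="IndexableContent.title"):
    """bool/should query on the full path of the nested title field."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches when the analyzed text equals an indexed term.
                    {"match": {field: text}},
                    # Matches indexed terms that start with the last word.
                    {"match_phrase_prefix": {field: text}},
                ]
            }
        }
    }


# Body for: POST http://localhost:9200/content/_search
print(json.dumps(title_prefix_query("ka"), indent=2))
```

Because the documents wrap every field in `IndexableContent`, a query on plain `title` addresses a field that does not exist in the index, which returns no hits rather than an error.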

My final query, from age-old code, looks like this. It might need some more tweaking, but at least it works for now.

{
  "query" : {
    "function_score" : {
      "query" : {
        "bool" : {
          "filter" : [
           
            {
              "terms" : {
                "serviceIds" : [
                  "myplex"
                ],
                "boost" : 1.0
              }
            },
            {
              "range" : {
                "expirationDate" : {
                  "from" : 1521012222443,
                  "to" : null,
                  "include_lower" : true,
                  "include_upper" : true,
                  "boost" : 1.0
                }
              }
            }
          ],
          "should" : [
            {
              "match_phrase_prefix" : {
                "title" : {
                  "query" : "jil",
                  "analyzer" : "standard",
                  "slop" : 0,
                  "max_expansions" : 50,
                  "boost" : 1.0
                }
              }
            },
            {
              "wildcard" : {
                "title" : {
                  "wildcard" : "jil*",
                  "boost" : 1.0
                }
              }
            },
            {
              "fuzzy" : {
                "title" : {
                  "value" : "jil",
                  "fuzziness" : "AUTO",
                  "prefix_length" : 0,
                  "max_expansions" : 100,
                  "transpositions" : false,
                  "boost" : 1.0
                }
              }
            },
            {
              "match_phrase_prefix" : {
                "personsFullNames" : {
                  "query" : "jil",
                  "analyzer" : "standard",
                  "slop" : 0,
                  "max_expansions" : 50,
                  "boost" : 1.0
                }
              }
            },
            {
              "fuzzy" : {
                "personsFullNames" : {
                  "value" : "jil",
                  "fuzziness" : "AUTO",
                  "prefix_length" : 0,
                  "max_expansions" : 100,
                  "transpositions" : false,
                  "boost" : 1.0
                }
              }
            },
            {
              "query_string" : {
                "query" : "*",
                "default_field" : "_all",
                "fields" : [ ],
                "use_dis_max" : true,
                "tie_breaker" : 0.0,
                "default_operator" : "or",
                "analyzer" : "standard",
                "auto_generate_phrase_queries" : false,
                "max_determinized_states" : 10000,
                "enable_position_increments" : true,
                "fuzziness" : "AUTO",
                "fuzzy_prefix_length" : 0,
                "fuzzy_max_expansions" : 50,
                "phrase_slop" : 0,
                "lenient" : true,
                "escape" : false,
                "split_on_whitespace" : true,
                "boost" : 1.0
              }
            }
          ],
          "disable_coord" : false,
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      },
      "functions" : [
        {
          "filter" : {
            "match_all" : {
              "boost" : 1.0
            }
          },
          "gauss" : {
            "releaseYear" : {
              "origin" : 2018,
              "scale" : "30d",
              "offset" : 0,
              "decay" : 0.5
            },
            "multi_value_mode" : "MIN"
          }
        }
      ],
      "score_mode" : "max",
      "boost_mode" : "multiply",
      "max_boost" : 3.4028235E38,
      "boost" : 1.0
    }
  }
}

Got around this.

Thanks for all the help.

