Problem with search returning wrong results


(Rokade Akshay Raju) #1

I have a mapping like this:

```
"382:FormattedValue": {
  "type": "string",
  "analyzer": "ExportRawAnalyzer",
  "ignore_above": 10922
},
"382:Value": {
  "type": "string",
  "fields": {
    "date": {
      "type": "date",
      "ignore_malformed": true,
      "format": "date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd HH:mm||yyyy/MM/dd||dd/MM/yyyy HH:mm:ss||dd/MM/yyyy HH:mm||dd/MM/yyyy||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm||yyyy-MM-dd||dd-MM-yyyy HH:mm:ss||dd-MM-yyyy HH:mm||dd-MM-yyyy||yyyy.MM.dd HH:mm:ss||yyyy.MM.dd HH:mm||yyyy.MM.dd||dd.MM.yyyy HH:mm:ss||dd.MM.yyyy HH:mm||dd.MM.yyyy||MM/dd/yy"
    }
```

I am running a query like this:

```
GET index_name/item/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must_not": [
              {
                "exists": {
                  "field": "382:Value"
                }
              }
            ]
          }
        }
      ]
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "match": {
            "LanguageID": "1"
          }
        },
        {
          "terms": {
            "_Parents": [
              "13512"
            ]
          }
        }
      ]
    }
  },
  "_source": "382:Value",
  "size": 200
}
```

If an item has a huge value (possibly more than 10922 characters) in field 382, that item is returned by the above query.

When I reduce the amount of content in the field, the query works fine.

I don't know why the query behaves wrongly when the field contains a large amount of content.

I am using elasticsearch version 2.4.0

Please help me solve this.

Thank you.


(Christoph) #2

See https://www.elastic.co/guide/en/elasticsearch/reference/2.4/ignore-above.html

Strings longer than the ignore_above setting will not be processed by the analyzer and will not be indexed.
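To make the consequence for the query concrete, here is a minimal Python sketch of the behaviour quoted above. It is a toy model, not Elasticsearch code: the helpers `index_document` and `field_exists` are hypothetical, and the sketch assumes that a field skipped because of `ignore_above` is treated as missing by the `exists` query, which is what this thread observes.

```python
# Toy model, not Elasticsearch code: a value longer than ignore_above is skipped
# at index time, so the field looks "missing" to an exists query.

IGNORE_ABOVE = 10922  # value from the "382:Value" mapping in this thread


def index_document(source, field, ignore_above):
    """Hypothetical helper: return the indexed terms of `field` for one document.

    A value longer than ignore_above characters is neither analyzed nor
    indexed, so the field gets no entry at all.
    """
    indexed = {}
    value = source.get(field)
    if isinstance(value, str) and len(value) <= ignore_above:
        # Whitespace split + lowercase, roughly like ExportPrimaryAnalyzer
        indexed[field] = [token.lower() for token in value.split()]
    return indexed


def field_exists(indexed, field):
    """Hypothetical stand-in for the exists query: true if the field was indexed."""
    return field in indexed


docs = {
    "short": {"382:Value": "Some short text"},
    "long": {"382:Value": "x " * 10000},  # 20000 characters, above the limit
}

for name, source in docs.items():
    indexed = index_document(source, "382:Value", IGNORE_ABOVE)
    # bool -> must_not -> exists matches exactly when the field was not indexed
    print(name, len(source["382:Value"]), not field_exists(indexed, "382:Value"))
```

Under these assumptions the "long" document matches the must_not exists clause even though its _source still contains the field, which is the symptom described in the first post.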


(Rokade Akshay Raju) #3

But if you look at the mapping, I don't have "ignore_above" on my "382:Value" field.
It still isn't working; what is the cause?


(Christoph) #4

I formatted your example a bit to make it more readable (adding ``` before and after code snippets does that, for future reference). Can you share the full mapping and an example document that gets / does not get returned with your query?


(Rokade Akshay Raju) #5

Hi, my index settings are as follows:

```
"settings": {
  "index": {
    "creation_date": "1527835314570",
    "analysis": {
      "analyzer": {
        "ExportPrimaryAnalyzer": {
          "filter": "lowercase",
          "char_filter": "html_strip",
          "type": "custom",
          "tokenizer": "whitespace"
        },
        "ExportRawAnalyzer": {
          "filter": "lowercase",
          "char_filter": "html_strip",
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    },
    "number_of_shards": "5",
    "number_of_replicas": "0",
    "uuid": "dbgwZJtTRW6imK2AoZ1uIg",
    "version": {
      "created": "2040099"
    }
  }
}
```
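The two custom analyzers above differ only in their tokenizer, and that matters for long values. The following Python sketch approximates them: with the `keyword` tokenizer (`ExportRawAnalyzer`) the whole value becomes one term, while the `whitespace` tokenizer (`ExportPrimaryAnalyzer`) splits it into words. This is an approximation under stated assumptions, not Elasticsearch's implementation: the `html_strip` character filter is reduced to a regular expression that replaces tags with a space, and Python's `str.split()` and `str.lower()` stand in for the whitespace tokenizer and the lowercase filter.

```python
import re

# Rough stand-in for the html_strip char_filter (assumption: it only replaces
# tags with a space; the real filter also decodes entities and more).
TAG_RE = re.compile(r"<[^>]*>")


def html_strip(text):
    return TAG_RE.sub(" ", text)


def export_primary_analyzer(text):
    """Approximates ExportPrimaryAnalyzer: html_strip -> whitespace -> lowercase."""
    return [token.lower() for token in html_strip(text).split()]


def export_raw_analyzer(text):
    """Approximates ExportRawAnalyzer: html_strip -> keyword -> lowercase.

    The keyword tokenizer emits the whole input as a single token.
    """
    return [html_strip(text).lower()]


value = "<p>Hello World</p> " * 3000
primary_terms = export_primary_analyzer(value)
raw_terms = export_raw_analyzer(value)
print("whitespace:", len(primary_terms), "terms, longest", max(map(len, primary_terms)))
print("keyword:", len(raw_terms), "term of length", len(raw_terms[0]))
```

With the keyword tokenizer the single term grows with the whole value, so a long enough value yields one term beyond the Lucene per-term limit mentioned later in the thread; with the whitespace tokenizer each term is only as long as one whitespace-free run of text.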

(Rokade Akshay Raju) #6

The sample document with which I am facing the problem is as follows:

```
{
  "_index": "export_cslive_mamfile",
  "_type": "item",
  "_id": "13515_1",
  "_version": 1,
  "found": true,
```

(Christoph) #11

Hi, I reformatted your mappings using "```" again; please use that, as it makes this easier for everybody to read. Also, I cannot read your document the way you posted it across many comments. I don't know of any limitations on comment size here in the forum, but please either simplify your document or post it somewhere else, like https://pastebin.com/ or a gist, and share the link so that it is possible to work with it.
Thanks.


(Rokade Akshay Raju) #12

OK, the size limit for a comment is only 7000 characters, so I was unable to post the whole document at once.


(Rokade Akshay Raju) #13

Here is a link to a sample document with which I have the problem: doc_link


(Christoph) #14

Thanks. The full mapping is still missing: your example from the first comment is truncated, especially the "382:Value" field mapping. Better to post all of it.


(Rokade Akshay Raju) #15

OK, here you can find the full mapping:
mapping_link


(Christoph) #16

From your mapping file:

```
"382:Value": {
  "type": "string",
  "fields": {
    [...]},
  "analyzer": "ExportPrimaryAnalyzer",
  "ignore_above": 10922
},
```

The field's source value in the document you linked is about 18748 characters long, which is more than the ignore_above limit of 10922. That explains why it is dropped (not indexed) and why the document matches the bool must_not exists query.
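One way to spot documents affected in the same way is to compare each value's length with the configured `ignore_above` before indexing. A minimal Python sketch, assuming the documents are available as plain dicts; `find_ignored_fields` and the document ids are hypothetical, and the limits are the ones from the mappings quoted in this thread. Note that Python's `len()` counts code points, which may differ from how Elasticsearch counts characters outside the Basic Multilingual Plane.

```python
# Hypothetical pre-indexing check: report fields whose values exceed ignore_above
# and would therefore be silently skipped (and then match must_not exists).

IGNORE_ABOVE_BY_FIELD = {
    "382:Value": 10922,
    "382:FormattedValue": 10922,
}


def find_ignored_fields(doc_id, source, limits):
    """Return (doc_id, field, length) for every string value longer than its limit."""
    hits = []
    for field, limit in limits.items():
        value = source.get(field)
        if isinstance(value, str) and len(value) > limit:
            hits.append((doc_id, field, len(value)))
    return hits


docs = {
    "doc_a": {"382:Value": "a" * 18748},  # same length as the linked document
    "doc_b": {"382:Value": "short value"},
}

for doc_id, source in docs.items():
    for hit in find_ignored_fields(doc_id, source, IGNORE_ABOVE_BY_FIELD):
        print(hit)
```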


(Rokade Akshay Raju) #17

So basically we need to either remove "ignore_above" or increase its value, right?


(Rokade Akshay Raju) #18

Is it possible to set ignore_above at the index settings or configuration level? If I remove ignore_above from the mapping, I get an exception.

Can you please advise how we can handle this kind of situation?


(Christoph) #19

No, as far as I can tell, it can only be set per field, in the keyword mappings.

That is unexpected: the default value should be Integer.MAX_VALUE, which is 2147483647. However, there is also a Lucene limit of 32766 bytes per term (measured on the UTF-8 encoding), which for good reasons cannot be increased, and a term longer than that may be what triggers your exception. If you have longer fields, reconsider your general approach.
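Because the Lucene limit is counted in bytes while `ignore_above` counts characters, a safe `ignore_above` depends on how many UTF-8 bytes a character can take. The Python sketch below works through that arithmetic (the 10922 used in this thread is exactly 32766 // 3) and shows one possible workaround: truncating a value on a character boundary so its UTF-8 encoding fits. The helpers `fits_in_one_term` and `truncate_utf8` are hypothetical illustrations, not Elasticsearch features, and truncating means the tail of the value is no longer searchable.

```python
LUCENE_MAX_TERM_BYTES = 32766  # Lucene's per-term limit, in UTF-8 bytes

# Worst-case UTF-8 width per character: 1 byte (ASCII), 2 bytes (e.g. accented
# Latin, Cyrillic), 3 bytes (rest of the BMP, e.g. CJK), 4 bytes (e.g. emoji).
for bytes_per_char in (1, 2, 3, 4):
    print(bytes_per_char, LUCENE_MAX_TERM_BYTES // bytes_per_char)
# 32766 // 3 == 10922 (the value used in this thread); 32766 // 4 == 8191


def fits_in_one_term(value: str) -> bool:
    """True if the whole value could be indexed as a single Lucene term."""
    return len(value.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES


def truncate_utf8(value: str, max_bytes: int = LUCENE_MAX_TERM_BYTES) -> str:
    """Hypothetical helper: cut value so its UTF-8 encoding is at most max_bytes,
    without splitting a multi-byte character."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" drops a partial trailing character left by the byte cut
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Setting `ignore_above` to 32766 // 4 = 8191 is the conservative choice when any character may occur; 10922 is only safe if no character needs more than 3 bytes.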


(Christoph) #20

Also, please see the note on the Lucene limit at https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html for further information.


(system) #21

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.