Percolator throws INTERNAL_SERVER_ERROR on irrelevant field

javierco · July 12, 2016, 2:51pm

Hello, I just upgraded from version 1.4.4 to 2.3.3 and I'm having an issue with the percolator:

Previously, we sent percolator requests for an existing document through the _mpercolate endpoint, as follows:

curl -XGET 'localhost:9200/_mpercolate' -d '
{"percolate":{"id":"myid_123456","index":"week_29","percolate_index":"my_percolator_index","type":"article"}}
{}
'

Here the article with ID "myid_123456" is indexed in "week_29" and has many fields. The percolator index, "my_percolator_index", holds only a query that matches on two of the article's fields:

{
  "_index": "my_percolator_index",
  "_type": ".percolator",
  "_id": "897987987",
  "_version": 1,
  "_score": 1,
  "_source": {
    "query": {
      "bool": {
        "must": {
          "query_string": {
            "query": "+(title:Obama body:Obama) +(title:Clinton body:Clinton)"
          }
        }
      }
    },
    "unique_query_name": "candidates",
    "unique_query_id": 512508281
  }
}

The response in 1.4.4 was successful, returning the ID of the percolator query that matches the document if title or body has a match. However, in 2.3.3 the response contains an error for any other field in the document that does not have an analyzer in the mapping, which we do not need:

{
  "responses": [
    {
      "took": 9,
      "_shards": {
        "total": 2,
        "successful": 0,
        "failed": 2,
        "failures": [
          {
            "shard": 0,
            "index": "my_percolator_index",
            "status": "INTERNAL_SERVER_ERROR",
            "reason": {
              "type": "exception",
              "reason": "Failed to create token stream for [author]",
              "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Field [author] has no associated analyzer"
              }
            }
          }
        ]
      },
      "total": 0,
      "matches": []
    }
  ]
}

Can I get some help on solving this issue without having to add every field to the mapping template?

Thanks a lot!

Javier C.

javierco · July 12, 2016, 5:52pm

UPDATE:

I changed the way we index the percolator query to specifically set the fields we want the query to work on:

{
  "_source": {
    "query": {
      "bool": {
        "must": {
          "query_string": {
            "query": "Obama Clinton",
            "fields": [
              "title",
              "body"
            ]
          }
        }
      }
    }
  }
}

And I still get the same error.

Thanks!

mvg · July 13, 2016, 5:36am

For some reason the author field doesn't have an analyzer. Can you share the article mapping?

javierco · July 13, 2016, 12:25pm

Hello Martijn, below is the template, the week_29 index mapping (where the document is stored), the my_percolator_index mapping (where the .percolator query is stored), and a sample document:

Custom template:

{
  "custom_template": {
    "template": "*",
    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "length_filter": {
              "type": "length",
              "min": "2"
            }
          },
          "analyzer": {
            "custom_analyzer": {
              "type": "custom",
              "filter": [
                "lowercase",
                "stop",
                "length_filter"
              ],
              "tokenizer": "whitespace"
            }
          }
        }
      }
    },
    "mappings": {
      "article": {
        "properties": {
          "body": {
            "analyzer": "custom_analyzer",
            "type": "string"
          },
          "title": {
            "analyzer": "custom_analyzer",
            "type": "string"
          }
        }
      }
    }
  }
}

week_29 index mapping:

{
  "mappings": {
    "article": {
      "properties": {
        "body": {
          "analyzer": "custom_analyzer",
          "type": "string"
        },
        "title": {
          "analyzer": "custom_analyzer",
          "type": "string"
        },
        "author": {
          "type": "string"
        }
      }
    }
  }
}

my_percolator_index mapping:

{
  "mappings": {
    "article": {
      "properties": {
        "body": {
          "analyzer": "custom_analyzer",
          "type": "string"
        },
        "title": {
          "analyzer": "custom_analyzer",
          "type": "string"
        }
      }
    },
    ".percolator": {
      "properties": {
        "query": {
          "enabled": false,
          "type": "object"
        },
        "unique_query_name": {
          "type": "string"
        },
        "unique_query_id": {
          "type": "long"
        }
      }
    }
  }
}

sample document:

{
  "_index": "week_29",
  "_type": "article",
  "_id": "myid_123456",
  "_version": 1,
  "_score": 1,
  "_source": {
    "title": "the title of the article that includes Obama",
    "body": "the body of the article that includes Clinton",
    "author": "Javier"
  }
}

Note: this is similar to what I have in ES 1.4.4 (which works), where the mapping for 'author' exists in the index where the document is stored but not in the percolator index.

Let me know if you need any additional information, and thanks a lot for the help!

Javier C.

mvg · July 13, 2016, 12:46pm

I think you're getting this error because the custom_analyzer analyzer isn't configured in the my_percolator_index index. Can you add the analyzer-related index settings from the week_29 index to the my_percolator_index index? (You do need to close and reopen the index for these settings to take effect.)

javierco · July 13, 2016, 1:25pm

Sorry for the confusion. The my_percolator_index index does have the analyzer; I just didn't copy the whole index metadata. Here it is:

{
  "state": "open",
  "settings": {
    "index": {
      "creation_date": "1468364676444",
      "uuid": "tFMhPX5XRri-MVc8iEZVeA",
      "analysis": {
        "filter": {
          "length_filter": {
            "type": "length",
            "min": "2"
          }
        },
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "filter": [
              "lowercase",
              "stop",
              "length_filter"
            ],
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_replicas": "0",
      "number_of_shards": "2",
      "version": {
        "created": "2030399"
      }
    }
  },
  "mappings": {
    "article": {
      "properties": {
        "body": {
          "analyzer": "custom_analyzer",
          "type": "string"
        },
        "title": {
          "analyzer": "custom_analyzer",
          "type": "string"
        }
      }
    },
    ".percolator": {
      "properties": {
        "query": {
          "enabled": false,
          "type": "object"
        },
        "unique_query_name": {
          "type": "string"
        },
        "unique_query_id": {
          "type": "long"
        }
      }
    }
  },
  "aliases": []
}

Thanks again!

javierco · July 15, 2016, 6:09pm

Hi Martijn, do you have any other suggestions?

I still get the error: whenever I index a new document into my week_29 index that contains a new field and then run the percolator, it fails with the missing analyzer error.

Thanks for the help!

Javier

mvg · July 15, 2016, 7:50pm

Hi Javier,

I've tried to reproduce your error here with the snippets you provided in this question.
However, I'm unable to reproduce it. Can you share a reproduction (minimal steps to reproduce the error)?

This is what I've tried:

PUT /test
{
  "settings": {
    "index": {
      "creation_date": "1468364676444",
      "uuid": "tFMhPX5XRri-MVc8iEZVeA",
      "analysis": {
        "filter": {
          "length_filter": {
            "type": "length",
            "min": "2"
          }
        },
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "filter": [
              "lowercase",
              "stop",
              "length_filter"
            ],
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_replicas": "0",
      "number_of_shards": "2",
      "version": {
        "created": "2030399"
      }
    }
  },
  "mappings": {
    "article": {
      "properties": {
        "body": {
          "analyzer": "custom_analyzer",
          "type": "string"
        },
        "title": {
          "analyzer": "custom_analyzer",
          "type": "string"
        }
      }
    },
    ".percolator": {
      "properties": {
        "query": {
          "enabled": false,
          "type": "object"
        },
        "unique_query_name": {
          "type": "string"
        },
        "unique_query_id": {
          "type": "long"
        }
      }
    }
  }
}

PUT /test/.percolator/1
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "+(title:Obama body:Obama) +(title:Clinton body:Clinton)"
        }
      }
    }
  }
}

GET test/article/_percolate
{
  "doc": {
    "title": "the title of the article that includes Obama",
    "body": "the body of the article that includes Clinton",
    "author": "Javier"
  }
}

But that ran successfully here.

Martijn

javierco · July 18, 2016, 1:37pm

Hi Martijn, thanks for the help.

I noticed that you used the same index (test) for the document and the percolator. However, in my case, I have separate indices. One where I store the document (week_29), and a separate one where I store the percolator query (my_percolator_index). And that's how I get the error. I will post reproducible steps, let me work them out.

Javier.

javierco · July 18, 2016, 2:13pm

These are the steps to reproduce the problem:

curl -XPUT 'http://localhost:9200/week_29/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "length_filter": {
            "type": "length",
            "min": "2"
          }
        },
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "filter": [
              "lowercase",
              "stop",
              "length_filter"
            ],
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_replicas": "0",
      "number_of_shards": "2",
      "version": {
        "created": "2030399"
      }
    }
  },
  "mappings": {
    "article": {
      "properties": {
        "body": {
          "analyzer": "custom_analyzer",
          "type": "string"
        },
        "title": {
          "analyzer": "custom_analyzer",
          "type": "string"
        }
      }
    },
    ".percolator": {
      "properties": {
        "query": {
          "enabled": false,
          "type": "object"
        },
        "unique_query_name": {
          "type": "string"
        },
        "unique_query_id": {
          "type": "long"
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/my_percolator_index/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "length_filter": {
            "type": "length",
            "min": "2"
          }
        },
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "filter": [
              "lowercase",
              "stop",
              "length_filter"
            ],
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_replicas": "0",
      "number_of_shards": "2",
      "version": {
        "created": "2030399"
      }
    }
  },
  "mappings": {
    "article": {
      "properties": {
        "body": {
          "analyzer": "custom_analyzer",
          "type": "string"
        },
        "title": {
          "analyzer": "custom_analyzer",
          "type": "string"
        }
      }
    },
    ".percolator": {
      "properties": {
        "query": {
          "enabled": false,
          "type": "object"
        },
        "unique_query_name": {
          "type": "string"
        },
        "unique_query_id": {
          "type": "long"
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/my_percolator_index/.percolator/1' -d '{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "+(title:Obama body:Obama) +(title:Clinton body:Clinton)"
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/week_29/article/99' -d '{
  "title": "the title of the article that includes Obama",
  "body": "the body of the article that includes Clinton",
  "author": "Javier"
}'

curl -XGET 'localhost:9200/_mpercolate' -d '
{"percolate":{"id":"99","index":"week_29","type":"article","percolate_index":"my_percolator_index"}}
{}
'

And I get the error:

{
  "responses": [
    {
      "took": 14,
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 1,
        "failures": [
          {
            "shard": 1,
            "index": "my_percolator_index",
            "status": "INTERNAL_SERVER_ERROR",
            "reason": {
              "type": "exception",
              "reason": "Failed to create token stream for [author]",
              "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Field [author] has no associated analyzer"
              }
            }
          }
        ]
      },
      "total": 0,
      "matches": []
    }
  ]
}

Thanks!

Javier.

mvg · July 20, 2016, 8:14am

Apologies for the late response.

The reason for this error is that the author field is not mapped in the percolator index. Adding the field to the percolator index mapping should fix this.
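A minimal sketch of that fix, assuming the ES 2.x put-mapping endpoint and the host, index, and type names used earlier in this thread. The helper names field_mapping and add_field are hypothetical. The block itself only prints the request body; add_field would send it to a running cluster.

```shell
# Hypothetical helper: print a put-mapping body that declares one string field.
field_mapping() {
  printf '{"properties":{"%s":{"type":"string"}}}\n' "$1"
}

# Hypothetical helper: send that body to the percolator index.
# Assumes Elasticsearch 2.x on localhost:9200 and the "article" type from the thread.
add_field() {
  curl -XPUT "localhost:9200/my_percolator_index/_mapping/article" -d "$(field_mapping "$1")"
}

# Print the body that would be sent for the field named in the error message.
field_mapping author
```

Against a live cluster, `add_field author` would issue the request. The same call would have to be repeated for every other field that exists in the document index but not in the percolator index.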

Unfortunately, percolating an existing document and percolating a provided document handle this situation differently. When you percolate a provided document and a field is missing from the mapping, the percolate API accepts it and even updates the mapping. When you percolate an existing document, this does not happen, and as you see the request fails if an unknown field is encountered.
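Given that difference, one possible workaround is to send the stored document's _source as a provided document instead of referencing it by ID. The sketch below only builds the two-line _mpercolate body for that case. The helper name mpercolate_provided is hypothetical, the index and type names are taken from the reproduction steps, and this has not been verified against a 2.3.3 cluster.

```shell
# Hypothetical helper: print an _mpercolate body that percolates a provided document.
# $1 = percolator index, $2 = document type, $3 = document source as JSON
mpercolate_provided() {
  printf '{"percolate":{"index":"%s","type":"%s"}}\n' "$1" "$2"
  printf '{"doc":%s}\n' "$3"
}

# Sample document from the reproduction steps in this thread.
doc='{"title":"the title of the article that includes Obama","body":"the body of the article that includes Clinton","author":"Javier"}'

# Print the header line followed by the document line.
mpercolate_provided my_percolator_index article "$doc"
```

The output could be piped to `curl -XGET 'localhost:9200/_mpercolate' --data-binary @-`. The --data-binary option matters here because -d strips newlines when reading from stdin. Per the explanation above, this path may also add the unknown fields to the percolator index mapping as a side effect.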

In ES 5.0 the percolator has been rewritten, the differences between the two ways of percolating have disappeared, and your example works correctly with minor changes to the mapping (making use of the percolator field mapping).
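As a rough sketch of what the 5.x form might look like: the query field is mapped with the percolator type, and an existing document is percolated through a regular search using the percolate query. The parameter names below (field, document_type, index, type, id) are recalled from the 5.x percolate query documentation and should be checked against the docs for your exact version. The helper name percolate_existing is hypothetical.

```shell
# Assumed 5.x mapping for the type that stores the queries:
# the "query" field uses the percolator field type.
percolator_mapping='{"properties":{"query":{"type":"percolator"}}}'

# Hypothetical helper: print a search body that percolates an existing document.
# $1 = document type, $2 = index holding the document, $3 = document ID
percolate_existing() {
  printf '{"query":{"percolate":{"field":"query","document_type":"%s","index":"%s","type":"%s","id":"%s"}}}\n' "$1" "$2" "$1" "$3"
}

# Print both bodies for the document used in the reproduction steps.
printf '%s\n' "$percolator_mapping"
percolate_existing article week_29 99
```

The second body would be sent to the _search endpoint of the percolator index (my_percolator_index in this thread) rather than to _mpercolate.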

javierco · July 20, 2016, 12:36pm

Thanks for the support, Martijn. I understand.

It is unfortunate that this feature, which used to work in ES 1.4.4, was removed in ES 2.3.3, only to be restored in ES 5.0.

We just upgraded from v1.4.4 to v2.3.3 and now have to write every single field mapping into the percolator index manually, even though the document index adds the mapping automatically.

Again, I appreciate your help in finding the issue. Take care!!!

Javier C.
