Pattern replace character filter and ingest attachment


I'm using the ingest attachment plugin to extract data from PDFs. I want to use a pattern_replace character filter to change some characters in the extracted PDF content, but I can't get it to work.

1.- Create a pipeline

PUT _ingest/pipeline/atxikiak
{
  "description" : "PDFtako textuak atera",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "CONTENT", "TITLE", "AUTHOR", "KEYWORDS", "CONTENT_TYPE", "LANGUAGE", "DATE", "content_length" ],
        "indexed_chars": -1
      }
    }
  ]
}

2.- Create my index

PUT artxiboa
{
  "settings": {
    "analysis": {
      "analyzer": {
        "gara_analyzer": {
          "tokenizer": "standard",
          "char_filter": [ "gara_char_filter" ]
        }
      },
      "char_filter": {
        "gara_char_filter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z])-([a-zA-Z])",
          "replacement": "$1$2"
        }
      }
    }
  },
  "mappings": {
    "pdf": {
      "properties": {
        "sekzioa":     { "type": "text" },
        "data_osoa":   { "type": "date", "format": "yyyy-MM-dd" },
        "attachment.content" : {
          "type" : "text",
          "analyzer" : "gara_analyzer",
          "store" : true
        }
      }
    }
  }
}

My char_filter is supposed to convert some-other into someother. But when I search for someother, nothing is found.
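
To check the regex on its own, outside Elasticsearch, here is a minimal Python sketch of the same substitution. It only mimics the pattern from gara_char_filter and does not reproduce the analysis chain or the attachment processor. The helper name strip_inner_hyphens is hypothetical, and Python writes the replacement as \1\2 where Elasticsearch (Java regex) uses $1$2.

```python
import re

# Same pattern as gara_char_filter. Assumption: Python's re module treats
# this simple pattern the same way Java's regex engine does.
PATTERN = re.compile(r"([a-zA-Z])-([a-zA-Z])")


def strip_inner_hyphens(text: str) -> str:
    """Hypothetical helper: drop a hyphen sitting directly between two ASCII letters."""
    return PATTERN.sub(r"\1\2", text)


if __name__ == "__main__":
    print(strip_inner_hyphens("some-other"))
    # A hyphen followed by a line break is not matched by this pattern,
    # so this text comes back unchanged.
    print(strip_inner_hyphens("some-\nother"))
```

If the pattern behaves as expected here, the _analyze API run with gara_analyzer against the artxiboa index is probably the next place to look, since it shows the tokens Elasticsearch actually produces for a given text.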

Any help please?

