Pattern replace character filter and ingest attachment


(Xabi) #1

Hi,

I'm using the ingest attachment plugin to extract data from PDFs. I want to use a pattern_replace character filter to change some characters in the PDF content, but I can't get it to work.

1.- Create a pipeline

PUT _ingest/pipeline/atxikiak
{
  "description" : "Extract text from PDFs",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "CONTENT", "TITLE", "AUTHOR", "KEYWORDS", "CONTENT_TYPE", "LANGUAGE", "DATE", "content_length" ],
        "indexed_chars": -1
      }
    }
  ]
}

2.- Create my index

PUT artxiboa
{
  "settings": {
    "analysis": {
      "analyzer": {
        "gara_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "gara_char_filter"
          ]
        }
      },
      "char_filter": {
        "gara_char_filter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z])-([a-zA-Z])",
          "replacement": "$1$2"
        }
      }
    }
  },
  "mappings": {
    "pdf": {
      "properties": {
        "sekzioa":     { "type": "text" },
        "data_osoa":   { "type": "date", "format": "yyyy-MM-dd" },
        "attachment.content": {
          "type": "text",
          "analyzer": "gara_analyzer",
          "store": true
        }
      }
    }
  }
}

With my char_filter, I expect some-other to be converted to someother. But when I search for someother, nothing is found.
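
To make the expected behaviour concrete, here is a minimal sketch of what the pattern itself does, outside Elasticsearch. It uses Python's re module as a stand-in for the Java regex engine, on the assumption that both treat this simple pattern the same way; the function name strip_inner_hyphens is only illustrative and not part of any plugin. It does not show whether the analyzer is actually applied to attachment.content at index or search time.

```python
import re

# Same pattern as gara_char_filter. Java's "$1$2" replacement is
# written r"\1\2" in Python (assumed equivalent for this pattern).
PATTERN = re.compile(r"([a-zA-Z])-([a-zA-Z])")


def strip_inner_hyphens(text: str) -> str:
    """Remove a hyphen that sits directly between two ASCII letters."""
    return PATTERN.sub(r"\1\2", text)


# Hyphen directly between two letters: removed.
print(strip_inner_hyphens("some-other"))    # someother

# Hyphen followed by a line break (as in end-of-line hyphenation):
# the pattern needs a letter right after the hyphen, so nothing changes.
print(repr(strip_inner_hyphens("some-\nother")))    # 'some-\nother'

# Matches do not overlap, so single-letter segments keep a hyphen.
print(strip_inner_hyphens("a-b-c"))    # ab-c
```

So the pattern only joins words where the hyphen is immediately surrounded by letters. Text where the hyphen is followed by a newline or a space is left untouched by this regex.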

Any help please?

Thanks,

