Hi,
I'm using the ingest attachment plugin to extract text from PDFs. I want to use a pattern_replace
character filter to change some characters in the extracted PDF content, but I can't get it to work.
1.- Create a pipeline
PUT _ingest/pipeline/atxikiak
{
  "description" : "PDFtako textuak atera",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "CONTENT", "TITLE", "AUTHOR", "KEYWORDS", "CONTENT_TYPE", "LANGUAGE", "DATE", "content_length" ],
        "indexed_chars": -1
      }
    }
  ]
}
2.- Create my index
PUT artxiboa
{
  "settings": {
    "analysis": {
      "analyzer": {
        "gara_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "gara_char_filter"
          ]
        }
      },
      "char_filter": {
        "gara_char_filter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z])-([a-zA-Z])",
          "replacement": "$1$2"
        }
      }
    }
  },
  "mappings": {
    "pdf": {
      "properties": {
        "sekzioa": { "type": "text" },
        "data_osoa": { "type": "date", "format": "yyyy-MM-dd" },
        "attachment.content" : {
          "type" : "text",
          "analyzer" : "gara_analyzer",
          "store" : true
        }
      }
    }
  }
}
With my char_filter, I expect some-other to be converted to someother. But when I search for someother, I don't find anything.
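To make the expected behaviour concrete, here is a minimal sketch of the same regex outside Elasticsearch. It uses Python's re module rather than the Java regex engine Elasticsearch uses, so the replacement is written \1\2 instead of $1$2, and the helper name strip_inner_hyphens is made up for illustration. It only shows what the pattern does to a string, not what the analyzer does at index or search time.

```python
import re

# Same pattern as gara_char_filter. Python writes the replacement as \1\2,
# whereas the Elasticsearch setting (Java regex) uses $1$2.
PATTERN = re.compile(r"([a-zA-Z])-([a-zA-Z])")


def strip_inner_hyphens(text: str) -> str:
    """Hypothetical helper: drop a hyphen that sits directly between two ASCII letters."""
    return PATTERN.sub(r"\1\2", text)


if __name__ == "__main__":
    print(strip_inner_hyphens("some-other"))  # someother
    print(strip_inner_hyphens("2017-01-31"))  # unchanged, digits are not matched
```

So the pattern itself seems to do what I want on a plain string; hyphens next to digits or spaces are left alone.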
Any help, please?
Thanks,