Remove numbers from a field and keep only letters

I want to remove the numbers from a field and keep only the letters. My PDF has a table containing a name and an id, where the id is a number. When I index it using the ingest attachment pipeline, the numbers are stored as well, like this:

id     name
123    user one
132    user two

What I want is to store only the names (user one and user two).

You have an _id field which is required. If you are talking about another field called id that you want removed then you can use the remove processor in your ingest pipeline.

{
  "remove": {
    "field": "id"
  }
}

No, the name and id are inside my PDF. I am sending this PDF as base64, and the numbers are stored in attachment.content as well. I want to store only the letters in attachment.content, not the numbers.

I mean that attachment.content contains all of this (id name 123 user one 132 user two), and I want only the letters in this field, like this (id name user one user two).

Can you show me the expected result using JSON? I am still not sure. Here is what the input looks like.

Input

{
 "id": 123,
 "name": "user one"
},
{
 "id": 132,
 "name": "user two"
}


This is my table in the PDF, and I want to index the content (text) of the PDF. I have installed the ingest attachment processor. It converts the base64 into searchable text and stores it in attachment.content as plain text, like this (name id user one 123 user two 132). I want this field (attachment.content) to contain only the letters. I need something for this field like the simple analyzer, which only allows letters.
When I search for it, this is what I get:

{
  "_index" : "myIndex",
  "_type" : "_doc",
  "_id" : "vtvp-XsBs542oOi7VWgr",
  "_score" : 1.0,
  "_source" : {
    "attachment" : {
      "content" : """
      name id user one 123 user two 132
      """
    }
  }
}

The content is name id user one 123 user two 132, but I want to remove the numbers.
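
On the analyzer idea: as far as I know, the built-in simple analyzer splits text at every non-letter character and lowercases the terms, so digits never become searchable tokens. One way to check this is the _analyze API. The request body below is only a sketch using the sample text from this thread, and it would be sent as the body of POST _analyze.

```json
{
  "analyzer": "simple",
  "text": "name id user one 123 user two 132"
}
```

This should return the tokens name, id, user, one, user, two, with no numeric terms. Note that an analyzer only changes what is indexed for search. The text returned in _source under attachment.content would still contain the numbers, so an analyzer alone does not remove them from the stored field.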

I think I know what you mean now. There could be an easier way but this is the first solution I thought of.

  1. Remove all numbers
  2. Clean up the double spaces it causes
  3. Trim up spaces if numbers were at the front or end

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "describe pipeline",
    "processors": [
      {
        "gsub": {
          "field": "message",
          "pattern": "[0-9]",
          "replacement": ""
        }
      },
      {
        "gsub": {
          "field": "message",
          "pattern": " +",
          "replacement": " "
        }
      },
      {
        "trim": {
          "field": "message"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "name id user one 123 user two 132"
      }
    }
  ]
}

Output

"message" : "name id user one user two"
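
To apply the same steps to the PDF text rather than a test message field, the gsub and trim processors can run after the attachment processor and point at attachment.content. The following is a rough sketch, not a tested setup. The pipeline name pdf_letters_only and the source field data (holding the base64 PDF) are assumptions, so adjust them to match your documents. It would be sent as the body of PUT _ingest/pipeline/pdf_letters_only.

```json
{
  "description": "Sketch: extract PDF text from the assumed field 'data', then strip digits from attachment.content",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    },
    {
      "gsub": {
        "field": "attachment.content",
        "pattern": "[0-9]+",
        "replacement": "",
        "ignore_missing": true
      }
    },
    {
      "gsub": {
        "field": "attachment.content",
        "pattern": " +",
        "replacement": " ",
        "ignore_missing": true
      }
    },
    {
      "trim": {
        "field": "attachment.content",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "data"
      }
    }
  ]
}
```

Documents indexed with ?pipeline=pdf_letters_only would then store only the cleaned text in attachment.content. Text extracted from a real PDF usually contains line breaks as well, so depending on how the table comes out you may need an extra gsub step for those.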
