Parsing Word and PDF files in Elasticsearch using Logstash

I'm trying to read a PDF file using Logstash.
I use the following config file:

input {
  file {
    path => "D:\ELKstack\elastic\page_ELK.pdf"
    start_position => "beginning"
  }
}

filter {
  if [type] == ".pdf" {
    multiline {
      pattern => "^\n"
      what => "next"
      negate => false
    }
  }
}

output {
  elasticsearch { hosts => ["localhost:9000"] index => "logstash-pdf" }
  stdout { codec => "rubydebug" }
}

The output is:

ԿD\xA6r\xBD_\xFF,Gh\xFE}ؙ\xB8g+[\u0006ۿ\xF2\xC5gp^\xE8\xDA\xE7Ym\u0083\xA7\u0013\xF8'X\xB5\x9E.\xBD\xC5\xFB\xFCR\x97\xBA\x8Fۘ\xB1\x80QH%wE1\u001F\xB5\x9B\x9B\xA2Et\x8B\u001Fd\u0005\x9AC\xA4\xDDKdq@\x96\xC4\xF1\u001AY\xA7L05\xBA\xF0Ӗץ\x99\xEAy\x8E\u0011\xA0\xDD\xDEٽAEӀ\x8A\u0011\"_\x9C\xC4\u001D\xC9avA%\xA3\xB4#\xD7\xE9؋\x838\x96\xC8'\xB8bL\xED\xB2l;p\xF6\x86\f\t\xB9\u0001O\u0011NGf\xE8tYW\xD6\xF9\xD6t\xADRӐl\xE5e\xD5xL\xA9\xC6y\xE1\u05F9\u0015\xE6_\xE1j/х,\u0001\x8D\u007F\xB2Fw>\u000Fa\xC6\u0019|\xF6\xAE\xC9|\xE3\xCC\u0012Q\u00152\xAD~"

It is not readable. What should I do?

Logstash cannot parse these sorts of files.

See https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html
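For reference, the attachment processor described on that page expects the file content as a base64-encoded string in a document field, not as raw bytes. A minimal Python sketch of preparing such a document body follows. The helper name `build_attachment_doc` and the default field name `data` are illustrative assumptions, and actually sending the request is left out.

```python
import base64
import json


def build_attachment_doc(file_bytes: bytes, field: str = "data") -> str:
    """Return a JSON document body with the file content base64-encoded."""
    # b64encode produces standard padded base64 without line breaks.
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return json.dumps({field: encoded})


if __name__ == "__main__":
    # Placeholder bytes standing in for a real file read with open(path, "rb").
    sample = b"%PDF-1.5\r\n"
    print(build_attachment_doc(sample))
```

The important part is to encode the complete file in one go, so the resulting string carries correct padding.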

Thanks @warkolm

Can I do it using Kibana?

Nope.

OK, thanks.

I'm trying to load a PDF into Elasticsearch using ingest-attachment.
I tried this:

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

PUT my_index/my_type/my_id?pipeline=attachment
{
  "data": "JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFu=\n"
}

Output:

{
  "error": {
    "root_cause": [
      {
        "type": "exception",
        "reason": "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: IllegalArgumentException[Input byte array has wrong 4-byte ending unit];",
        "header": {
          "processor_type": "attachment"
        }
      }
    ]
Can you please tell me what is wrong with this?
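The message "Input byte array has wrong 4-byte ending unit" looks like a base64 padding complaint. Padded base64 has a length that is a multiple of 4, and a strict decoder does not accept line breaks; the string in the request above appears to fail both checks, which suggests it was cut short when it was copied. A small Python sketch that checks this follows. `base64_problems` is a made-up helper name, not anything from Elasticsearch.

```python
import base64
from typing import List


def base64_problems(value: str) -> List[str]:
    """List reasons a strict base64 decoder would likely reject value."""
    problems = []
    if any(ch.isspace() for ch in value):
        problems.append("contains whitespace or a line break")
    # Measure the length without whitespace so both problems are reported.
    stripped = "".join(value.split())
    if len(stripped) % 4 != 0:
        problems.append(f"length {len(stripped)} is not a multiple of 4")
    return problems


if __name__ == "__main__":
    # The value posted in the request above, including its trailing newline.
    posted = (
        "JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cv"
        "UGFnZXMgMiAwIFIvTGFu=\n"
    )
    print(base64_problems(posted))

    # Encoding complete bytes yields a string with neither problem.
    clean = base64.b64encode(b"%PDF-1.5\r\n").decode("ascii")
    print(base64_problems(clean))
```

If the checks flag the value, re-encoding the whole file and pasting the full string without a trailing `\n` is the likely fix.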

How can I use Apache Tika with Elasticsearch?

Hi @warkolm,
I'm trying to load a PDF into Elasticsearch using the ingest-attachment plugin:

PUT my_index/my_type/my_id?pipeline=attachment
{
  "data": "JVBERi0xLjUNCiW1tbW1DQoxIDAgb2JqDQo8PC9UeXBlL0NhdGFsb2cvUGFnZXMgMiAwIFIvTGFu="
}

Output:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason": "There are no ingest nodes in this cluster, unable to forward request to an ingest node."
      }
    ],
    "type": "illegal_state_exception",
    "reason": "There are no ingest nodes in this cluster, unable to forward request to an ingest node."
  },
  "status": 500
}

Can you please tell me how to fix this?

You need to make sure you have a node that can run the ingest pipelines.
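One way to confirm this is to look at the `roles` array that the nodes info API (`GET _nodes`) reports for each node; a node that can run pipelines lists `ingest` there. In the 5.x and 6.x releases this is controlled by the `node.ingest` setting in elasticsearch.yml, which as far as I know defaults to true, so it is worth checking that it has not been set to false. The Python sketch below works on an already-parsed response. The function name and the sample response are made up for illustration.

```python
from typing import Any, Dict, List


def ingest_node_names(nodes_info: Dict[str, Any]) -> List[str]:
    """Return the names of nodes whose roles include "ingest"."""
    names = []
    for node in nodes_info.get("nodes", {}).values():
        if "ingest" in node.get("roles", []):
            names.append(node.get("name", "<unnamed>"))
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical, heavily trimmed response body, not from a real cluster.
    sample = {
        "nodes": {
            "id-1": {"name": "node-a", "roles": ["master", "data"]},
            "id-2": {"name": "node-b", "roles": ["master", "data", "ingest"]},
        }
    }
    print(ingest_node_names(sample))
```

An empty result would match the "There are no ingest nodes in this cluster" error.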

@warkolm
I checked my elasticsearch.yml file. There is no ingest node setting in it.
Can I add it manually?

Can you please tell me where to set the ingest node in elasticsearch.yml?

Have you read the documentation?
We are happy to help, but unfortunately we don't have the time to walk you through every step.

Yes, I read the documentation, but I couldn't work out where to set that ingest.node in elasticsearch.yml.
I installed the ingest-attachment plugin.
Thanks @warkolm, I will try to resolve it. If I have any further questions, I will get back to you.
