Logstash file input: message column reads one line at a time


#1

I am reading one .pdf file. Logstash splits it into multiple events, but I want to read the whole file as one event. Is that possible?
Also, my file content does not appear as plain text.
It contains text like the sample below. How can I get the .pdf text as it is?
x\x9CURkLSg\u0018\xFENK\x8FGס\x835K\xC7\xEC9YȢ\u0013'\xE0\u0002\x82D\x86\xB0\xA68n2)\xE32\xA0\x85\xD2VN\


(Magnus Bäck) #2

Logstash doesn't support extracting text from PDFs. Perhaps you can use a separate program to process the PDFs and produce text files that Logstash can consume? That said, Logstash's file input doesn't support reading a whole file and producing a single event from it, so you will probably need to work something out.
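A minimal Python sketch of the "separate program" idea, under stated assumptions: it writes one JSON line per source file, so that a line-oriented reader such as the file input sees each file as a single event. The function names (`read_text`, `file_to_json_line`, `convert_directory`) and the `path`/`message` field names are hypothetical. The default extractor only handles plain text; for PDFs you would pass your own `extract` callable that wraps whatever extraction tool you use.

```python
import json
from pathlib import Path


def read_text(path):
    """Default extractor (hypothetical helper): treat the file as UTF-8 text."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def file_to_json_line(path, extract=read_text):
    """Return one JSON line holding the whole file's text.

    json.dumps escapes embedded newlines, so the result is a single
    physical line even when the source file has many lines.
    """
    event = {"path": str(path), "message": extract(path)}
    return json.dumps(event)


def convert_directory(src_dir, out_file, extract=read_text):
    """Append one JSON line per file under src_dir to out_file.

    Keep out_file outside src_dir, otherwise it would be picked up
    as an input file as well.
    """
    count = 0
    with open(out_file, "a", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).rglob("*")):
            if path.is_file():
                out.write(file_to_json_line(path, extract) + "\n")
                count += 1
    return count
```

The output file could then be read with a JSON-aware codec or filter on the Logstash side, which is an assumption about your pipeline and not something confirmed in this thread.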


#3

We had a contents column in the FS river where the whole content used to arrive, so is there no similar option in Logstash?
My requirement is to read multiple file types from a file server, such as .csv, .pdf, .txt and .xml. Can't we do anything with the charset setting to read the content of any file type? Or is there any other solution?


(Mark Walkom) #4

Most people use Apache Tika to extract text from PDF, DOC, etc.

LS can read XML, CSV and TXT natively.


#5

Thanks warkolm, I used Tika and it works as I needed.
But I need to supply the Base64-encoded binary data of the file manually.
Is there any way to index a whole directory containing multiple files just by specifying its path?
As the FS river is deprecated in 1.5 and higher, I don't want to use it.
Can we still do anything with Logstash, or with just Tika, without a river?

I have seen some examples where this can be achieved with some coding, but I am still looking to see whether it is possible without that.
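For reference, the coding route for the Base64 step can be quite small. The following is only a sketch, assuming Python is available: it walks a directory and yields one dictionary per matching file with its content Base64-encoded. The function name `iter_attachment_docs` and the field names `filename`, `path` and `data` are hypothetical, and sending each document on to Elasticsearch or Tika is deliberately left out.

```python
import base64
import os


def iter_attachment_docs(root, extensions=(".pdf", ".csv", ".txt", ".xml")):
    """Yield one dict per matching file under root, content Base64-encoded.

    The field names are illustrative only; adjust them to whatever
    your indexing side expects.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # str.endswith accepts a tuple of suffixes
            if not name.lower().endswith(extensions):
                continue
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rb") as handle:
                encoded = base64.b64encode(handle.read()).decode("ascii")
            yield {"filename": name, "path": full_path, "data": encoded}
```

Each file is read fully into memory before encoding, which is acceptable for ordinary documents but would need rethinking for very large files.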

