Reading BLOB data Docx from MySQL table

I am trying to read BLOB data stored in a MySQL table using the JDBC input in Logstash, with the aim of indexing it in Elasticsearch. I can read a PDF file, but when I read a .docx file I get an error. I am using the following Ruby code in the Logstash filter:

    file_to_read = StringIO.new(event.get('resume'))
    # THIS STATEMENT GIVES AN ERROR: STRING CONTAINS A NULL BYTE
    doc = Docx::Document.open(file_to_read)
    data = []
    # Retrieve and collect paragraphs
    doc.paragraphs.each do |p|
      # the paragraphs are appended to an array
      data << p.text
      # event.set('doc_content', data.to_s)
    end

If I write the binary string output of event.get('resume') to a file and then read it with doc = Docx::Document.open('FILE_TO_READ'), there is no error.
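For what it's worth, a "string contains a null byte" error in Ruby usually means a binary string ended up being used as a file *path* somewhere down the call chain (paths may not contain NUL bytes, and .docx data starts with binary ZIP headers). If the gem version in use forwards its argument to a path-based API, that would explain the symptom; I am not certain which docx gem versions accept IO objects. A minimal stdlib-only sketch of the difference (the blob bytes are made-up sample data):

```ruby
require 'stringio'

# Hypothetical stand-in for event.get('resume'): binary data with NUL bytes,
# like the start of a real .docx (ZIP) file.
blob = "PK\x03\x04\x00\x00rest-of-docx".b

# Using the blob where a file PATH is expected fails:
begin
  File.open(blob)
rescue ArgumentError => e
  puts e.message   # complains about a null byte in the path
end

# Wrapping it in StringIO works, because the bytes are stream CONTENT here:
io = StringIO.new(blob)
puts io.read(2)    # the ZIP magic bytes "PK"
```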

MY QUESTION 1: Is it possible to read the binary string directly into the Docx gem, since the documentation says it can also read a buffer?
QUESTION 2: If I write the BLOBs to disk and then read them back, this code does not read all the files: only the first .docx file (stored at id=1) is picked up, and its content is duplicated into the event for id=2.

jdbc {
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mysqljdbc"
    jdbc_user => "root"
    jdbc_password => "root"

    schedule => "*/1 * * * *"

    statement => "SELECT * FROM mysqljdbc.candidates WHERE id IN (1, 2)"
}
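Separately from the parsing error: with schedule => "*/1 * * * *", this exact SELECT is re-executed every minute, re-emitting the same two rows each run. If id is an auto-incrementing key, the jdbc input's tracking-column options can make each run pick up only rows it has not seen yet. A sketch, not tested against this setup:

```conf
jdbc {
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mysqljdbc"
    jdbc_user => "root"
    jdbc_password => "root"
    schedule => "*/1 * * * *"

    # Remember the highest id seen and only fetch newer rows on each run
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
    statement => "SELECT * FROM mysqljdbc.candidates WHERE id > :sql_last_value"
}
```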
filter {
    ruby {
        code => "
            require 'pdf-reader'
            require 'json'
            # added to use an IO object
            require 'stringio'
            require 'docx'

            format = event.get('type')

            # Convert the BINARY STRING to an IO object.
            # StringIO allows strings to behave like IOs, which is useful when
            # passing strings into systems that consume streams.
            if format != nil
                file_to_read = StringIO.new(event.get('FIELD'))
            end

            File.binwrite('new.docx', event.get('FIELD'))
            doc = Docx::Document.open('PATH/new.docx')
            data = []

            # Retrieve and collect paragraphs
            doc.paragraphs.each do |p|
                # the paragraphs are appended to an array
                data << p.text
                # event.set('doc_content', data.to_s)
            end

            # ----------- FOR READING TABLE CONTENT -----------
            if doc.tables[0] != nil
                # Iterate through tables
                doc.tables.each do |table|
                    table.rows.each do |row| # row-based iteration
                        row.cells.each do |cell|
                            data << cell.text
                        end
                    end
                end
            end

            event.set('doc_content', data.to_s)
        "
    }
}

Grateful for some pointers in the right direction. Thanks in advance.

Hi, I just hope the question was not too stupid to get a response. Even if it is, I am a novice trying to find a way to index these files from SQL.
Can anyone please guide me... especially the experts!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.