Read text from a PDF file

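// Imports assumed for this snippet (Elasticsearch 7.x high-level REST client):
// java.io.IOException, java.nio.file.Files, java.nio.file.Paths, java.util.*,
// org.apache.http.HttpHost, org.elasticsearch.ElasticsearchException,
// org.elasticsearch.action.index.*, org.elasticsearch.client.*
String filePath = "C://x.pdf";
String encodedfile = null;
try
{
	// Read the whole file at once; a single InputStream.read() call may
	// return fewer bytes than requested and leaves the stream open.
	byte[] bytes = Files.readAllBytes(Paths.get(filePath));
	encodedfile = Base64.getEncoder().encodeToString(bytes);
}
catch (IOException e)
{
	// Report the failure instead of silently swallowing it.
	e.printStackTrace();
}

// Building the client does not throw checked exceptions, so no try/catch is needed.
RestHighLevelClient restHighLevelClient =
		new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

Map<String, Object> jsonMap = new HashMap<>();
jsonMap.put("Name", "samanvi");
jsonMap.put("postDate", new Date());
jsonMap.put("hra", encodedfile); // Base64-encoded PDF
// The (index, type, id) constructor is deprecated in 7.x; set the id explicitly instead.
// Note: source("field", jsonMap) nests the whole map under a top-level field named "field".
IndexRequest request = new IndexRequest("index")
		.id("56")
		.source("field", jsonMap);
try
{
	IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
}
catch (ElasticsearchException | IOException e)
{
	e.printStackTrace();
}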

I am indexing the file this way. Now I want to read the text from this PDF file. Can anyone please describe how I could read the text with an Elasticsearch query?

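This will not work unless you use the ingest attachment plugin to extract the text from your file. Elasticsearch stores the Base64 string as-is and does not parse the PDF on its own, so there is no extracted text for a query to match.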

See: Ingest Attachment Processor Plugin (Elasticsearch Plugins and Integrations, 7.11)
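
To make that a bit more concrete, here is a rough sketch of the three request bodies involved, written with plain Java string building so it runs without the Elasticsearch client on the classpath. It assumes the ingest-attachment plugin is installed, that the pipeline is named `attachment`, and that the Base64 string sits in a top-level `hra` field (in the code above it is nested under `field`, so the processor would need `field.hra` there). The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AttachmentPipelineSketch {

    // Body for: PUT _ingest/pipeline/attachment
    // The "attachment" processor decodes the Base64 field "hra" and, by default,
    // writes the extracted text to "attachment.content".
    static String pipelineBody() {
        return "{\"description\":\"Extract text from the hra field\","
             + "\"processors\":[{\"attachment\":{\"field\":\"hra\"}}]}";
    }

    // Body for: PUT index/_doc/56?pipeline=attachment
    // Assumption: "hra" is a top-level field of the document.
    static String documentBody(byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"Name\":\"samanvi\",\"hra\":\"" + encoded + "\"}";
    }

    // Body for: GET index/_search
    // Searches the text that the pipeline extracted at index time.
    static String searchBody(String text) {
        // Minimal JSON escaping for backslashes and double quotes only.
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\":{\"match\":{\"attachment.content\":\"" + escaped + "\"}}}";
    }

    public static void main(String[] args) {
        // Placeholder bytes; in practice these would be the bytes of the PDF.
        byte[] placeholder = "stand-in for the PDF bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println("PUT _ingest/pipeline/attachment");
        System.out.println(pipelineBody());
        System.out.println("PUT index/_doc/56?pipeline=attachment");
        System.out.println(documentBody(placeholder));
        System.out.println("GET index/_search");
        System.out.println(searchBody("some words from the PDF"));
    }
}
```

With the high-level REST client from the question, the equivalent should be to create the pipeline once (for example via `client.ingest().putPipeline(...)`) and call `request.setPipeline("attachment")` on the `IndexRequest` before indexing. After that, a normal `match` query on `attachment.content` searches the extracted text.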
