How to index and store PDF files in Elasticsearch using Spring Boot?

I am new to Elasticsearch. I would like to know how to index and store PDF files in Elasticsearch using Spring Boot microservices.

You can use the ingest attachment plugin to parse your PDF documents at index time and extract the meaningful information.

For that, you need to send the binary file as a BASE64 string.
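A minimal sketch of the encoding step using only the JDK (java.nio.file.Files and java.util.Base64). The class PdfBase64 and its method names are hypothetical, chosen for illustration only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper: turns a PDF into the BASE64 string that the
// ingest attachment processor expects in its source field.
final class PdfBase64 {
    private PdfBase64() {}

    // Reads the whole file into memory; fine for modest PDFs,
    // very large files would need a streaming approach instead.
    static String encodeFile(Path pdf) throws IOException {
        return encodeBytes(Files.readAllBytes(pdf));
    }

    // Standard (not URL-safe) Base64 alphabet, with padding
    static String encodeBytes(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }
}
```

For example, PdfBase64.encodeFile(Path.of("C:/x.pdf")) (Path.of needs Java 11+) returns the string to put in the document field that the pipeline's attachment processor reads.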

Hi David,
I tried using the ingest plugin.

This is the function I call from Postman in order to upload a PDF file to ES:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.rest.RestStatus;
import org.springframework.web.bind.annotation.PostMapping;

@PostMapping("/upload")
public String upload() {
    String filePath = "C://x.pdf";
    String encodedfile;
    try {
        // Read the whole PDF and encode it as a BASE64 string
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        encodedfile = Base64.getEncoder().encodeToString(bytes);
    } catch (IOException e) {
        e.printStackTrace();
        return "could not read " + filePath;
    }

    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("Name", "samanvi");
    jsonMap.put("postDate", new Date());
    jsonMap.put("hra", encodedfile);

    // source(jsonMap) keeps "hra" at the top level of the document;
    // source("field", jsonMap) would nest everything under "field"
    IndexRequest request = new IndexRequest("index", "pdf", "56")
            .source(jsonMap)
            .setPipeline("samanvi");
    System.out.println("pipeline " + request.getPipeline());
    System.out.println("index " + request.index());

    // try-with-resources closes the client once the request is done
    try (RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
            RestClient.builder(new HttpHost("localhost", 9200, "http"),
                    new HttpHost("localhost", 9201, "http")))) {
        IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        System.out.println("result " + response.getResult());
    } catch (ElasticsearchException e) {
        // ElasticsearchException is unchecked, so it gets its own catch block
        if (e.status() == RestStatus.CONFLICT) {
            System.out.println("version conflict for document 56");
        }
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return "uploaded";
}
```

According to the above code, the pipeline should get created with index = "index", but it isn't created.

But then I executed the following:

```
PUT _ingest/pipeline/samanvi
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "hra"
      }
    }
  ]
}
```

and I got "acknowledged": true, so the pipeline was created.
Is there anything wrong with my code?

And when I execute the GET call
http://localhost:9200/index/pdf/56?pipeline=samanvi

I am unable to fetch any details, and I get the following error:

```json
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "request [/index/pdf/56] contains unrecognized parameter: [pipeline]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "request [/index/pdf/56] contains unrecognized parameter: [pipeline]"
    },
    "status": 400
}
```

Please format your code, logs or configuration files using </> icon as explained in this guide and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

If you are not using markdown format, use the </> icon instead.

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

The pipeline parameter does not work with GET; it is only used when a document is indexed.

BTW you should use _doc as the type instead of pdf.
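Since the pipeline runs only when the document is indexed, reading the document back is a plain GET without the pipeline parameter (GET /index/_doc/56 in 7.x). A minimal JDK-only sketch, assuming Java 11+ (java.net.http) and a node at http://localhost:9200; the class and method names are hypothetical, and the index name and id are the ones used in this thread:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: fetches a stored document. There is no
// ?pipeline= parameter because pipelines run at index time only.
final class DocumentFetcher {
    private DocumentFetcher() {}

    // Typeless document endpoint: {base}/{index}/_doc/{id}
    static URI documentUri(String baseUrl, String index, String id) {
        return URI.create(baseUrl + "/" + index + "/_doc/" + id);
    }

    // Status 200 with a _source means the document is stored;
    // 404 means it was never indexed (for example, the pipeline was missing)
    static HttpResponse<String> fetch(HttpClient http, String baseUrl, String index, String id)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(documentUri(baseUrl, index, id))
                .GET()
                .build();
        return http.send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

For example, DocumentFetcher.fetch(HttpClient.newHttpClient(), "http://localhost:9200", "index", "56").body() returns the raw JSON; if the attachment processor ran, the _source contains an attachment object (the processor's default target field) next to hra.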

I now use _doc in my upload() function.
How can I check whether the pipeline is created and the file is uploaded?

The GET call
http://localhost:9200/_ingest/pipeline/samanvi
returns "pipeline with id [samanvi] does not exist".
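Pipelines are read back at GET /_ingest/pipeline/{id}, which answers 404 while the pipeline does not exist. A minimal JDK-only sketch of that check, assuming Java 11+ and a node at http://localhost:9200; the class and method names are hypothetical. With the RestHighLevelClient, the equivalent check would be client.ingest().getPipeline(new GetPipelineRequest("samanvi"), RequestOptions.DEFAULT).isFound().

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: asks the cluster whether an ingest pipeline exists.
final class PipelineCheck {
    private PipelineCheck() {}

    // Pipelines live under {base}/_ingest/pipeline/{id}
    static URI pipelineUri(String baseUrl, String pipelineId) {
        return URI.create(baseUrl + "/_ingest/pipeline/" + pipelineId);
    }

    // 200 -> the pipeline is defined; 404 -> it does not exist yet
    static boolean exists(HttpClient http, String baseUrl, String pipelineId)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(pipelineUri(baseUrl, pipelineId))
                .GET()
                .build();
        HttpResponse<Void> response = http.send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode() == 200;
    }
}
```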

okay

Hi,
Can you please look into my problem above?
The pipeline is not getting created.
How can I make sure that the PDF is saved in Elasticsearch?

I have no idea what the current status of your problem is.

Could you provide a full recreation script as described in About the Elasticsearch category. It will help to better understand what you are doing. Please, try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Once we have a fully functional script, we will be able to move on to the Java code.

My task is to store PDF files in Elasticsearch using the ingest attachment plugin and Spring Boot.

The function below stores the PDF files in ES:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.rest.RestStatus;
import org.springframework.web.bind.annotation.PostMapping;

@PostMapping("/upload")
public String upload() {
    String filePath = "C://x.pdf";
    String encodedfile;
    try {
        // Read the whole PDF and encode it as a BASE64 string
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        encodedfile = Base64.getEncoder().encodeToString(bytes);
    } catch (IOException e) {
        e.printStackTrace();
        return "could not read " + filePath;
    }

    Map<String, Object> jsonMap = new HashMap<>();
    jsonMap.put("Name", "samanvi");
    jsonMap.put("postDate", new Date());
    jsonMap.put("hra", encodedfile);

    // source(jsonMap) keeps "hra" at the top level of the document;
    // source("field", jsonMap) would nest everything under "field"
    IndexRequest request = new IndexRequest("index", "_doc", "56")
            .source(jsonMap)
            .setPipeline("samanvi");
    System.out.println("pipeline " + request.getPipeline());
    System.out.println("index " + request.index());

    // try-with-resources closes the client once the request is done
    try (RestHighLevelClient restHighLevelClient = new RestHighLevelClient(
            RestClient.builder(new HttpHost("localhost", 9200, "http"),
                    new HttpHost("localhost", 9201, "http")))) {
        IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        System.out.println("result " + response.getResult());
    } catch (ElasticsearchException e) {
        // ElasticsearchException is unchecked, so it gets its own catch block
        if (e.status() == RestStatus.CONFLICT) {
            System.out.println("version conflict for document 56");
        }
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return "uploaded";
}
```

So, according to the above code, when I hit the POST URL in Postman it should create the pipeline named samanvi and store the file in ES.

But the GET call
http://localhost:9200/_ingest/pipeline/samanvi
returns "pipeline with id [samanvi] does not exist".

My questions:
1) Why is the pipeline not created?
2) If the pipeline is created, how can I check whether the PDF is stored in ES?

How did you define the pipeline?

Could you run the following from the Kibana dev console:

GET /_ingest/samanvi

And share the output here?

This is how I defined the pipeline in my function:

```java
IndexRequest request = new IndexRequest("index", "_doc", "56")
        .source(jsonMap)
        .setPipeline("samanvi");
```

When I run the command below from the Kibana console, the output is:

```
GET /_ingest/samanvi
```

```json
{
  "error": "Incorrect HTTP method for uri [/_ingest/samanvi?pretty] and method [GET], allowed: [POST]",
  "status": 405
}
```

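You did not create an ingest pipeline. setPipeline("samanvi") only tells Elasticsearch to run an existing pipeline with that id; it does not create one.
If you don't need an ingest pipeline (probably you don't know what it is), just remove .setPipeline("samanvi") from your code.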

I would like to know how to create an ingest pipeline from within the function.

An ingest pipeline which does what?

An ingest pipeline which stores PDF files.

Did you create it?

No, that is the problem I am facing. I would like to know how to create a pipeline using Java so that I can store the PDFs.

Did you read the documentation here? https://www.elastic.co/guide/en/elasticsearch/plugins/7.6/ingest-attachment.html
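Following that documentation, the pipeline has to be created once (a PUT to /_ingest/pipeline/{id}) before any index request can reference it with setPipeline(). A minimal sketch that does this from Java, using the JDK's java.net.http.HttpClient (Java 11+) instead of the RestHighLevelClient to keep the example dependency-free; with the high-level client, the equivalent call would be client.ingest().putPipeline(...) with a PutPipelineRequest. The pipeline id samanvi and the field hra come from this thread; the class and method names and the node URL are assumptions:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: creates (or overwrites) an ingest pipeline whose
// attachment processor parses the BASE64 content of one source field.
final class AttachmentPipeline {
    private AttachmentPipeline() {}

    // Same body as the PUT _ingest/pipeline/samanvi request in this thread.
    // Assumes the field name is a plain identifier (no quotes to escape).
    static String definition(String field) {
        return "{\"description\":\"Extract attachment information\","
                + "\"processors\":[{\"attachment\":{\"field\":\"" + field + "\"}}]}";
    }

    // Returns true when Elasticsearch answers 200 (pipeline stored)
    static boolean create(HttpClient http, String baseUrl, String id, String field)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_ingest/pipeline/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(definition(field)))
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200;
    }
}
```

Calling AttachmentPipeline.create(HttpClient.newHttpClient(), "http://localhost:9200", "samanvi", "hra") once, for example at application startup, before upload() runs means that index requests using setPipeline("samanvi") find the pipeline. The ingest-attachment plugin itself must also be installed on every node (bin/elasticsearch-plugin install ingest-attachment in 7.x).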