How to push large Excel data to Elasticsearch using the bulk API


(ajit) #1

I am reading an Excel sheet with Java and Aspose, and using the bulk API to push the data to Elasticsearch, but creating the index and pushing the data to ES is taking too long. My Excel file contains 50,000 records.
Please suggest a solution.

Below is my code.

```
import java.io.File;
import java.io.FileInputStream;
import java.net.InetAddress;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.aspose.cells.*;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

Settings settings = Settings.builder()
        .put("cluster.name", "elasticsearchtest").build();
TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("172.21.153.176"), 9300));
BulkRequestBuilder brb = client.prepareBulk();

DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

// Note the escaped backslashes in the Windows path.
FileInputStream file = new FileInputStream(new File("C:\\Users\\cajitb\\Desktop\\TempFiles\\op\\TestExcel.xlsx"));
Workbook workbook = new Workbook(file);
Worksheet worksheet = workbook.getWorksheets().get(0);
Cells cells = worksheet.getCells();

Range range = cells.getMaxDisplayRange();
int totalColumns = range.getColumnCount();
int totalRows = range.getRowCount();
RowCollection rows = cells.getRows();

System.out.println("Max Col=" + cells.getMaxDataColumn());
System.out.println("Max Row=" + cells.getMaxDataRow());
System.out.println("Total Cols=" + totalColumns);
System.out.println("Total Rows=" + totalRows);

Date d1 = new Date();
System.out.println("Start Time:" + dateFormat.format(d1));

// Row 0 holds the column headers, which become the field names.
List<Object> colList = new ArrayList<Object>();
for (int j = 0; j < totalColumns; j++) {
    colList.add(cells.get(0, j).getValue());
}
System.out.println("Col List =" + colList);

for (int i = 1; i < rows.getCount(); i++) {
    // Use a fresh map per row so rows do not overwrite each other.
    Map<String, Object> hm = new HashMap<String, Object>();
    for (int j = 0; j < totalColumns; j++) {
        String key = (String) colList.get(j);
        if (null == key)
            key = "";
        hm.put(key, cells.get(i, j).getValue());
        System.out.print(cells.get(i, j).getValue() + "\t");
    }
    hm.put("Action_Date", dateFormat.format(new Date()));
    brb.add(client.prepareIndex("xxxxindex", "xxxxtype").setSource(hm));
    System.out.println("");
}

Date d2 = new Date();
System.out.println("End Time:" + dateFormat.format(d2));

BulkResponse response = brb.execute().actionGet();
if (response.hasFailures()) {
    System.err.println(response.buildFailureMessage());
} else {
    System.out.println("Bulk indexing succeeded." + response.status());
}
```

(David Pilato) #2

Is it somehow the same thread as:

?

I'd suggest keeping the discussion in one thread so we can follow it, or at least linking to the previous post if that one is closed.

If it's something totally different, then a new question is fine.

BTW, could you please format your code using the </> icon as explained in this guide? It will make your post more readable.

Or use markdown style like:

```
CODE
```

Maybe 50,000 items is too many per bulk operation?
You can try running a bulk API call every 5,000 docs, for example.
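A minimal sketch of that chunking idea: instead of one huge bulk request, flush every 5,000 documents. The `flush` here is a stand-in for `brb.execute().actionGet()` followed by a fresh `client.prepareBulk()` in the original code; the class and method names are illustrative, not from any library.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedBulk {
    static final int BATCH_SIZE = 5000;

    /** Returns how many bulk requests a given row count is split into. */
    static int indexInChunks(int totalRows) {
        int flushes = 0;
        List<Object> pending = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) {
            pending.add(new Object());          // stands in for brb.add(...)
            if (pending.size() == BATCH_SIZE) { // a full batch: send it now
                pending.clear();                // stands in for execute() + new prepareBulk()
                flushes++;
            }
        }
        if (!pending.isEmpty()) {               // send the final partial batch
            pending.clear();
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(indexInChunks(50000)); // 50,000 rows -> 10 bulk requests
    }
}
```

This keeps each request a manageable size for the cluster instead of buffering all 50,000 rows in one `BulkRequestBuilder`.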

Have a look at https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-bulk.html#java-rest-high-document-bulk-processor

This will simplify your code a lot IMO.
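A sketch of what that could look like with the `BulkProcessor` from the Elasticsearch Java client (assuming the `client`, index/type names, and per-row `hm` map from the question above; the 5,000-doc and 5 MB thresholds are example values, not requirements):

```java
BulkProcessor bulkProcessor = BulkProcessor.builder(client,
        new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // called just before each batch is sent
            }
            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    System.err.println(response.buildFailureMessage());
                }
            }
            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                failure.printStackTrace();
            }
        })
        .setBulkActions(5000)                               // flush every 5,000 docs...
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or every 5 MB
        .build();

// In the row loop, replace brb.add(...) with:
bulkProcessor.add(new IndexRequest("xxxxindex", "xxxxtype").source(hm));

// After the loop, flush whatever is left and wait for it to finish:
bulkProcessor.awaitClose(1, TimeUnit.MINUTES);
```

The processor handles the batching and flushing for you, so there is no manual counter to get wrong.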

BTW, all those System.out calls might slow your code down a lot.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.