How to get data more than 10000 in elasticsearch


(kanchan) #1

Hi Team,
I am trying to fetch data using the REST client on the Java side, but I am not able to fetch more than 10,000 documents. Even when I try to fetch fewer than 10,000, like 5,000 or 7,000, it takes too much time.
Please let me know how we can achieve this.


(David Pilato) #2

You need to use the scroll API to extract data.


(kanchan) #3

I used the scroll API and that worked fine. Thanks!
I followed an example and got it working, but now I want to use a raw query, like this:

```java
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest("entity_fact_test4");
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryBuilder qb = QueryBuilders.termQuery("client_id", "262");
// Note: with scroll, size is the per-batch page size; a value this large
// may be rejected or perform poorly — 1,000 to 5,000 per batch is more typical.
searchSourceBuilder.size(100000);
searchSourceBuilder.query(qb);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restClient.search(searchRequest);
```
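The snippet above only issues the initial request, which returns just the first batch of hits. A scroll search has to be continued by feeding each response's scroll ID back until a batch comes back empty, and then cleared. A minimal continuation loop might look like the following, assuming the same 6.x `RestHighLevelClient` (`restClient`) and the `scroll`/`searchResponse` variables above; `searchScroll` was renamed to `scroll(...)` in later client versions, and `process` is a hypothetical placeholder for your own handling:

```java
// Continue the scroll until a batch comes back empty, then release it.
String scrollId = searchResponse.getScrollId();
SearchHit[] hits = searchResponse.getHits().getHits();

while (hits != null && hits.length > 0) {
    for (SearchHit hit : hits) {
        process(hit.getSourceAsString()); // hypothetical helper: handle one document
    }
    // Ask for the next batch, keeping the scroll context alive.
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = restClient.searchScroll(scrollRequest);
    scrollId = searchResponse.getScrollId();
    hits = searchResponse.getHits().getHits();
}

// Free the server-side scroll context once done.
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
restClient.clearScroll(clearScrollRequest);
```

Forgetting the `clearScroll` call leaves contexts open on the cluster until the scroll timeout expires, which wastes heap on the nodes.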

Here, currently I am using:

```java
QueryBuilder qb = QueryBuilders.termQuery("client_id", "262");
```

but I want to replace this with a raw or customized query like:

```java
String queryProductElk = "{\r\n" +
    "  \"query\": {\r\n" +
    "    \"bool\": {\r\n" +
    "      \"must\": [\r\n" +
    "        {\r\n" +
    "          \"term\": {\r\n" +
    "            \"client_id\": {\r\n" +
    "              \"value\": \"262\"\r\n" +
    "            }\r\n" +
    "          }\r\n" +
    "        }\r\n" +
    "      ]\r\n" +
    "    }\r\n" +
    "  }\r\n" +
    "}";
```
How could I achieve this? Please help me out.
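For reference, the client can run a raw JSON query through `QueryBuilders.wrapperQuery(json)` (the approach the later posts in this thread end up using). `wrapperQuery` expects the query clause itself, without the outer `{"query": ...}` wrapper, and building the string in a small helper avoids hand-escaping every quote. A sketch, where the helper name is illustrative and the field/value are the ones from this thread:

```java
public class RawQueryExample {

    // Builds the same bool/must/term clause as the raw string above,
    // without hand-escaping a large literal. Note: wrapperQuery() takes
    // the query clause only, not the outer {"query": ...} envelope.
    static String termQueryJson(String field, String value) {
        return "{\"bool\":{\"must\":[{\"term\":{"
                + "\"" + field + "\":{\"value\":\"" + value + "\"}"
                + "}}]}}";
    }

    public static void main(String[] args) {
        String json = termQueryJson("client_id", "262");
        System.out.println(json);
        // With the Elasticsearch client on the classpath, the string can be
        // used in place of termQuery():
        //   QueryBuilder qb = QueryBuilders.wrapperQuery(json);
        //   searchSourceBuilder.query(qb);
    }
}
```

In a real application a JSON library (or the query builders themselves) is safer than string concatenation, since user-supplied values would need escaping.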


(David Pilato) #4

That’s another question. Could you open a new discussion?

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

(kanchan) #5

Thanks dadoonet, it works fine. But performance is not good: for 4 lakh (400,000) records it takes approximately 2 minutes, which is too much. Please let me know if I need to make any configuration changes, use an API like bulk, or change any settings.


(David Pilato) #6

You meant for 4 documents?


(kanchan) #7

400000 records


(David Pilato) #8

What does a typical document look like? What is its size?


(kanchan) #9

I set the size parameter to 50000, and each document looks like this:

```json
"_source": {
  "product": "G10 Rates",
  "time_id": 20121,
  "wallet": 0.000057,
  "entity_name": "Brigade Capital Management",
  "country_hq": "USA",
  "gap_tox": "1-3",
  "entity_id": 208625,
  "revenue": 0,
  "gap": 0.00001,
  "rank": "8+",
  "region_hq": "Americas",
  "sow": 0,
  "region": "APAC",
  "sector": "Hedge Fund Managers"
}
```


(David Pilato) #10

Can you try with fewer documents, like 1000?
Also, what is the exact scroll query you are running?

Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#sliced-scroll

Might help doing things in parallel.

Some hardware questions:

Do you have SSD drives?
Is there anything in the logs, like GC information?


(kanchan) #11

I used sliced scrolling but I am unable to connect with the transport client. My code is:

```java
TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("172.21.153.176"), 9300));
```

Error:

```
Exception in thread "main" java.lang.AbstractMethodError: org.elasticsearch.transport.TcpTransport.connectToChannels(Lorg/elasticsearch/cluster/node/DiscoveryNode;Lorg/elasticsearch/transport/ConnectionProfile;Ljava/util/function/Consumer;)Lorg/elasticsearch/transport/TcpTransport$NodeChannels;
```


(David Pilato) #12

This is not related.

How did you connect previously?

The fact that you will use sliced scroll in the future is totally unrelated, IMO.
Or I'm missing something, in which case it would help if you share the full code and the full logs (stack trace).


(kanchan) #13

I used the sliced scroll API and it works fine. Thanks! But performance is still slow.
Any suggestions to improve the performance of the scroll API?


(David Pilato) #14

What did you do at the end?

How many parallel scrolls did you run? And how?
What does "slow" mean?

Do you monitor elasticsearch and your application to understand where is the bottleneck?


(kanchan) #15

Hi, we are fetching 40,000 documents from Elasticsearch using the scroll API. We use 10 slices and a scroll size of 4000, but the sliced scroll is taking a long time to fetch all the data from Elasticsearch. After fetching the data from ES, we have Java code to iterate over it, and that part is very fast. We need to fetch all the data very quickly, including the connection to Elasticsearch. We are using the Transport Client to connect to ES.

```java
IntStream.range(0, slices).parallel().forEach(i -> {
    SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource();
    WrapperQueryBuilder qb = QueryBuilders.wrapperQuery(queryELKPart1);
    searchSourceBuilder.query(qb);

    SliceBuilder sliceBuilder = new SliceBuilder(i, slices);
    SearchResponse response = transclient.prepareSearch("entity_fact")
            .setTypes("logs")
            .setSource(searchSourceBuilder)
            .setScroll(scrollTimeout)
            .slice(sliceBuilder)
            .setSize(scrollSize)
            .setFetchSource(reqFields, null)
            .setExplain(false)
            .get();
    List<String> r = Arrays.stream(response.getHits().getHits())
            .map(SearchHit::getSourceAsString)
            .collect(Collectors.toList());
    // Note: this lambda runs on multiple threads, so dataCollectionList
    // must be a thread-safe collection (e.g. a synchronized or concurrent list).
    dataCollectionList.add(r);
});
```

We implemented the code above for the sliced scroll API to fetch all the data from ES. How can we increase the fetching performance? Please suggest a solution; we need to fetch this bulk of data in a few milliseconds. How can we achieve this?


(David Pilato) #16

How long does it take?


(kanchan) #17

It takes 2 seconds; we want it in milliseconds.


(Christian Dahlqvist) #18

How much data do you have on each node? How much RAM and heap do you have per node? What type of storage do you have?

If you want to reduce the response time as much as possible, you probably want to make sure that the full data set fits in the operating system file cache. If that is not feasible, using SSDs, if you are not already, will probably help too.


(David Pilato) #19

2 seconds looks very good to me.
What kind of use case are you trying to solve here?


(kanchan) #20

In Ignite it takes 0.3 milliseconds. So could I improve performance by creating replicas of my index on multiple nodes?