How to get data more than 10000 in elasticsearch


(kanchan) #1

Hi Team,
I am trying to fetch data using the REST client on the Java side, but I am not able to fetch more than 10,000 documents. Even when I try to fetch fewer than 10,000, like 5,000 or 7,000, it takes too much time.
Please let me know how we can achieve this.


(David Pilato) #2

You need to use the scroll API to extract data.


(kanchan) #3

I used the scroll API and that worked fine. Thanks!
I followed an example and got it working, but now I want to use a raw query, like this:

```java
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchRequest searchRequest = new SearchRequest("entity_fact_test4");
searchRequest.scroll(scroll);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
QueryBuilder qb = QueryBuilders.termQuery("client_id", "262");
// Note: with scroll, size is the per-batch page size; a value this large
// may be rejected or perform poorly — 1,000 to 5,000 per batch is more typical.
searchSourceBuilder.size(100000);
searchSourceBuilder.query(qb);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restClient.search(searchRequest);
```
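The snippet above only issues the initial request, which returns just the first batch of hits. A scroll search has to be continued by feeding each response's scroll ID back until a batch comes back empty, and then cleared. A minimal continuation loop might look like the following, assuming the same 6.x `RestHighLevelClient` (`restClient`) and the `scroll`/`searchResponse` variables above; `searchScroll` was renamed to `scroll(...)` in later client versions, and `process` is a hypothetical placeholder for your own handling:

```java
// Continue the scroll until a batch comes back empty, then release it.
String scrollId = searchResponse.getScrollId();
SearchHit[] hits = searchResponse.getHits().getHits();

while (hits != null && hits.length > 0) {
    for (SearchHit hit : hits) {
        process(hit.getSourceAsString()); // hypothetical helper: handle one document
    }
    // Ask for the next batch, keeping the scroll context alive.
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = restClient.searchScroll(scrollRequest);
    scrollId = searchResponse.getScrollId();
    hits = searchResponse.getHits().getHits();
}

// Free the server-side scroll context once done.
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
restClient.clearScroll(clearScrollRequest);
```

Forgetting the `clearScroll` call leaves contexts open on the cluster until the scroll timeout expires, which wastes heap on the nodes.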

Here, currently I am using:

```java
QueryBuilder qb = QueryBuilders.termQuery("client_id", "262");
```

but I want to replace this with a raw or customized query like:

```java
String queryProductElk = "{\r\n" +
    "  \"query\": {\r\n" +
    "    \"bool\": {\r\n" +
    "      \"must\": [\r\n" +
    "        {\r\n" +
    "          \"term\": {\r\n" +
    "            \"client_id\": {\r\n" +
    "              \"value\": \"262\"\r\n" +
    "            }\r\n" +
    "          }\r\n" +
    "        }\r\n" +
    "      ]\r\n" +
    "    }\r\n" +
    "  }\r\n" +
    "}";
```
How could I achieve this? Please help me out.
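For reference, the client can run a raw JSON query through `QueryBuilders.wrapperQuery(json)` (the approach the later posts in this thread end up using). `wrapperQuery` expects the query clause itself, without the outer `{"query": ...}` wrapper, and building the string in a small helper avoids hand-escaping every quote. A sketch, where the helper name is illustrative and the field/value are the ones from this thread:

```java
public class RawQueryExample {

    // Builds the same bool/must/term clause as the raw string above,
    // without hand-escaping a large literal. Note: wrapperQuery() takes
    // the query clause only, not the outer {"query": ...} envelope.
    static String termQueryJson(String field, String value) {
        return "{\"bool\":{\"must\":[{\"term\":{"
                + "\"" + field + "\":{\"value\":\"" + value + "\"}"
                + "}}]}}";
    }

    public static void main(String[] args) {
        String json = termQueryJson("client_id", "262");
        System.out.println(json);
        // With the Elasticsearch client on the classpath, the string can be
        // used in place of termQuery():
        //   QueryBuilder qb = QueryBuilders.wrapperQuery(json);
        //   searchSourceBuilder.query(qb);
    }
}
```

In a real application a JSON library (or the query builders themselves) is safer than string concatenation, since user-supplied values would need escaping.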


(David Pilato) #4

That’s another question. Could you open a new discussion?

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

(kanchan) #5

Thanks dadoonet, it works fine. But performance is not good: for 4 lakh (400,000) records it takes approximately 2 minutes, which is too much. Please let me know if I need to make any configuration changes, use an API like bulk, or change any settings.


(David Pilato) #6

You meant for 4 documents?


(kanchan) #7

400000 records


(David Pilato) #8

What does a typical document look like? What is its size?


(kanchan) #9

I set the size parameter to 50000, and each document looks like this:

```json
"_source": {
  "product": "G10 Rates",
  "time_id": 20121,
  "wallet": 0.000057,
  "entity_name": "Brigade Capital Management",
  "country_hq": "USA",
  "gap_tox": "1-3",
  "entity_id": 208625,
  "revenue": 0,
  "gap": 0.00001,
  "rank": "8+",
  "region_hq": "Americas",
  "sow": 0,
  "region": "APAC",
  "sector": "Hedge Fund Managers"
}
```


(David Pilato) #10

Can you try with fewer documents, like 1000?
Also, what is the exact scroll query you are running?

Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#sliced-scroll

Might help doing things in parallel.

Some hardware questions:

Do you have SSD drives?
Is there anything in the logs, like GC information?


(kanchan) #11

I used sliced scrolling but I am unable to connect with the transport client. My code is:

```java
TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("172.21.153.176"), 9300));
```

Error:

```
Exception in thread "main" java.lang.AbstractMethodError: org.elasticsearch.transport.TcpTransport.connectToChannels(Lorg/elasticsearch/cluster/node/DiscoveryNode;Lorg/elasticsearch/transport/ConnectionProfile;Ljava/util/function/Consumer;)Lorg/elasticsearch/transport/TcpTransport$NodeChannels;
```


(David Pilato) #12

This is not related.

How did you connect previously?

The fact that you will use sliced scroll in the future is totally unrelated, IMO.
Or I'm missing something, in which case it would help if you share the full code and the full logs (stack trace).


(kanchan) #13

I used the sliced scroll API and it works fine. Thanks! But performance is still slow.
Any suggestions to improve the performance of the scroll API?


(David Pilato) #14

What did you do at the end?

How many parallel scrolls did you run? And how?
What does "slow" mean?

Do you monitor elasticsearch and your application to understand where is the bottleneck?


(kanchan) #15

Hi, we are fetching 40,000 documents from Elasticsearch using the scroll API. We use 10 slices and a scroll size of 4000, but the sliced scroll is taking a long time to fetch all the data from Elasticsearch. After fetching the data from ES, we have Java code to iterate over it, and that part is very fast. We need to fetch all the data very quickly, including the connection to Elasticsearch. We are using the Transport Client to connect to ES.

```java
IntStream.range(0, slices).parallel().forEach(i -> {
    SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource();
    WrapperQueryBuilder qb = QueryBuilders.wrapperQuery(queryELKPart1);
    searchSourceBuilder.query(qb);

    SliceBuilder sliceBuilder = new SliceBuilder(i, slices);
    SearchResponse response = transclient.prepareSearch("entity_fact")
            .setTypes("logs")
            .setSource(searchSourceBuilder)
            .setScroll(scrollTimeout)
            .slice(sliceBuilder)
            .setSize(scrollSize)
            .setFetchSource(reqFields, null)
            .setExplain(false)
            .get();
    List<String> r = Arrays.stream(response.getHits().getHits())
            .map(SearchHit::getSourceAsString)
            .collect(Collectors.toList());
    // Note: this lambda runs on multiple threads, so dataCollectionList
    // must be a thread-safe collection (e.g. a synchronized or concurrent list).
    dataCollectionList.add(r);
});
```

We implemented the code above for the sliced scroll API to fetch all the data from ES. How can we increase the fetching performance? Please suggest a solution; we need to fetch this bulk of data in a few milliseconds. How can we achieve this?


(David Pilato) #16

How long does it take?


(kanchan) #17

It takes 2 seconds; we want it in milliseconds.


(Christian Dahlqvist) #18

How much data do you have on each node? How much RAM and heap do you have per node? What type of storage do you have?

If you want to reduce the response time as much as possible, you probably want to make sure that the full data set fits in the operating system file cache. If that is not feasible, using SSDs, if you are not already, will probably help too.


(David Pilato) #19

2 seconds looks very good to me.
What kind of use case are you trying to solve here?


(kanchan) #20

In Ignite it takes 0.3 milliseconds. So could I improve performance by creating replicas of my index on multiple nodes?