We are starting to use the Java API in Elasticsearch. The only problem is
that queries seem to take much longer to retrieve data than simply
using curl.
Our development server (19.08) has a very small index (2000 documents, with 8
fields).
When making a call to retrieve ~1200 documents, the query takes 17 seconds,
versus < 1 second to get the same result using curl.
Here is the code I am using to test ES:
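import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.fieldQuery;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.search.SearchHit;

LOG.info(String.format("Initializing connection to ElasticSearch %s/%s on %d", host, clusterName, port));
settings = ImmutableSettings.settingsBuilder()
    .put("client.transport.sniff", true).build();
client = new TransportClient(settings)
    .addTransportAddress(new InetSocketTransportAddress(host, port));
searchRequest = client.prepareSearch(clusterName); // prepareSearch() takes index names; clusterName is used as the index here
mapper = new SerObjectMapper();

// One "should" clause per keyword, all on the "body" field
BoolQueryBuilder query = boolQuery();
for (String term : keywordList)
    query.should(fieldQuery("body", term));

long start = System.currentTimeMillis();
/** From here */
SearchResponse response = client.prepareSearch(clusterName)
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH) // adds a round trip to gather term statistics
    .setQuery(query).setFrom(0).setSize(limit)
    .setExplain(true)                               // computes a score explanation for every hit
    .execute()
    .actionGet();
/* To here: takes 17 seconds in Java. */
SearchHit[] docs = response.getHits().getHits();
System.err.println("Query took " + (System.currentTimeMillis() - start));
for (SearchHit doc : docs)
    urlList.add((String) doc.getSource().get("url"));
LOG.info(String.format("Returning %d results", urlList.size()));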
I wonder if you could point out what I am doing wrong?
(Posted by DMAC on Tuesday, November 13, 2012 8:30:53 PM UTC+1)
I'm not sure what limit is (1200?), but returning that many hits
(versus the default of 10) makes a big difference.
Are you doing exactly the same search in the REST call (e.g. the
DFS_QUERY_THEN_FETCH search type, the number of results, etc.)?
We have done lots of testing over both the Java API and HTTP/REST, with many
search types and limits, and I don't think I've ever seen such a difference (or
anything near it) in timings. (Using ES 0.19.9 on a multi-node
cluster with millions of docs.)
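To check the "exactly the same search" point, one option is to write out the REST request that mirrors the Java call and run it with curl. The sketch below is plain JDK Java; the class and method names are hypothetical, the `field` query syntax is assumed to be the 0.19-era JSON form behind `fieldQuery(...)` (worth checking against your version), and HOST and INDEX in the curl comment are placeholders.

```java
import java.util.List;

/** Hypothetical helper: builds a _search body mirroring the Java query above. */
class RestEquivalent {

    // Minimal JSON string escaping (quotes and backslashes only); assumes plain-text terms.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // bool query with one "should" field clause per term on "body", plus from/size/explain.
    static String searchBody(List<String> terms, int from, int size, boolean explain) {
        StringBuilder should = new StringBuilder();
        for (String term : terms) {
            if (should.length() > 0) {
                should.append(',');
            }
            should.append("{\"field\":{\"body\":").append(quote(term)).append("}}");
        }
        return "{\"from\":" + from + ",\"size\":" + size + ",\"explain\":" + explain
                + ",\"query\":{\"bool\":{\"should\":[" + should + "]}}}";
    }

    public static void main(String[] args) {
        // Send with (placeholders):
        // curl -XGET 'http://HOST:9200/INDEX/_search?search_type=dfs_query_then_fetch' -d '<body>'
        System.out.println(searchBody(List.of("foo", "bar"), 0, 1200, true));
        // prints {"from":0,"size":1200,"explain":true,"query":{"bool":{"should":[{"field":{"body":"foo"}},{"field":{"body":"bar"}}]}}}
    }
}
```

Keeping `size`, `explain` and `search_type` identical on both sides is the point; changing any one of them can change timings substantially.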
Thanks, and sorry for the slow response. It turns out it was my fault: the problem was the way I was serialising the data.
Regards
D.
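The thread does not say what was wrong with the serialisation. A generic way to catch that kind of problem is to time the search call and the result processing as separate phases instead of putting one stopwatch around both. A minimal JDK-only sketch; `PhaseTimer` is a hypothetical name, and the two phases in `main` are dummy stand-ins for `execute().actionGet()` and the hit-processing loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper: records wall-clock time per named phase, in call order. */
class PhaseTimer {
    private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();

    // Runs one phase, records how long it took, and passes its result through.
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            elapsedMillis.put(phase, (System.nanoTime() - start) / 1_000_000);
        }
    }

    Map<String, Long> results() {
        return elapsedMillis;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();

        // Stand-in for client.prepareSearch(...).execute().actionGet():
        // builds 1200 dummy hit sources so the sketch runs without a cluster.
        List<Map<String, Object>> sources = timer.time("search", () -> {
            List<Map<String, Object>> hits = new ArrayList<>();
            for (int i = 0; i < 1200; i++) {
                hits.add(Map.<String, Object>of("url", "http://example.com/" + i));
            }
            return hits;
        });

        // Stand-in for the loop that processes or serialises the hits.
        List<String> urls = timer.time("process hits", () -> {
            List<String> out = new ArrayList<>();
            for (Map<String, Object> source : sources) {
                out.add((String) source.get("url"));
            }
            return out;
        });

        System.out.println(urls.size() + " urls, phase times (ms): " + timer.results());
    }
}
```

If the "process hits" phase dominates, the search itself is not the bottleneck.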
On 14 Nov 2012, at 08:41, Derry O' Sullivan wrote:
2 other points on this.
I am facing the same issue. When I use curl, the response takes around 40-50 ms, but when the same query is executed with execute().actionGet() on a TransportClient, it takes around 1000 ms. I am trying to figure out the root cause and a solution. Could anybody please help me with this?
You had better open a new thread and describe exactly what you are doing and in which context (version).
I'm fairly sure you are not doing exactly the same thing.
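One possible source of mismatched numbers (an assumption; the thread never establishes it) is timing a single request from a freshly started JVM, where the first call can pay one-off costs such as class loading and JIT compilation that do not recur. A hypothetical JDK-only helper that discards warm-up runs and reports the median; the sleeping `Runnable` in `main` is only a stand-in for the real search call.

```java
import java.util.Arrays;

/** Hypothetical micro-benchmark helper: discards warm-up runs, reports the median. */
class WarmTiming {

    static long medianMillis(Runnable call, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        // Warm-up calls absorb one-off costs (class loading, JIT, first connections).
        for (int i = 0; i < warmups; i++) {
            call.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            call.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(samples);
        return samples[runs / 2]; // median (upper median when runs is even)
    }

    public static void main(String[] args) {
        // Stand-in for client.prepareSearch(...).execute().actionGet().
        Runnable fakeSearch = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("median ms: " + medianMillis(fakeSearch, 3, 11));
    }
}
```

Comparing this median against several repeated curl runs gives a fairer picture than a single first request.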