Elasticsearch Java client much slower than REST call


(DMAC) #1

Hi

We are starting to use the Java API in ElasticSearch. The only problem is
that queries seem to take much longer to retrieve data than they do when
we simply use curl.

Our development server (19.08) has a very small index (2000 documents, with 8
fields).

When making a call to retrieve ~1200 documents, it takes 17 seconds to run the
query in Java versus < 1 second to get the same result using curl.

Here is the code I am using to test ES:

LOG.info(String.format("Initializing connection to ElasticSearch %s/%s on %d",
    host, clusterName, port));
settings = ImmutableSettings.settingsBuilder()
    .put("client.transport.sniff", true).build();
client = new TransportClient(settings)
    .addTransportAddress(new InetSocketTransportAddress(host, port));
searchRequest = client.prepareSearch(clusterName);
mapper = new SerObjectMapper();

BoolQueryBuilder query = boolQuery();
for (String term : keywordList)
    query.should(fieldQuery("body", term));

long start = System.currentTimeMillis();
/** From here */
SearchResponse response = client.prepareSearch(clusterName)
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
    .setQuery(query).setFrom(0).setSize(limit).setExplain(true)
    .execute()
    .actionGet();
/** To here takes 17 seconds in Java. */
SearchHit[] docs = response.getHits().getHits();
System.err.println("Query took " + (System.currentTimeMillis() - start));
for (SearchHit doc : docs)
    urlList.add((String) doc.getSource().get("url"));
LOG.info(String.format("Returning %d results", urlList.size()));

I wonder if you could point out what I am doing wrong?

Thanks in advance

D.

--


(Jörg Prante) #2

Switch off explain: setExplain(false).

Unfortunately, setExplain(true) appears in the example in the
docs (http://www.elasticsearch.org/guide/reference/java-api/search.html), but
it's not the default, only an optional setting.

Best regards,

Jörg

--


(Derry O' Sullivan) #3

Two other points on this.

  1. I'm not sure what limit is (1200?), but returning that many hits
    (versus returning the default of 10) makes a big difference.
  2. Are you doing exactly the same search in the REST call (e.g.
    DFS_QUERY_THEN_FETCH search type, number of results, etc.)?

We have done lots of testing with both the Java API and HTTP/REST, with lots
of search types and limits, and I don't think I've ever seen such a difference
(or anything near it) in terms of timings. (This is using ES 0.19.9 over a
multi-node cluster with millions of docs.)
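
One way to check point 2 is to write down the settings of both requests and list the ones that differ. The following is only a rough sketch: RequestDiff is a hypothetical helper, not part of Elasticsearch, and since the curl command was not posted, the REST side is assumed here to be a bare request that leaves everything at its default.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper (not an Elasticsearch class): lists settings that differ between two requests. */
final class RequestDiff {

    /** Returns one entry per setting whose value is not the same in both requests. */
    static Map<String, String> diff(Map<String, String> javaCall, Map<String, String> restCall) {
        Map<String, String> differences = new LinkedHashMap<>();
        // Walk the union of setting names in sorted order
        TreeSet<String> keys = new TreeSet<>(javaCall.keySet());
        keys.addAll(restCall.keySet());
        for (String key : keys) {
            String javaValue = javaCall.getOrDefault(key, "(default)");
            String restValue = restCall.getOrDefault(key, "(default)");
            if (!javaValue.equals(restValue)) {
                differences.put(key, "java=" + javaValue + " rest=" + restValue);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Settings taken from the Java code in the original post
        Map<String, String> javaCall = new LinkedHashMap<>();
        javaCall.put("search_type", "dfs_query_then_fetch");
        javaCall.put("from", "0");
        javaCall.put("size", "1200");
        javaCall.put("explain", "true");

        // Assumption: the curl request set nothing explicitly
        Map<String, String> restCall = new LinkedHashMap<>();

        for (Map.Entry<String, String> e : diff(javaCall, restCall).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Any line this prints is a setting to align before comparing timings.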

--


(DMAC) #4

Hi,

Thanks. Sorry for the slow response. It turns out that it was my fault: the problem was the way I was serialising the data.
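
For anyone hitting a similar gap, timing each stage separately is a quick way to see whether the search call or the code around it is the slow part. Below is a minimal sketch, not the code from this thread: PhaseTimer is a hypothetical helper, and the two lambdas are stand-ins for the real search call and for whatever processing the application does on the hits.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not an Elasticsearch class): accumulates wall-clock time per named phase. */
final class PhaseTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    /** Runs the work, adds its duration to the named phase, and returns the work's result. */
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            elapsedNanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Accumulated time for a phase in milliseconds; 0 if the phase was never timed. */
    long millis(String phase) {
        return elapsedNanos.getOrDefault(phase, 0L) / 1_000_000L;
    }

    /** Name of the phase with the largest accumulated time, or null if nothing was timed. */
    String slowestPhase() {
        String slowest = null;
        long max = -1L;
        for (Map.Entry<String, Long> e : elapsedNanos.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();

        // Stand-in for client.prepareSearch(...).execute().actionGet()
        List<String> hits = timer.time("search", () -> {
            List<String> docs = new ArrayList<>();
            for (int i = 0; i < 1200; i++) {
                docs.add("http://example.com/doc/" + i);
            }
            return docs;
        });

        // Stand-in for the application's own handling of the hits
        List<String> urls = timer.time("post-process", () -> {
            List<String> out = new ArrayList<>();
            for (String hit : hits) {
                out.add(hit.toUpperCase());
            }
            return out;
        });

        System.out.println("search: " + timer.millis("search") + " ms");
        System.out.println("post-process: " + timer.millis("post-process") + " ms");
        System.out.println(urls.size() + " results, slowest phase: " + timer.slowestPhase());
    }
}
```

If the phase outside the search call dominates, the client library is probably not the culprit.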

Regards

D.

--


#5

I am facing the same issue. When I use curl, the response takes around 40-50 ms, but when the same query is executed with execute().actionGet() on a TransportClient, it takes around 1000 ms. I am trying to figure out the root cause and a solution. Could anybody please help me out with this?


(David Pilato) #6

You would be better off opening a new thread and describing exactly what you are doing and in which context (version).
I'm fairly sure you are not doing exactly the same thing in the two calls.


#7

Thanks for the reply.
I have opened a new thread titled "Java Transport Client slower than curl request".

