4-5 second query time. Only 50 documents. Need help


(project2501) #1

Hi,
I'm running ES 0.90 on a very big EC2 instance. Ubuntu 13.04 64bit.

/home/ubuntu/installs/jdk1.6.0_45/bin/java -Xms4000m -Xmx4000m -Xss256k
-Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
-Des.path.home=/home/ubuntu/installs/elasticsearch-0.90.6 -cp
:/home/ubuntu/installs/elasticsearch-0.90.6/lib/elasticsearch-0.90.6.jar:/home/ubuntu/installs/elasticsearch-0.90.6/lib/:/home/ubuntu/installs/elasticsearch-0.90.6/lib/sigar/
org.elasticsearch.bootstrap.ElasticSearch

I have only 50 documents in an index yet every query takes 3-6 seconds to
run. Here is a sample query.

curl -X POST "http://localhost:9200/documents/_search?pretty=true" -d '
{
  "query": {
    "query_string": {
      "query": "(text:\"understanding\")"
    }
  },
  "fields": [
    "id",
    "ratings"
  ]
}'
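
A side note on the request body: the phrase inside query_string needs its inner double quotes escaped for the body to be valid JSON. The following is a minimal Python sketch, assuming the body is built in code rather than typed by hand, so that json.dumps does the escaping. The helper name build_search_body is hypothetical and not part of any Elasticsearch client.

```python
import json


def build_search_body(term, fields):
    """Build a query_string search body for a phrase in the `text` field.

    Hypothetical helper for illustration. It does not escape characters
    that are special in query_string syntax, so `term` is assumed to be
    a plain word or phrase.
    """
    return {
        "query": {"query_string": {"query": '(text:"%s")' % term}},
        "fields": list(fields),
    }


body = build_search_body("understanding", ["id", "ratings"])

# json.dumps escapes the inner double quotes, producing valid JSON
payload = json.dumps(body)
print(payload)
```

The resulting string can be passed to curl with -d or sent with any HTTP client.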

Why does it take so long to query? I have 1 shard and 1 replica. I thought
ES was supposed to be fast?

Any ideas here? This is pretty disappointing.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4c5b4f1e-7e04-4a04-ad44-c3a58c5b7c60%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

There can be lots of reasons - EC2-related, OS related, Java related,
cluster setup related, index related, query related...

Can you give an example of your mapping and a document you have indexed?

How much RAM is your EC2 instance? Do you use hardware virtualization? Did
you disable swap and enable mlock?

What is the cluster health? Is it green? If you have just one node, why is
there 1 replica? That does not make much sense.

Side notes: Please use ES version 0.90.9 as it is the latest of the 0.90
branch with bugs fixed. And what vendor is jdk1.6.0_45? Please use Java 7,
especially if you use OpenJDK 6.

Jörg



(project2501) #3

Hi,
Thanks for the response.

The box has 15GB RAM. 4GB allocated to ES.

The mapping is simple and has only about 6 not_analyzed fields, 1 date
field and 1 text field.

The documents are large, however: 10MB each, with hundreds of fields. Only a
couple of fields are returned, and the response documents are less than 10k
each (only two small fields returned).

Sun JDK 1.6

I will try Oracle JDK 1.7 and latest ES.

I was trying different things with node/replica to see if there is a change
in performance.
Cluster health is green.



(Jörg Prante) #4

10MB is very large for a single document. Have you disabled the _source and
_all fields in the mapping?

Jörg



(project2501) #5

Correction, health status is 'yellow'. Only one node.



(project2501) #6

I moved to ES 0.90.9 and JDK1.7. Still slower than tar.

curl -X POST "http://localhost:9200/documents/_search?pretty=true" -d '
{
  "query": {
    "query_string": {
      "query": "(text:\"understanding\")"
    }
  },
  "fields": [
    "id",
    "ratings"
  ]
}'

Here's a result. See how long it takes? That's obscene for 50 documents.

{
  "took" : 4294,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 11,
    "max_score" : 0.048666354,
    "hits" : [ {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "eb7b1c4a-b8c9-4bdc-8baf-aa16002831f3",
      "_score" : 0.048666354,
      "fields" : {
        "id" : "eb7b1c4a-b8c9-4bdc-8baf-aa16002831f3",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "c5ce2faf-db3d-4623-9ecc-398f4a78f123",
      "_score" : 0.030950457,
      "fields" : {
        "id" : "c5ce2faf-db3d-4623-9ecc-398f4a78f123",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "87666287-6b01-4c99-8c8e-430f3417014b",
      "_score" : 0.030665938,
      "fields" : {
        "id" : "87666287-6b01-4c99-8c8e-430f3417014b",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "f10c9a47-7efa-424d-bf65-7072bbbf64be",
      "_score" : 0.028295785,
      "fields" : {
        "id" : "f10c9a47-7efa-424d-bf65-7072bbbf64be",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "b1d2f103-d2c6-4b4e-8fd2-8615b9ed6bb0",
      "_score" : 0.02708165,
      "fields" : {
        "id" : "b1d2f103-d2c6-4b4e-8fd2-8615b9ed6bb0",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "6aa4b05b-ee38-49b4-a3e2-1f2a01e62f40",
      "_score" : 0.025010176,
      "fields" : {
        "id" : "6aa4b05b-ee38-49b4-a3e2-1f2a01e62f40",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "78d5a3d7-4c45-4f1b-a596-993916aed2de",
      "_score" : 0.02360665,
      "fields" : {
        "id" : "78d5a3d7-4c45-4f1b-a596-993916aed2de",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "21be8c9e-3e73-44e8-9b31-db44fa0a4faa",
      "_score" : 0.020865528,
      "fields" : {
        "id" : "21be8c9e-3e73-44e8-9b31-db44fa0a4faa",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "1be7eb7e-dd8b-4582-aa0d-2edb20fa4339",
      "_score" : 0.020655818,
      "fields" : {
        "id" : "1be7eb7e-dd8b-4582-aa0d-2edb20fa4339",
        "ratings" : [ ]
      }
    }, {
      "_index" : "documents",
      "_type" : "doc",
      "_id" : "a695fdd5-d025-4f77-8b25-c7fdea8bbbe3",
      "_score" : 0.017684866,
      "fields" : {
        "id" : "a695fdd5-d025-4f77-8b25-c7fdea8bbbe3",
        "ratings" : [ ]
      }
    } ]
  }
}



(project2501) #7

I haven't disabled those fields, but since I query only select fields,
would it matter to disable those?
I do a lot of query_string queries with highlights and facets, but at the
moment, even the simplest query is dog slow.



(project2501) #8

Here is my mapping.

        mapping = {'doc': {'properties': {
            'ngrams': {'index': 'not_analyzed', 'type': 'string'},
            'dates': {'type': 'date', 'format': 'yyyy-MM-dd'},
            'locations': {'index': 'not_analyzed', 'type': 'string'},
            'concept': {'index': 'not_analyzed', 'type': 'string'},
            'entities.currencies': {'index': 'not_analyzed', 'type': 'string'},
            'entities.actions': {'index': 'not_analyzed', 'type': 'string'},
            'entities.things': {'index': 'not_analyzed', 'type': 'string'},
            'entities.places': {'index': 'not_analyzed', 'type': 'string'},
            'entities.people': {'search_analyzer': 'simple', 'index_analyzer': 'simple', 'type': 'string'},
            'entities.dates': {'index': 'not_analyzed', 'type': 'string'},
            # 'term_vector' was given twice ('yes', then 'with_positions_offsets');
            # in a Python dict the last value wins, so only that one is kept.
            'text': {'analyzer': 'standard', 'type': 'string', 'term_vector': 'with_positions_offsets'},
            'location': {'type': 'geo_point', 'store': 'yes'},
            'concepts': {'type': 'string', 'store': 'no'},
        }}}

The rest of the document fields are dynamically mapped as strings.



(Jörg Prante) #9

You requested the fields "id" and "ratings", but you did not declare them
in the mapping.

Because of this, ES has to extract them from _source, which means it loads
each of the 10MB documents in an extra step just to pull out these two
fields. This surely takes several seconds for the result set you showed
above.
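
To illustrate why extraction from _source is costly, the following sketch (plain Python, no Elasticsearch involved) parses whole synthetic documents just to read two small fields. The document layout, the helper names, and the scaled-down size of 1 MB per document are assumptions for illustration only. It does not measure what ES does internally.

```python
import json


def make_document(doc_id, text_bytes):
    """Create a synthetic JSON document dominated by one large text field."""
    return json.dumps({"id": doc_id, "ratings": [], "text": "x" * text_bytes})


def extract_from_source(source, wanted):
    """Parse the complete source just to pick out a few small fields."""
    parsed = json.loads(source)
    return {name: parsed[name] for name in wanted if name in parsed}


def total_source_bytes(sources):
    """Total number of bytes that had to be read and parsed."""
    return sum(len(s.encode("utf-8")) for s in sources)


# 10 hits, as in the sample response; 1 MB each instead of 10 MB
# so that the sketch runs quickly.
sources = [make_document("doc-%d" % i, 1024 * 1024) for i in range(10)]
results = [extract_from_source(s, ["id", "ratings"]) for s in sources]

print(results[0])
print("bytes parsed:", total_source_bytes(sources))
```

The returned fields are tiny, yet every hit requires reading and parsing its full source. With the 10MB documents from this thread and 10 hits per page, that would be on the order of 100MB per query.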

To improve this, declare "id" and "ratings" as fields in the mapping with
the attribute "store" set to "yes". Also remember to disable _source and
_all if you only need to retrieve the fields "id" and "ratings". This will
save a lot of resources in the index.
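
A mapping along these lines could be sketched as follows, in the same Python dict form as the mapping posted earlier. This is only a sketch under assumptions: the type of "ratings" is not given in the thread, so "string" is a guess, and the helper names are hypothetical. The "store" attribute probably cannot be changed on an existing field, so the index would likely have to be recreated and the documents reindexed.

```python
import json


def stored_string_field():
    """A string field kept as a stored field, so it can be returned
    without loading _source (hypothetical helper)."""
    return {"type": "string", "index": "not_analyzed", "store": "yes"}


def build_mapping():
    """Mapping sketch for type 'doc' with _source and _all disabled."""
    return {
        "doc": {
            "_source": {"enabled": False},
            "_all": {"enabled": False},
            "properties": {
                "id": stored_string_field(),
                # Assumption: the thread does not state the type of "ratings"
                "ratings": stored_string_field(),
                "text": {
                    "type": "string",
                    "analyzer": "standard",
                    "term_vector": "with_positions_offsets",
                },
            },
        }
    }


mapping = build_mapping()
print(json.dumps(mapping, indent=2))
```

The remaining fields from the original mapping would be added to "properties" unchanged.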

Hint: the field _id is already delivered in the results, so there is no
need to duplicate this information in another field "id".

Jörg



(project2501) #10

Thanks so much Jörg. I appreciate the tip and will try it out.



(Ivan Brusic) #11

I agree with Jörg that the size of your documents is perhaps the primary
reason for the slowness. Larger documents are one of the drawbacks of
denormalization.

Just to highlight some other inconsistencies:

  1. You stated that you have 1 replica, but also that you have only one
    node. Elasticsearch will only assign a replica to another node. A single
    node will not contain two identical shards. Your cluster will be in a
    permanent yellow state until you set replicas to 0.

  2. If your documents are truly that large, you should disable the _all
    field [1].

  3. All your fields are indexed, even the ones you only want to retrieve.
    Hopefully you did not confuse not_analyzed with not indexed. Try to
    streamline your documents as much as possible, preventing fields that are
    not queried from being indexed and loaded into the field cache.

None of these affects your speed issue, but they are things that possibly
should be addressed.
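
Points 1 and 3 can be sketched in Python as follows. The helper names are hypothetical. The settings body is the one accepted by the index update settings API (a PUT to /documents/_settings), and "index": "no" is the string field option that keeps a field out of the index. Treat this as a sketch under those assumptions, not a tested recipe for this cluster.

```python
import json


def best_possible_health(data_nodes, replicas):
    """Best health status reachable for an index with the given replica count.

    A replica is never placed on the same node as its primary, so each
    replica copy needs one more data node.
    """
    return "green" if data_nodes > replicas else "yellow"


def replica_settings(replicas):
    """Body for updating the number of replicas of an index."""
    return {"index": {"number_of_replicas": replicas}}


def unindexed_string_field():
    """Mapping for a string field that is never queried: not indexed."""
    return {"type": "string", "index": "no"}


print(best_possible_health(data_nodes=1, replicas=1))  # yellow
print(best_possible_health(data_nodes=1, replicas=0))  # green

settings_body = json.dumps(replica_settings(0))
print(settings_body)
```

On a single node, setting the replica count to 0 is what lets the cluster reach green.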

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html

Cheers,

Ivan


