2 Node cluster hanging while bulk indexing and adding types

Greg_Brown · January 11, 2012, 4:35pm

Ah, I see what you are saying. But I am totally flummoxed as to how to
formulate the query to get a filtered query that matches the
performance of the default filtered query used for the type.

I think what you are suggesting is:
curl -XGET "${SERVER}/pd-test-0/_search?pretty" \
  -d '{
    "size" : 0,
    "query" : {
      "filtered" : {
        "query" : { "match_all" : { } },
        "filter" : {
          "term" : { "pd_id" : "'"${ID}"'" }
        }
      }
    },
    "facets" : {
      "q1" : {
        "terms" : {
          "field" : "q1",
          "size" : 100
        }
      }
    }
  }'

Is that match_all query correct? This query takes about 125 ms vs. 13
ms for a query on the type (curl -XGET "${SERVER}/pd-0/${ID}/_search").

From what I can gather from the Java code, this should mostly match
that query, but I don't know the code well enough. Is there an easy way
to enable logging that would let me compare the structure of the
parsed queries for debugging this?

Thanks for all the help, Shay! Much appreciated.
-Greg

On Jan 10, 10:50 am, Shay Banon kim...@gmail.com wrote:

10x slower than types? That makes little sense, since a type, at the end of
the day, is just a field called _type in a document, and when you search
within a type, the query you provide is simply wrapped in a filtered query
with a filter on the type. So you can do it yourself: just wrap your query
in a filtered query with a filter on your "type".
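
The wrapping Shay describes can be sketched as a small helper that builds the request body. This is only a sketch under stated assumptions, not Elasticsearch's actual implementation: the function name `wrap_with_type_filter` and the type name `example-type` are hypothetical, and it assumes the `filtered` query and `term` filter syntax already used in this thread, applied to the `_type` field.

```python
import json


def wrap_with_type_filter(query, type_name):
    """Wrap a query in a filtered query restricted to one type.

    Hypothetical helper: mirrors the wrapping described above by
    filtering on the `_type` field.
    """
    return {
        "filtered": {
            "query": query,
            "filter": {"term": {"_type": type_name}},
        }
    }


# Example: restrict a match_all query to a made-up type name.
body = {
    "size": 0,
    "query": wrap_with_type_filter({"match_all": {}}, "example-type"),
}
print(json.dumps(body, indent=2))
```

The same helper could target a custom field such as `pd_id` by swapping the field name in the `term` filter.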

On Tue, Jan 10, 2012 at 5:48 PM, Greg Ichneumon Brown
gbrown5...@gmail.com wrote:

That's what I did. Functionally works, but it is 10x slower to query
using either a query to filter or a facet_filter. Is there another
way? According to the docs: "search filters restrict only returned
documents — but not facet counts"

On Jan 10, 2:40 am, Shay Banon kim...@gmail.com wrote:

Just add the "type" as a field to the doc, and filter by it.
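
As an illustration of that suggestion, the sketch below builds a bulk request body in which every document goes into one type and carries its set identifier as an ordinary field. The helper name `build_bulk_body`, the type name `doc`, and the sample documents are assumptions made for the example; `pd_id` and `pd-test-0` are the field and index names used elsewhere in this thread. The body follows the bulk API's newline-delimited format: an action line followed by a source line for each document.

```python
import json


def build_bulk_body(index, doc_type, docs_by_set, set_field="pd_id"):
    """Build a newline-delimited bulk body with one shared type.

    docs_by_set maps a set identifier to a list of documents; the
    identifier is copied into each document under `set_field` so it
    can be filtered on later.
    """
    lines = []
    for set_id, docs in docs_by_set.items():
        for doc in docs:
            source = dict(doc)  # copy so the caller's dict is not mutated
            source[set_field] = set_id
            lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
            lines.append(json.dumps(source))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Example with made-up documents; "doc" is a hypothetical type name.
bulk_body = build_bulk_body("pd-test-0", "doc", {"42": [{"q1": "yes"}, {"q1": "no"}]})
print(bulk_body, end="")
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint.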

On Tue, Jan 10, 2012 at 5:45 AM, Greg Brown gbrown5...@gmail.com
wrote:

Indexing all data to a single type did work fine (3.3mil docs) as
expected.

I submitted a bug (elasticsearch.github.com/issues/134) on the large
number of types
because I was able to get the server to become unresponsive even when
there was only a single server and I tried to add many types.

For the moment I am going ahead with using all of the documents in a
single index. However, this significantly reduces query performance
compared to having a separate type for each set of documents. I looped
and profiled the following queries on the larger sets of documents
(10k-70k): https://gist.github.com/1586723 This was all run on a
single server, with the queries issued from a different machine.
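
A loop-and-average measurement of this kind could look roughly like the following. It is a minimal sketch, not the script from the gist: `average_ms` and `fake_query` are hypothetical names, and the callable passed in stands for whatever issues one search request.

```python
import time


def average_ms(run_query, runs=20):
    """Call run_query() `runs` times and return the mean wall-clock time in ms."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    start = time.perf_counter()
    for _ in range(runs):
        run_query()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Stand-in for a real search request (assumption: replace with an HTTP call).
def fake_query():
    sum(range(1000))


print(f"average: {average_ms(fake_query, runs=50):.3f} ms")
```

Timing from a separate client machine, as done here, includes network round-trip time, so the averages are best compared against each other and not read as server-side query times.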

The first query has each set of docs in its own Type. On average it
took about 11 ms to complete.

The second has all of the docs in one index with a field pd_id to
distinguish the sets. The query uses the facet_filter and averages ~190
ms.

The third uses the same index as the second, but uses a query to do
the "filtering" of the docs. ~140 ms. I was surprised that this was
faster than the facet_filter.
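
For readers without access to the gist, the second and third variants might look roughly like the request bodies built below. This is a sketch based on the description above and on the facet syntax of that era, not a copy of the gist: the function names are hypothetical, and `q1` and `pd_id` are the field names that appear elsewhere in this thread.

```python
import json


def facet_filter_body(set_id, facet_field="q1", size=100):
    """Second variant: match everything, restrict only the facet."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "facets": {
            facet_field: {
                "terms": {"field": facet_field, "size": size},
                # facet_filter narrows the facet counts, not the query
                "facet_filter": {"term": {"pd_id": set_id}},
            }
        },
    }


def query_filtered_body(set_id, facet_field="q1", size=100):
    """Third variant: restrict the query itself; the facet follows it."""
    return {
        "size": 0,
        "query": {"term": {"pd_id": set_id}},
        "facets": {
            facet_field: {"terms": {"field": facet_field, "size": size}}
        },
    }


print(json.dumps(facet_filter_body("42"), indent=2))
print(json.dumps(query_filtered_body("42"), indent=2))
```

In the first body the query still matches every document in the index and only the facet is narrowed, whereas in the second the query itself limits the documents the facet sees.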

Any suggestions on how to improve the last two queries?

Any ideas on how to create multiple types without creating 10k
separate indices? In this case all I am using the Type for is a
partitioning/grouping of multiple separate indices, since the Mapping
of each Type is identical.

Thanks for the help.
-Greg

On Dec 22 2011, 7:37 am, Greg Brown gbrown5...@gmail.com wrote:

Checking through the logs, there isn't any mention of there not being
enough file handles; the errors I am running into are out-of-memory
errors on the heap space.

Shay,

Thanks, will give that a try and let you know. Will have to wait until
after the weekend so I can set up a development cluster. I've brought
down the production cluster a few too many times this week, and it's
time to be more careful.

Thanks for the fast responses.
-Greg

On Dec 21, 6:54 pm, Shay Banon kim...@gmail.com wrote:

My guess is that the problem is with creating so many types, which ends up
being a large overhead in the system. Each time a type is introduced, it
needs to be broadcast to the rest of the nodes and persisted as part of
the cluster meta data. Can you try just indexing into the same type as a
test and see if it still happens?

On Wed, Dec 21, 2011 at 9:08 PM, Karussell
<tableyourt...@googlemail.com> wrote:

Greg, have you checked/increased the open file handle limits for your
machine?

First, check/post your logs. If too many files were open, ES would log
that.

Peter.
