Facet and query - CPU load 100% (local node)


(phoenix) #1

Hi all,

I'm currently in a development phase, so I'm testing my development work on my
local machine (a MacBook Pro with 16 GB RAM, a 512 GB SSD and 4 cores).

Processor Name: Intel Core i7

Processor Speed: 2.3 GHz

Number of Processors: 1

Total Number of Cores: 4

L2 Cache (per Core): 256 KB

L3 Cache: 6 MB

Memory: 16 GB

I set up a one-node, one-shard, no-replica ES cluster in which I index
approximately 264k documents in bulk mode (this takes 16 s including
preparation time).

In my application I then see a table with my data, and I can create facets
on columns.

With more than one facet, things start to get difficult for ES.

Checking a facet value updates the table data (filtering on the selected
facet value) and updates the other facet(s), so that they show facet values
for the filtered data only.

It is when I select a facet value that ES begins to use 100% CPU (I even saw
220% in the output of top), and I don't really know why.
Basically I send two queries: a faceting query with filters, and the table
data query with filters.

Do you have any idea what could cause the high CPU load on the local ES node?
(For info, I gave the ES node 5 GB of memory, for both Xmx and Xms.)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ffd251a8-08c7-45c2-8436-e7ca86354bc8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(phoenix) #2

Actually, it seems the slowdown (and the load increase) happens more when I
DESELECT a facet value, because I then facet and search on all the docs
(if I remove all filters).
The more facets and selections I use to restrict my table data, the more
efficient the queries become.

But I'm worried that on a local dev computer, with one node, one shard, no
replica and only 265k docs, search/faceting is so slow...
Any clue about what I could be doing wrong to push the CPU load to 100%?

This is what I was doing:

    // Terms facet on the selected column, returning up to 1000 values.
    TermsFacetBuilder facet = FacetBuilders.termsFacet(facetName)
            .field(columnName)
            .size(1000);
    SearchRequestBuilder query = client.prepareSearch(datasetName)
            .setTypes(RECORD);
    if (filters != null && filters.size() > 0) {
        // Combine one term query per active column filter with AND semantics.
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
        for (ColumnFilterConfiguration filter : filters) {
            if (filter instanceof TextFilterConfiguration) {
                TextFilterConfiguration textFilter = (TextFilterConfiguration) filter;
                final String textFilterValue =
                        StringUtils.trimToNull(textFilter.getTextFilterValue());
                queryBuilder.must(QueryBuilders.termQuery(
                        filter.getColumn().getNormalizedName(), textFilterValue));
            } else {
                throw new UnsupportedOperationException("Unsupported filter type.");
            }
        }
        query.setQuery(queryBuilder);
    } else {
        query.setQuery(QueryBuilders.matchAllQuery());
    }
    query.addFacet(facet);
    SearchResponse sr = query.execute().actionGet();
    TermsFacet f = (TermsFacet) sr.getFacets().facetsAsMap().get(facetName);

I have now switched to aggregations, but the result seems to be the same:

    // Terms aggregation on the selected column, returning up to 1000 buckets.
    TermsBuilder agg = AggregationBuilders.terms(facetName)
            .field(columnName)
            .size(1000);
    // setSize(0): return only the aggregation result, no hits.
    SearchRequestBuilder query = client.prepareSearch()
            .setSize(0)
            .addAggregation(agg);
    if (filters != null && filters.size() > 0) {
        BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
        for (ColumnFilterConfiguration filter : filters) {
            if (filter instanceof TextFilterConfiguration) {
                TextFilterConfiguration textFilter = (TextFilterConfiguration) filter;
                final String textFilterValue =
                        StringUtils.trimToNull(textFilter.getTextFilterValue());
                queryBuilder.must(QueryBuilders.termQuery(
                        filter.getColumn().getNormalizedName(), textFilterValue));
            } else {
                throw new UnsupportedOperationException("Unsupported filter type.");
            }
        }
        query.setQuery(queryBuilder);
    }
    // With no filters, no query is set explicitly.
    SearchResponse sr = query.execute().actionGet();

In this last version, I tried not setting any query at all when no filters
are defined (to avoid a matchAll), and otherwise setting a boolean query
matching all the filter terms.
I also set the query result size to 0 to avoid fetching hits, so that only
the aggregation result is returned.
No significant improvement...

Help would be appreciated :)

In reply to Frederic Esnault's post of Wednesday, 25 June 2014, 13:32:56 UTC+2.


(phoenix) #3

OK, my bad.

The problem was not in the faceting or querying, but in the method calling
them: a findAll method that loaded the whole dataset (267k docs) in slices
of 1000. In other words, a while loop executed a query limited to 1000
results 267 times, each one paginated to start where the previous one left off.
When I saw that, I removed the while loop and executed just one ES query,
limited to 100 docs (it feeds a UI table, so there is no need for thousands
of rows on a single screen; 100 is already too many in my opinion),
and now everything runs extremely smoothly.
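To make the difference concrete, here is a minimal sketch in plain Java. It does not use the Elasticsearch client: `CountingBackend`, `findAll` and `findPage` are hypothetical stand-ins (my real method bodies are not shown in this thread), and the backend just counts how many queries it receives. It only illustrates, under those assumptions, why looping over every slice costs 267 round trips while a single UI page costs one.

```java
import java.util.ArrayList;
import java.util.List;

class PagingSketch {

    /** Hypothetical stand-in for the search backend; it only counts queries. */
    static class CountingBackend {
        final int totalDocs;
        int queryCount = 0;

        CountingBackend(int totalDocs) {
            this.totalDocs = totalDocs;
        }

        /** Returns the doc ids in [from, from + size), clipped to the dataset size. */
        List<Integer> search(int from, int size) {
            queryCount++;
            List<Integer> hits = new ArrayList<>();
            int end = Math.min(from + size, totalDocs);
            for (int id = from; id < end; id++) {
                hits.add(id);
            }
            return hits;
        }
    }

    /** The costly pattern: one query per slice until the whole dataset is loaded. */
    static List<Integer> findAll(CountingBackend backend, int sliceSize) {
        List<Integer> all = new ArrayList<>();
        int from = 0;
        do {
            all.addAll(backend.search(from, sliceSize));
            from += sliceSize; // restart where the previous slice ended
        } while (from < backend.totalDocs);
        return all;
    }

    /** The replacement: a single query for the one page the UI table displays. */
    static List<Integer> findPage(CountingBackend backend, int pageSize) {
        return backend.search(0, pageSize);
    }

    public static void main(String[] args) {
        CountingBackend loopBackend = new CountingBackend(267_000);
        List<Integer> everything = findAll(loopBackend, 1000);
        System.out.println("findAll:  " + loopBackend.queryCount
                + " queries, " + everything.size() + " docs");

        CountingBackend pageBackend = new CountingBackend(267_000);
        List<Integer> page = findPage(pageBackend, 100);
        System.out.println("findPage: " + pageBackend.queryCount
                + " query, " + page.size() + " docs");
    }
}
```

With 267,000 docs and slices of 1000, the loop issues 267 queries and materialises every document, whereas the single page issues one query for 100 docs. In the real application each of those 267 queries was also a full ES search, which is presumably where the CPU went.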

Thanks for reading, and be careful with your application logic before
blaming ES :)

In reply to Frederic Esnault's post of Wednesday, 25 June 2014, 19:11:39 UTC+2.

