Which facet needs more memory: the terms facet or the range facet?

Hi

I want to implement hierarchical facets for our category data.
The categories are stored using the nested set model, so every category has
a lft and a rgt field. I can index these two fields into
Elasticsearch and use a range facet to get the counts.

Recently I found another way to get the same functionality: use the Path
Hierarchy Tokenizer that Elasticsearch provides to index the category path,
which looks like /20/134/7856, and use a terms facet with regex patterns to
get the counts.

I'm new to Elasticsearch. I know that facets need a lot of memory, and that
the terms facet loads the relevant field values into memory, but I still
don't know which approach is better.

Any suggestions?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello,

I didn't really understand what your data looks like, but I'd say range
facets should be faster, because they don't need the regex patterns.

I'd also assume they use less memory, but you can confirm this by testing
your two implementations, while monitoring your ES cluster.
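
To make the comparison concrete, here is a rough sketch of the two request bodies being discussed, built as Python dicts. The field names (category_lft, category_path), the facet names, and the helper functions are assumptions for illustration, and the request shape follows the legacy facets API as I understand it, so check it against the documentation for your version. It also assumes each document indexes the lft value of its category in a numeric field.

```python
import json


def range_facet_body(facet_name, lft, rgt, field="category_lft"):
    """Build a search body with one range facet covering a category subtree.

    In a nested set, every descendant's lft falls between the parent's lft
    and rgt, so one range over the indexed lft value covers the subtree.
    """
    return {
        "query": {"match_all": {}},
        "facets": {
            facet_name: {
                "range": {
                    "field": field,  # hypothetical field name
                    "ranges": [{"from": lft, "to": rgt}],
                }
            }
        },
    }


def terms_facet_body(facet_name, parent_path, field="category_path", size=50):
    """Build a search body with a terms facet restricted by a regex.

    The regex keeps only terms exactly one level below parent_path.
    Assumes path segments are numeric ids, so no regex escaping is needed.
    """
    return {
        "query": {"match_all": {}},
        "facets": {
            facet_name: {
                "terms": {
                    "field": field,  # hypothetical field name
                    "regex": parent_path + "/[^/]+",
                    "size": size,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(range_facet_body("subtree_20", 2, 9), indent=2))
    print(json.dumps(terms_facet_body("children_of_20", "/20"), indent=2))
```

Sending both bodies against the same index while watching node memory is one way to run the test suggested above.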

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

On Tue, Jan 29, 2013 at 6:03 PM, kenshin <himurakenshin54@gmail.com> wrote:


On Tue, 2013-01-29 at 08:03 -0800, kenshin wrote:

> Hi
>
> I want to implement hierarchical facets for our category data.
> The categories are stored using the nested set model, so every category
> has a lft and a rgt field. I can index these two fields into
> Elasticsearch and use a range facet to get the counts.
>
> Recently I found another way to get the same functionality: use the Path
> Hierarchy Tokenizer that Elasticsearch provides to index the category
> path, which looks like /20/134/7856, and use a terms facet with regex
> patterns to get the counts.
>
> I'm new to Elasticsearch. I know that facets need a lot of memory, and
> that the terms facet loads the relevant field values into memory, but I
> still don't know which approach is better.

I'm unclear as to exactly how you're using the terms/range facets, but
facets need to load all the field values for every doc into memory.

The memory usage depends on:

  1. the number of unique values
  2. the number_of_docs * max_values_per_doc
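
As a rough illustration of those two quantities, the sketch below counts them for a small in-memory sample. The function name and the sample data are made up, and the result is only a pair of counts, not a byte estimate; actual memory use depends on Elasticsearch internals.

```python
def facet_memory_drivers(docs):
    """Return the two quantities that drive facet memory for one field.

    docs is a list with one entry per document; each entry is the list of
    values that document holds for the faceted field.
    """
    unique_values = len({value for values in docs for value in values})
    max_values_per_doc = max((len(values) for values in docs), default=0)
    return {
        "unique_values": unique_values,
        "docs_times_max_values": len(docs) * max_values_per_doc,
    }


# One numeric value per document (for example an indexed lft value).
single_valued = [[2], [3], [3], [7]]

# Several path tokens per document (hypothetical tokenizer output).
multi_valued = [
    ["/20", "/20/134"],
    ["/20", "/20/134", "/20/134/7856"],
    ["/20"],
]

if __name__ == "__main__":
    print(facet_memory_drivers(single_valued))
    print(facet_memory_drivers(multi_valued))
```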

The path hierarchy tokenizer results in multiple values per field,
e.g. /foo/bar/baz gives you:

  • /foo
  • /foo/bar
  • /foo/bar/baz

i.e. 3 values.
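
The expansion above can be mimicked in a few lines of Python. This is only an emulation for counting values per document, not the tokenizer itself; the function name is hypothetical, and it assumes paths start with the delimiter.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emulate the prefixes a path hierarchy tokenizer emits for one path.

    Assumes the path starts with the delimiter; empty segments are ignored.
    """
    parts = [part for part in path.split(delimiter) if part]
    return [
        delimiter + delimiter.join(parts[:depth])
        for depth in range(1, len(parts) + 1)
    ]


if __name__ == "__main__":
    for token in path_hierarchy_tokens("/foo/bar/baz"):
        print(token)
```

A category path that is N levels deep therefore contributes N values to the field for each document.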

clint
