How to facet on size?


(Dan Tuffery) #1

I have an int field 'size' that stores a size value in bytes. A requirement
has come up to facet on the field using a log-linear scale, e.g.
Up to 1MB, Up to 10MB, Up to 100MB, Up to 1GB, Over 1GB

What is the best way to achieve this kind of faceting in ElasticSearch?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(depahelix) #2

One way to do it would be to do just a tiny bit of preprocessing and build
your own buckets.

Then index those as not_analyzed.

-Chris.



(Chris Morley) #3

One way to do it would be to do just a tiny bit of pre-processing on ingest
and build your own buckets. Then, use not_analyzed. Just one idea. There
may be a better way that I don't know about.
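
A minimal sketch of what that ingest-time pre-processing could look like, in Java. The class name, the label strings, and the choice of 1MB = 1024 * 1024 bytes with inclusive upper bounds are all assumptions made for illustration; the thread does not specify them. The returned label would be stored in its own not_analyzed field next to the raw size.

```java
// Hypothetical helper: maps a raw size in bytes to a fixed bucket label at ingest time.
final class SizeBucket {

    // Assumption: binary megabytes. Use 1000L * 1000L for decimal units instead.
    static final long MB = 1024L * 1024L;

    // Upper bounds (inclusive) and their labels, kept in ascending order.
    private static final long[] UPPER_BOUNDS = { MB, 10 * MB, 100 * MB, 1024 * MB };
    private static final String[] LABELS = { "Up to 1MB", "Up to 10MB", "Up to 100MB", "Up to 1GB" };

    // Returns the label of the first bucket whose upper bound is not exceeded.
    static String labelFor(long sizeInBytes) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (sizeInBytes <= UPPER_BOUNDS[i]) {
                return LABELS[i];
            }
        }
        return "Over 1GB";
    }
}
```

For example, SizeBucket.labelFor(5L * SizeBucket.MB) falls into the "Up to 10MB" bucket. The labels are fixed once a document is indexed, so changing the bucket boundaries later means re-indexing.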



(James Richardson) #4

I use a slightly different scale, but the code looks like this.

builderFor(query) creates the initial term query or whatever you need.

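import static com.google.common.collect.Lists.newArrayList;

import java.util.List;

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.search.facet.FacetBuilders;
import org.elasticsearch.search.facet.range.RangeFacetBuilder;

// AbstractNoticeScheme, JsonString and NoticeQuery are application classes, not part of ElasticSearch.
public class NoticeValueFacetsJsonScheme extends AbstractNoticeScheme {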

private final List<Range> ranges = constructRanges();

public JsonString facetQuery(NoticeQuery query) {
    return JsonString.of(addRangeFacetsTo(builderFor(query)).toString());
}

private SearchRequestBuilder addRangeFacetsTo(SearchRequestBuilder searchRequestBuilder) {
    RangeFacetBuilder rangeFacet = FacetBuilders.rangeFacet("ranges").field("your-field-goes-here");

    for (Range range : ranges) {
        if ( range.isRange() ) {
            rangeFacet.addRange(range.getLower(), range.getUpper());
        }
        else if ( range.isLowerBound() ) {
            rangeFacet.addUnboundedTo(range.getLower());
        }
        else if ( range.isUpperBound() ) {
            rangeFacet.addUnboundedFrom(range.getUpper());
        }
    }

    searchRequestBuilder.addFacet(rangeFacet);

    return searchRequestBuilder;
}

private List<Range> constructRanges() {

    int count = 15;

    double lower = thousands(100);
    double upper = billions(1);

    double lowerf = Math.log10(lower);
    double upperf = Math.log10(upper);

    double diff = upperf - lowerf;

    double step = diff / (double) count;

    List<Range> ranges = newArrayList();

    ranges.add(Range.between(0d,1d));
    ranges.add(Range.between(1d, lower));

    for ( int i = 1 ; i <= count ; i++ ) {
        ranges.add(
                Range.between(
                        Math.pow(10.0,lowerf + ( step * ( i - 1))),
                        Math.pow(10.0, lowerf + ( step * i ))
        ));
    }

    ranges.add(Range.from(upper));

    return ranges;
}

// Multiply as long so that large arguments do not overflow int arithmetic.
private long thousands(int i) { return i * 1000L; }
private long millions(int i) { return i * thousands(1000); }
private long billions(int i) { return i * millions(1000); }

public static class Range {

    private final Double lower;
    private final Double upper;

    private Range(Double lower, Double upper) {
        this.lower = lower;
        this.upper = upper;
    }

    public boolean isLowerBound() {
        return lower != null;
    }

    public boolean isUpperBound() {
        return upper != null;
    }

    public boolean isRange() {
        return isLowerBound() && isUpperBound();
    }
    
    public double getLower() {
        return lower;
    }

    public double getUpper() {
        return upper;
    }

    public static Range to(Double number) {
        return new Range(null, number);
    }

    public static Range from(Double number) {
        return new Range(number, null);
    }

    public static Range between(Double lower, Double upper) {
        return new Range(lower, upper);
    }
}

}
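
The boundary arithmetic in constructRanges() can be tried on its own, without any ElasticSearch classes. The following is a sketch under that assumption; the class and method names are made up for illustration. It splits the interval between lower and upper into count steps that are equally wide on a log10 scale, which is the same calculation as the loop above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone helper: boundaries that are evenly spaced on a log10 scale.
final class LogBoundaries {

    // Returns count + 1 edges from lower to upper; consecutive edges share a constant ratio.
    static List<Double> between(double lower, double upper, int count) {
        double lowerExp = Math.log10(lower);
        double step = (Math.log10(upper) - lowerExp) / count;

        List<Double> edges = new ArrayList<>();
        for (int i = 0; i <= count; i++) {
            edges.add(Math.pow(10.0, lowerExp + step * i));
        }
        return edges;
    }
}
```

Calling LogBoundaries.between(1e6, 1e9, 3) gives edges at roughly one, ten, one hundred and one thousand million, which is the decade scale from the original question in decimal units. Each pair of neighbouring edges then becomes one addRange(from, to) call.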



(James Richardson) #5

I specifically wouldn't pre-compute the buckets at index time.

If you do, then whenever your ranges change you will need to re-import all
of your data. The same applies if you come up with a similar but
not-the-same requirement. This is going to suck.

Let the stuff in the index be the base data, then find meaning by searching
and selecting and aggregating as you need to.

James
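
To make the query-time approach concrete, here is a sketch that builds a range facet request body for the buckets in the original question while leaving the raw byte values in the index. The class name, the facet name "size_ranges" and the boundary values are assumptions for illustration; the JSON shape follows the range facet syntax (a "ranges" array of "from"/"to" entries). Changing the buckets only means changing the array passed in, with no re-indexing.

```java
// Hypothetical helper: builds a range facet request body from ascending bucket edges.
final class SizeFacetRequest {

    // edges must be non-empty and ascending, e.g. 1MB, 10MB, 100MB, 1GB in bytes.
    static String body(String field, long[] edges) {
        StringBuilder ranges = new StringBuilder();

        // First bucket: everything below the first edge.
        ranges.append("{\"to\":").append(edges[0]).append("}");

        // Middle buckets: one per pair of neighbouring edges.
        for (int i = 1; i < edges.length; i++) {
            ranges.append(",{\"from\":").append(edges[i - 1])
                  .append(",\"to\":").append(edges[i]).append("}");
        }

        // Last bucket: everything from the last edge upwards.
        ranges.append(",{\"from\":").append(edges[edges.length - 1]).append("}");

        return "{\"facets\":{\"size_ranges\":{\"range\":{\"field\":\"" + field
                + "\",\"ranges\":[" + ranges + "]}}}}";
    }
}
```

The resulting string can be sent as the body of a _search request, or the same edges can be fed to a RangeFacetBuilder as in the Java code earlier in the thread.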



(Dan Tuffery) #6

Thanks James, that's very helpful.



(Dan Tuffery) #7

Thanks James, that's very helpful.


