Facets Search Help


(pat-2) #1

I'm having trouble trying to build a proper facets search to accomplish
what I need. A little bit of background:

I'm indexing docs daily; each has a created_timestamp and a title_string
field (among others). What I'm trying to do is build a query that
will give me all the docs within a given created_timestamp range whose
title_string has NEVER been seen before that range. It is
important to note that the docs must be broken down daily like this for
further analysis. I've been trying to do a terms facet with a facet filter
on the timestamp range, like so:

{
  "query" : {
    "match_all" : {}
  },
  "facets" : {
    "title_facet" : {
      "terms" : {
        "field" : "title_string",
        "size" : 10,
        "order" : "reverse_count"
      },
      "facet_filter" : {
        "range" : {
          "created_timestamp" : {
            "from" : "2012-04-01T00:00:00Z",
            "to" : "2012-04-01T23:59:59Z"
          }
        }
      }
    }
  }
}

But this is only giving me the term count based on that given time range,
not of all time. Using a script inside the terms seemed to have the same
issue. And if I remove the facet filter, it gives me the term count of all
time, but then I don't know which terms occur in the time range I'm looking
for.

Also, is there a way to get the terms facet to return ONLY those with a
count of 1? It's not as big of a deal, but it would be nice to know.
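
I'm not aware of a terms facet option that limits results to an exact count, so one fallback is to filter the returned entries on the client. A minimal sketch in Python, assuming the response has already been parsed into a dict and that facet entries have the "term"/"count" shape quoted later in this thread; the function name terms_with_count and the sample response are made up for illustration:

```python
def terms_with_count(response, facet_name, wanted_count=1):
    """Return the terms of one facet whose count equals wanted_count."""
    entries = response["facets"][facet_name]["terms"]
    return [entry["term"] for entry in entries if entry["count"] == wanted_count]


# Hand-written response fragment for illustration, not real server output.
sample_response = {
    "facets": {
        "title_facet": {
            "terms": [
                {"term": "title1", "count": 2},
                {"term": "title4", "count": 1},
            ]
        }
    }
}

print(terms_with_count(sample_response, "title_facet"))
```

Note that this only sees the entries the facet actually returned, so with "size" : 10 it filters at most ten terms.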

Sorry if any of this sounds confusing; I can clarify if need be. For
reference, I'm using version 0.19.2. Any help would be greatly appreciated.

Thanks,
Pat


(Ivan Brusic) #2

Your issue would be easier to understand if you could provide a few example
documents. A facet over a date range will return a count of all items
within that range. Why would it be any different? If the count should
be 1, which documents are being included that should not be? The
example documents would be helpful. The key piece that is missing is
the title_string.

Cheers,

Ivan

On Thu, Jul 26, 2012 at 10:25 AM, Pat brutus.buckeye@gmail.com wrote:



(pat-2) #3

Ivan,

Here's a basic sample. Let's say I have these docs:
{ "id" : "doc1", "created_timestamp" : "2012-03-31T01:00:00Z",
"title_string" : "title1" },
{ "id" : "doc2", "created_timestamp" : "2012-03-31T01:00:00Z",
"title_string" : "title2" },
{ "id" : "doc3", "created_timestamp" : "2012-03-31T01:00:00Z",
"title_string" : "title3" },
{ "id" : "doc4", "created_timestamp" : "2012-04-01T02:00:00Z",
"title_string" : "title1" },
{ "id" : "doc5", "created_timestamp" : "2012-04-01T02:00:00Z",
"title_string" : "title4" }

If I do my exact facet I provided earlier, it would return the terms:
{ "term" : "title1", "count" : 1 },
{ "term" : "title4", "count" : 1 }

This is NOT what I want because "title1" has been seen before on 03/31. I
only want "title4" returned since it was first seen in that date range.
Maybe a faceted search isn't the way to go, but I couldn't think of any
other way without changing my schema.
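
Stated as code, the desired result is "titles whose earliest created_timestamp falls inside the range". A minimal sketch in plain Python over the five sample docs above; it only pins down the expected answer and says nothing about how to get Elasticsearch to compute it server-side. The helper names parse_ts and first_seen_titles are made up for illustration:

```python
from datetime import datetime, timezone

docs = [
    {"id": "doc1", "created_timestamp": "2012-03-31T01:00:00Z", "title_string": "title1"},
    {"id": "doc2", "created_timestamp": "2012-03-31T01:00:00Z", "title_string": "title2"},
    {"id": "doc3", "created_timestamp": "2012-03-31T01:00:00Z", "title_string": "title3"},
    {"id": "doc4", "created_timestamp": "2012-04-01T02:00:00Z", "title_string": "title1"},
    {"id": "doc5", "created_timestamp": "2012-04-01T02:00:00Z", "title_string": "title4"},
]


def parse_ts(value):
    # The sample timestamps use a trailing "Z", i.e. UTC.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def first_seen_titles(docs, start, end):
    """Titles whose earliest created_timestamp lies within [start, end]."""
    earliest = {}
    for doc in docs:
        ts = parse_ts(doc["created_timestamp"])
        title = doc["title_string"]
        if title not in earliest or ts < earliest[title]:
            earliest[title] = ts
    return sorted(t for t, ts in earliest.items() if start <= ts <= end)


start = parse_ts("2012-04-01T00:00:00Z")
end = parse_ts("2012-04-01T23:59:59Z")
print(first_seen_titles(docs, start, end))  # ['title4']
```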

-Pat


(Ivan Brusic) #4

I do not think it is possible to achieve the functionality you are
looking for. Your situation is analogous to having to join two
identical tables in SQL. You can create a filter that combines the
range filter with an exists filter, but you cannot dynamically create
the exists filter for each document.
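
One possible client-side workaround, sketched here and not tested against 0.19.2: run the terms facet twice, once filtered to the range and once filtered to everything before it, then subtract the second term list from the first. The Python below only builds the two request bodies and diffs already-parsed responses. The helper names, the use of "include_upper", and the size value are assumptions, and size would have to be large enough to cover every distinct title for the difference to be correct:

```python
def facet_request(range_filter, size=10000):
    """Build a terms-facet request body restricted by a range facet filter."""
    return {
        "query": {"match_all": {}},
        "facets": {
            "title_facet": {
                "terms": {"field": "title_string", "size": size},
                "facet_filter": {"range": {"created_timestamp": range_filter}},
            }
        },
    }


def facet_terms(response):
    """Collect the set of terms returned for title_facet."""
    return {entry["term"] for entry in response["facets"]["title_facet"]["terms"]}


def new_terms(in_range_response, before_response):
    """Terms present in the range but absent from everything before it."""
    return sorted(facet_terms(in_range_response) - facet_terms(before_response))


# Request for the day itself, and for everything strictly before it.
in_range_body = facet_request(
    {"from": "2012-04-01T00:00:00Z", "to": "2012-04-01T23:59:59Z"}
)
before_body = facet_request({"to": "2012-04-01T00:00:00Z", "include_upper": False})

# Hand-written responses matching the sample docs in this thread.
in_range_response = {"facets": {"title_facet": {"terms": [
    {"term": "title1", "count": 1},
    {"term": "title4", "count": 1},
]}}}
before_response = {"facets": {"title_facet": {"terms": [
    {"term": "title1", "count": 1},
    {"term": "title2", "count": 1},
    {"term": "title3", "count": 1},
]}}}

print(new_terms(in_range_response, before_response))  # ['title4']
```

This costs two requests per day analysed and moves the "join" to the client, which may not scale if the number of distinct titles is very large.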

Perhaps this is doable with scripts. Hopefully someone more familiar
with deeper levels of scripting will chime in.

Cheers,

Ivan

On Thu, Jul 26, 2012 at 11:35 AM, Pat brutus.buckeye@gmail.com wrote:


