Now that I am into the real-world scenario, it gets a bit trickier: I have
nested objects (keys).
I have to test the existence of the key in the Groovy script to avoid
parsing errors on insert.
How do you access a nested object in Groovy, and how do you test for the
existence of a nested object key, as in this example?
curl -XPOST 'http://'$NODE':9200/'$INDEX_NAME'/post' -d '{
"titles": ["title 1", "title 2", "title 3", "title 4"],
"raw" : {
"links" : ["Yahoo | Mail, Weather, Search, Politics, News, Finance, Sports & Videos", "Yahoo | Mail, Weather, Search, Politics, News, Finance, Sports & Videos",
"Warning! | There might be a problem with the requested link", "http://bit.ly/ghi"]
}
}'
This doesn't seem to work (from what I can tell, it never finds the key
raw.links, even when it does exist):
"script" : "if (ctx._source.containsKey('raw.links') )
{ctx._source.links_url_count = ctx._source['raw.links'].size() } else {
ctx._source.links_url_count = 0 }"
Simple keys work, though, such as ctx._source.containsKey('title').
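
A minimal Groovy sketch of why the lookup above may fail, assuming ctx._source behaves like an ordinary map whose nested objects are themselves maps. The `source` and `noRaw` variables are hypothetical stand-ins for ctx._source, and the link values are shortened placeholders. Under that assumption, 'raw.links' is looked up as one literal top-level key, so the nesting has to be walked one level at a time:

```groovy
// Hypothetical stand-in for ctx._source: a plain Groovy map shaped like the
// document posted above (link values shortened for the sketch).
def source = [
    titles: ['title 1', 'title 2', 'title 3', 'title 4'],
    raw   : [links: ['link a', 'link b', 'link c', 'http://bit.ly/ghi']]
]

// 'raw.links' is treated as a single literal key, which does not exist.
assert !source.containsKey('raw.links')

// Check each level of the nesting separately instead.
def hasLinks = source.containsKey('raw') && source.raw.containsKey('links')
assert hasLinks

// Null-safe navigation yields the count, falling back to 0 if a level is missing.
def linksUrlCount = source.raw?.links?.size() ?: 0
assert linksUrlCount == 4

// A document without the nested object gets a count of 0.
def noRaw = [titles: ['only a title']]
assert (noRaw.raw?.links?.size() ?: 0) == 0
```

If the same holds inside an Elasticsearch script, the update script could test ctx._source.containsKey('raw') and then ctx._source.raw.containsKey('links') before calling size(). That has not been verified against a running cluster here.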
On Thursday, January 8, 2015 at 9:59:56 PM UTC-8, Nikolas Everett wrote:
Transform never saves to source. You have to transform on the application
side for that. It was designed for times when you wanted to index something
like this that would just take up extra space in the source document. I
imagine you could use a script field on the query if you need the result to
contain the count. Or just count it on the result side.
Nik
On Jan 9, 2015 12:43 AM, "Jeff Steinmetz" <jeffrey....@gmail.com> wrote:
Transform worked well. Nice.
Curious how to get it to save to source? Tried this below, no go. (I
can, however, do range queries against title_count, so the transform was
indexed and works well.)
"transform" : {
"script" : "ctx._source['\''title_count'\''] =
ctx._source['\''titles'\''].size()",
"lang": "groovy"
},
"properties": {
"titles": { "type": "string", "index": "not_analyzed" },
"title_count" : { "type": "integer", "store": "yes" }
}
}'
On Thursday, January 8, 2015 at 9:15:28 PM UTC-8, Nikolas Everett wrote:
Source is going to be pretty slow, yeah. If it's a one-off then it's
probably fine, but if you do it a lot it's probably best to index the count.
On Jan 9, 2015 12:04 AM, "Jeff Steinmetz" jeffrey....@gmail.com wrote:
Thank you, that worked.
I was curious about the speed: is running a script using _source slower
than doc?
Totally understand a dynamic script is slower regardless of _source vs
doc.
Makes sense that having a count transformed up front during index to
create a materialized value would certainly be much faster.
On Thursday, January 8, 2015 at 7:04:40 PM UTC-8, Nikolas Everett wrote:
On Thu, Jan 8, 2015 at 9:09 PM, Jeff Steinmetz jeffrey....@gmail.com
wrote:
Is there a better way to do this?
Please see this gist (or, better yet, run the script locally to see
the issue).
Determine list [array] size in elasticsearch issue · GitHub
You must have scripting enabled in your elasticsearch config for this
to work.
This was originally based on some comments I found here:
elasticsearch - Search by size of object type field elastic search - Stack Overflow
We would like to use a filtered query to only include documents that
have a small count of items in the list [aka array], filtering where
values.size() < 10
"script": "doc['titles'].values.size() < 10"
Turns out the values.size() actually either counts tokenized
(analyzed) words, or if the mapping turns off analysis, it still counts
incorrectly if there are duplicates.
If analyze is not turned off, it counts tokenized words, not the
number of elements in the list.
If analyze is turned off for a given field, it improves, but
duplicates are missed.
For example, this comes back as size == 2:
"titles": ["one", "duplicate", "duplicate"]
This comes back as size == 3, should be 4
"titles": ["Yahoo | Mail, Weather, Search, Politics, News, Finance, Sports & Videos", "Yahoo | Mail, Weather, Search, Politics, News, Finance, Sports & Videos",
"Warning! | There might be a problem with the requested link", "http://bit.ly/ghi"]
Is this a bug, is there a better way, or is this just something that
we don't understand about groovy and values.size()?
I think that's just the way doc works. Try (but don't actually
deploy) _source['titles'].size() < 10. That should do what you expect.
Don't deploy that because it's too slow. Try indexing the size and
filtering on it. You can use a transform to add the size of the array as
an integer field and just filter on it using a range filter. That'd
probably be the fastest option.
Nik
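
A small Groovy sketch of the difference described above. It assumes that doc['titles'].values on a not_analyzed field behaves like the set of distinct indexed terms, which would explain the missed duplicates; a TreeSet is used here only as a stand-in for that behaviour and is not taken from the Elasticsearch source. The `titles` and `distinctTerms` names are illustrative:

```groovy
// The array as it was sent in the document source.
def titles = ['one', 'duplicate', 'duplicate']

// _source keeps the original list, so its size is the element count.
assert titles.size() == 3

// Assumption: doc values expose each distinct term once, so duplicates collapse.
def distinctTerms = new TreeSet(titles)
assert distinctTerms.size() == 2
```

This is also why indexing the count as its own integer field, computed from the original list, avoids the discrepancy.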
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/75736948-beac-43fc-84d4-25a94456d4ca%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.