Analyzers and faceting and unknown data

I am developing a system that indexes a core, predefined schema in
Elasticsearch along with arbitrary user-supplied JSON. I am using the
snowball analyser for all data.

I would like to let users perform a faceted search over any of their
arbitrary fields. However, because the fields are indexed using snowball,
the terms returned for faceting are incorrect (singular rather than plural
in some cases, which is fine, but completely bogus prefixes in others). I
was hoping that adding "_all" : { "store": "yes" } would do the trick, but
no such luck.

Is there an easy solution? Or do I need to index each field twice, once
with the snowball analyser (for searching) and once with no analyser (for
faceting)?

M

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I usually use the multi_field feature for that.

field.field is analyzed
field.facet is not analyzed
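
As a rough illustration of the two sub-fields described above, the mapping could be built like this. It is only a sketch, assuming the legacy multi_field mapping syntax of Elasticsearch at the time of this thread; the type name "record" and the field name "colour" are hypothetical, and the helper function is not part of any client library.

```python
import json


def multi_field_mapping(field_name, analyzer="snowball"):
    """Build a legacy multi_field mapping for a single string field.

    The default sub-field (same name as the field) is analyzed for search;
    the "facet" sub-field is indexed verbatim so facets return whole values.
    """
    return {
        field_name: {
            "type": "multi_field",
            "fields": {
                field_name: {
                    "type": "string",
                    "index": "analyzed",
                    "analyzer": analyzer,
                },
                "facet": {"type": "string", "index": "not_analyzed"},
            },
        }
    }


# "record" and "colour" are made-up names used only for illustration.
mapping = {"record": {"properties": multi_field_mapping("colour")}}
print(json.dumps(mapping, indent=2))
```

With a mapping of this shape, searches would go against colour and facets against colour.facet.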

IMHO, using _all in facets is risky. You could hit some Out Of Memory Exception issues.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Thanks for that David - didn't realise that existed. Looks like some fun
JSON munging for me!

M

PS. Thanks for the word of warning re: _all - noted.

Actually, how would that technique work for faceting data which is almost
entirely dynamically mapped? I only know the fields and their structure at
the point the indexing request is received and, as far as I know, one
can't index a document and supply a specific mapping for it in the same
request (maybe? I'm pretty new to Elastic).

FWIW, my data structure is this:

class Record {

    // User-supplied JSON metadata
    List metadata;

    // ... other known fields
}

where the top-level Record is indexed into Elasticsearch, has a bunch of
known fields and a list of arbitrary raw JSON metadata - which is what we
need faceting on.

I realise that not having a known schema is limiting what we can get out of
Elasticsearch, but I'm trying to make the barrier for users to index and
search on arbitrary content as low as possible.

M

--
Matt Painter
matt@deity.co.nz
+64 21 115 9378

Aha, have just discovered the section in the doco about dynamic_templates
with an explicit example featuring multi_field. Looks like it may be what
I'm after?

If your documents look like
{
"metadata":["value1", "value2"...]
}

you only have to define a multi_field mapping for the metadata field.

If users add new fields, like:

{
"metadata1":"value1",
"metadata2":"value2"...
}

I think that dynamic_templates will help.
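
A minimal sketch of what such a dynamic template might look like, assuming the legacy multi_field syntax and the {name} placeholder that dynamic templates substitute with the field name. The type name "record" and the template name "strings_as_multi_field" are hypothetical.

```python
import json


def string_multi_field_template(analyzer="snowball"):
    """Dynamic template mapping every new string field as a multi_field."""
    return {
        # The template name is arbitrary; this one is made up.
        "strings_as_multi_field": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
                "type": "multi_field",
                "fields": {
                    # {name} is replaced by the actual field name.
                    "{name}": {
                        "type": "string",
                        "index": "analyzed",
                        "analyzer": analyzer,
                    },
                    # Verbatim copy of the value, intended for faceting.
                    "facet": {"type": "string", "index": "not_analyzed"},
                },
            },
        }
    }


# "record" is a hypothetical type name.
mapping = {"record": {"dynamic_templates": [string_multi_field_template()]}}
print(json.dumps(mapping, indent=2))
```

Because the template matches on the detected type rather than on a field name, fields that users add later would pick up both sub-fields without any mapping change.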

What does your JSON doc look like in the end?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Thanks David. Each String of metadata is actually a complete JSON document
(multiple fields, arrays, collections, etc.), so applying multi_field
mapping against the metadata field isn't going to work. I went with
dynamic_templates and applied multi_field to all string fields and all
works well thus far - for my test data at least!
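
For readers following along, a facet request against the not-analyzed sub-field might be built roughly as below. This is a sketch only, assuming the terms facet of the facets API of that era (later replaced by aggregations) and a sub-field named "facet"; the facet name "colours" and the field path "metadata.colour.facet" are hypothetical and not taken from the setup described above.

```python
import json


def terms_facet_request(facet_name, field, size=10):
    """Build a search body with one terms facet over the given field."""
    return {
        "query": {"match_all": {}},
        "facets": {
            facet_name: {"terms": {"field": field, "size": size}},
        },
    }


# Facet on the untouched sub-field, not on the stemmed one.
request = terms_facet_request("colours", "metadata.colour.facet")
print(json.dumps(request, indent=2))
```

Pointing the facet at the not-analyzed sub-field is what keeps the returned terms as whole original values rather than snowball stems.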

Cheers,
M
