Multivalued field


(Rogerio Pereira) #1

Hi,

How does Elasticsearch handle a multivalued field, such as an authors field?
The content is initially a comma-separated list, which I would like to
split into individual values and then use in a facet.

--


(Rogerio Pereira) #2

It's an array type:

http://www.elasticsearch.org/guide/reference/mapping/array-type.html
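
For illustration, here is a minimal sketch of the array approach, assuming the split happens on the client before indexing. The field name "authors", the helper split_authors, and the sample document are hypothetical. Nothing is sent to a server here; the sketch only builds the JSON body.

```python
import json
from typing import List


def split_authors(raw: str) -> List[str]:
    """Split a comma-separated string into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(",") if part.strip()]


# Hypothetical document: the field name "authors" is an assumption.
doc = {"title": "Some book", "authors": split_authors("Author 1, Author 2")}

# A JSON array is how a multivalued field is expressed in the document body.
body = json.dumps(doc)
print(body)
```

To facet on whole names rather than on individual words, the field would probably also need a mapping that keeps each value as a single term.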

--


(David Pilato) #3

I don't think I understood your concern.

One answer may be to use the _analyze API to see how ES will break your field into tokens.
http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze.html

Does that help?
Or could you elaborate a bit more?
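
As a rough sketch of what such a call could look like, the snippet below builds an _analyze request without sending it. The address localhost:9200 and the helper build_analyze_request are assumptions. The JSON body form shown is the one used by more recent Elasticsearch versions; older releases took the text and analyzer as query parameters, so check the reference for your version.

```python
import json
from typing import Tuple

# Assumption: a node would be reachable at this address; nothing is sent here.
BASE_URL = "http://localhost:9200"


def build_analyze_request(text: str, tokenizer: str = "standard") -> Tuple[str, bytes]:
    """Return (url, body) for an _analyze call that uses a JSON body."""
    url = BASE_URL + "/_analyze"
    body = json.dumps({"tokenizer": tokenizer, "text": text}).encode("utf-8")
    return url, body


url, body = build_analyze_request("Author 1, Author 2")
print(url)
print(body.decode("utf-8"))
```

The response to such a request lists the tokens produced, which shows directly whether "Author 1, Author 2" ends up as two terms or as several words.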

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Kenneth Loafman-2) #4

I think I know what he wants. If you can get the frequency (not just
presence) of the words within the document and you can get the total
frequency of words within the corpus, then you can use that to get the
weight of each word and use that to do a statistical comparison of two
documents to see if they match. It's very fast and can be used for things
like near-deduplication. You can even do things like set up a medical
corpus and a legal corpus and figure out reliably where a new document
belongs.
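
A minimal sketch of that idea, under stated assumptions: each word is weighted by its count in the document times the log of its inverse relative frequency in the corpus (with add-one smoothing), and two documents are compared by cosine similarity. The exact weighting and comparison are not specified above, so both are illustrative choices, and the tiny corpus is made up.

```python
import math
from collections import Counter
from typing import Dict, List


def corpus_counts(docs: List[List[str]]) -> Counter:
    """Total frequency of every word across the whole corpus."""
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def weights(tokens: List[str], corpus: Counter, total: int) -> Dict[str, float]:
    """Weight = in-document frequency * log of smoothed inverse corpus frequency."""
    doc_counts = Counter(tokens)
    return {
        term: freq * math.log((total + 1) / (corpus.get(term, 0) + 1))
        for term, freq in doc_counts.items()
    }


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up corpus: two near-duplicate "medical" texts and one "legal" text.
docs = [
    "the patient received a dose".split(),
    "the patient received a second dose".split(),
    "the court issued a ruling".split(),
]
corpus = corpus_counts(docs)
total = sum(corpus.values())
vectors = [weights(d, corpus, total) for d in docs]

print(cosine(vectors[0], vectors[1]))  # near-duplicates
print(cosine(vectors[0], vectors[2]))  # different topic
```

The near-duplicate pair should score noticeably higher than the cross-topic pair, which is the property that near-deduplication and corpus assignment rely on.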

...Ken

--


(Rogerio Pereira) #5

I think you've pointed me in the right direction, David. I just need to split
something like "Author 1, Author 2" into several terms to use in my facet.

As far as I can see, the pattern tokenizer can help me do that, instead of
setting up my author field as an array.
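
Something along these lines is what I have in mind. It is a sketch only: the names comma_tokenizer and comma_separated are ones I made up, the regular expression is my guess at a suitable pattern, and the helper approximate_tokens just imitates the split locally with Python's re module rather than calling Elasticsearch.

```python
import json
import re
from typing import List

# Split on a comma followed by optional whitespace.
SPLIT_PATTERN = r",\s*"

# Hypothetical index settings: "comma_tokenizer" and "comma_separated"
# are illustrative names, not anything defined by Elasticsearch.
index_settings = {
    "analysis": {
        "tokenizer": {
            "comma_tokenizer": {"type": "pattern", "pattern": SPLIT_PATTERN}
        },
        "analyzer": {
            "comma_separated": {"type": "custom", "tokenizer": "comma_tokenizer"}
        },
    }
}


def approximate_tokens(text: str) -> List[str]:
    """Local approximation of the split; this does not call Elasticsearch."""
    return [token for token in re.split(SPLIT_PATTERN, text.strip()) if token]


print(json.dumps(index_settings, indent=2))
print(approximate_tokens("Author 1, Author 2"))
```

The author field would then be mapped to use this analyzer, and the _analyze API can confirm what the real tokenizer emits.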

--

