Short Questions

Hi,

First of all, I am new to Elasticsearch, so if I make "noobish" mistakes or
have missed something in the documentation, please bear with me :).

I have a few short questions:

  1. Is it possible to define in the mapping a "_customField" (an underscored
    field like _all) that is a list of other fields in the indexed doc?

Example:
{
"_en_fields" : ["en_title", "en_message", "en_tags"]
}

  2. From the site doc: "The include_in_all setting on the “default” field
    allows to control if the value of the field should be included in the _all
    field. Note, the value of the field is copied to _all, not the tokens. So,
    it only makes sense to copy the field value once. Because of this, the
    include_in_all setting on all non-default fields is automatically set to
    false and can’t be changed."

    What do you mean by "the value of the field is copied to _all, not the
    tokens"? That only the "non-analyzed" values of the default fields will be
    aggregated into the "_all" field?

  3. What is the most convenient way to implement multi-language support while
    also ensuring that when the user makes a mistake (e.g. selects the wrong
    language for an input text) the search still retrieves something relevant?
    Should the text also be indexed and stored with the default (general)
    properties? (That would mean twice the indexing time and space.)

  4. If a field is only indexed (not stored), could it be retrieved from the
    _index field?

--

--> By the last question (4) I mean: could I have an indexed field "text"
that I never retrieve from my JSON doc (I would rather not keep it as-is in
the JSON doc) and use only for searching?

--

Hi Cristi, and welcome! :)

Unfortunately, I didn't follow the last question you sent in a separate
mail, so maybe you can rephrase it.

On Fri, Jan 18, 2013 at 9:48 PM, Cristian Mihai Barca <
cristi.barca@gmail.com> wrote:

  1. Is it possible to define in the mapping a "_customField" (an underscored
    field like _all) that is a list of other fields in the indexed doc?

Example:
{
"_en_fields" : ["en_title", "en_message", "en_tags"]
}

Not as far as I know, but there might be some good workarounds for your
specific use case.
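One possible workaround (a sketch, not something confirmed in this thread) is to skip the custom field and list the fields at query time with a multi_match query, the same query type used later in this thread. The field names come from the example above; the query text is a placeholder.

```json
{
  "query" : {
    "multi_match" : {
      "query" : "some search text",
      "fields" : [ "en_title", "en_message", "en_tags" ]
    }
  }
}
```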

  2. From the site doc: "The include_in_all setting on the “default”
    field allows to control if the value of the field should be included in
    the _all field. Note, the value of the field is copied to _all, not the
    tokens. So, it only makes sense to copy the field value once. Because of
    this, the include_in_all setting on all non-default fields is
    automatically set to false and can’t be changed."

    What do you mean by "the value of the field is copied to _all, not the
    tokens"? That only the "non-analyzed" values of the default fields will
    be aggregated into the "_all" field?

It means that the original values of all the included fields make up the
value of "_all", which is then analyzed separately (it can have its own
"analyzer" setting).
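To make that concrete, here is a hedged mapping sketch; the type name "item" and the field names "title" and "notes" are hypothetical. If a document has the title "Running dogs", the title field is indexed with the english analyzer's stemmed tokens, while the raw string "Running dogs" is copied into _all and analyzed there by the standard analyzer, giving "running" and "dogs". The notes field is kept out of _all entirely.

```json
{
  "item" : {
    "_all" : { "analyzer" : "standard" },
    "properties" : {
      "title" : { "type" : "string", "analyzer" : "english" },
      "notes" : { "type" : "string", "include_in_all" : false }
    }
  }
}
```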

  3. What is the most convenient way to implement multi-language support while
    also ensuring that when the user makes a mistake (e.g. selects the wrong
    language for an input text) the search still retrieves something relevant?
    Should the text also be indexed and stored with the default (general)
    properties? (That would mean twice the indexing time and space.)

Yes, maybe you can make use of the multi_field type.
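A minimal sketch of what that could look like, assuming a hypothetical string field "title" and the built-in english analyzer (check that your version ships it). The sub-field that shares the field's name ("title") is indexed with the general-purpose standard analyzer, while "title.en" holds the English-analyzed version. This does index the text twice, which is the 2x cost mentioned in the question.

```json
{
  "item" : {
    "properties" : {
      "title" : {
        "type" : "multi_field",
        "fields" : {
          "title" : { "type" : "string", "analyzer" : "standard" },
          "en" : { "type" : "string", "analyzer" : "english" }
        }
      }
    }
  }
}
```

A search could then target both sub-fields, for example with "fields" : [ "title", "title.en^2" ] in a multi_match query, so English-analyzed matches rank higher while a document whose language was picked incorrectly can still match through the general sub-field. Later Elasticsearch versions express the same idea with a "fields" option on the field itself instead of the multi_field type.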

  4. If a field is only indexed (not stored), could it be retrieved from the
    _index field?

No. The _index field holds the name of the Elasticsearch index the document
is stored in. For example:
$ curl -XPUT localhost:9200/test_index/test_type/1 -d '{"foo":"bar"}'
{"ok":true,"_index":"test_index","_type":"test_type","_id":"1","_version":1}

You can see there that _index=test_index. That's all you can get from the
"_index" field.

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

Hi,

First, thanks for the answer and the welcome message! :)

Second, I have been thinking about the 3rd question. For flexibility, would it
be a good alternative not to use multi_field for a multi-language field, but
to use separate objects with a prefix for each language? Each object would
contain the same fields, analyzed with a custom analyzer for its language.
Would it be a problem if, for instance, an item has no "text_spanish" object
even though the mapping defines one?

Example mapping:
{
  "item" : {
    "properties" : {
      "text_default" : {
        "properties" : {
          "title" : { ..., "analyzer" : "default" },
          "content" : { ..., "analyzer" : "default" }
        }
      },
      "text_en" : {
        "properties" : {
          "title" : { ..., "analyzer" : "en_custom_analyzer" },
          "content" : { ..., "analyzer" : "en_custom_analyzer" }
        }
      }
    },
    "analysis" : {
      ...
    }
  }
}
--

Hi,

I meant to say suffix, not prefix :)

By "more flexible" I mean (in my opinion) that you can run the query against
a specific language (one of the objects in the mapping) and against the
default object as a whole at the same time. I'm thinking of something like
this (I haven't implemented it yet, so I don't know how practical it is):

{
  "multi_match" : {
    "query" : "qu'es que tu fais",
    "fields" : [ "text_default.*", "text_fr.*" ]
  }
}

Could you do something like that?

Or boost the language the user appears to be searching in:

{
  "multi_match" : {
    "query" : "this is a test",
    "fields" : [ "text_default.*", "text_en.*^2" ]
  }
}

{
  "multi_match" : {
    "query" : "qu'es que tu fais means what do you do in french",
    "fields" : [ "text_default.*", "text_fr.*^2", "text_en.*^2" ]
  }
}
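If wildcard patterns in "fields" turn out not to be supported by the version in use (an assumption worth checking against its documentation), the boosted English query can list the sub-fields explicitly. This sketch assumes the title/content fields from the mapping example in the previous message, addressed with dot notation, and wraps the clause in a full search body with a top-level "query" key.

```json
{
  "query" : {
    "multi_match" : {
      "query" : "this is a test",
      "fields" : [
        "text_default.title", "text_default.content",
        "text_en.title^2", "text_en.content^2"
      ]
    }
  }
}
```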

On Thursday, January 24, 2013 2:53:56 PM UTC+1, Cristian Mihai Barca wrote: