Does it make sense to index the whole document?

Hi,

I have created a river for my MongoDB Asset collection so that it is
indexed by Elasticsearch.

To do that I have used the following command:

Create river for Asset collection

curl -XPUT 'http://localhost:9200/_river/asset_river/_meta' -d '{
  "type": "mongodb",
  "mongodb": {
    "db": "my_database",
    "collection": "Asset"
  },
  "index": {
    "name": "asset_index",
    "type": "Asset"
  }
}'

I think this will basically index every field in all my Asset documents.
Is that right?

Does it make sense to do that? If I understand correctly, I will have so
many indexes that searching might be as slow as just scanning through the
documents themselves. Am I right?

Regards,

Janusz

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

You can supply a custom mapping for that index BEFORE the index is created,
denoting which fields should and should not be indexed. You cannot change
the source document, but you do have control over what fields are indexed.

--
Ivan

Slight correction in my last email. You can create the index with a custom
mapping BEFORE the river is created. I said before the index is created,
which is wrong.
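
The ordering described here could look roughly like the sketch below. It
only builds and prints the two requests instead of sending them. The
"properties" wrapper, the ES base URL variable and the helper name
build_requests are assumptions made for illustration, and the mapping
syntax is the one used by Elasticsearch releases of that period.

```python
import json

ES = "http://localhost:9200"  # assumed local node, as in the original command


def build_requests():
    """Return (method, url, body) tuples in the order they should be sent."""
    # Step 1: create the index with an explicit mapping for the "Asset" type.
    create_index = {
        "mappings": {
            "Asset": {
                "properties": {
                    "assetCategoryId": {"type": "long", "index": "no"},
                    "className": {"type": "string"},
                }
            }
        }
    }
    # Step 2: only then register the river, pointing it at the existing index.
    create_river = {
        "type": "mongodb",
        "mongodb": {"db": "my_database", "collection": "Asset"},
        "index": {"name": "asset_index", "type": "Asset"},
    }
    return [
        ("PUT", ES + "/asset_index", create_index),
        ("PUT", ES + "/_river/asset_river/_meta", create_river),
    ]


for method, url, body in build_requests():
    print(method, url)
    print(json.dumps(body, indent=2))
```

Creating the index first means the river writes into an index whose mapping
is already in place, rather than letting the first documents trigger a
dynamically generated one.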

--
Ivan

Hi Ivan,

1. So would you agree with me that indexing all fields (columns) in all
   documents is overkill?
2. Is there a comprehensive document on how to set which fields are
   analysed? Do I have to do it for every field? Is there a global flag I
   can set to not analyse fields, and then set individual fields to be
   analysed?

Regards,

Janusz

Hi Janusz,

Answers inline.

On Tue, Aug 13, 2013 at 5:38 PM, JD jdalecki@tycoint.com wrote:

Hi Ivan,

1. So would you agree with me that indexing all fields (columns) in all
   documents is overkill?

The biggest downside is that your overall index size will be bigger. If
your Lucene index does not fit into memory, elasticsearch could swap to
disk more frequently. Field/filter caches will not be bigger if the fields
are not queried (ignoring the _all field for now).

2. Is there a comprehensive document on how to set which fields are
   analysed? Do I have to do it for every field?

Which fields should be indexed is up to your application. Not analyzed and
not indexed are two different things. Not analyzed means a field is
indexed, but does not go through the tokenization/filtering process.
Providing your custom mapping will not only mark which fields should be
indexed, but also how they should be analyzed. By default, indexed fields
use the Standard analyzer, but you will soon discover that certain fields
require a different analyzer or should not be analyzed at all (while still
being indexed).
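
To make the distinction concrete, here is a minimal sketch of the three
treatments side by side. The field names are made up for illustration, the
"properties" wrapper is an assumption about the mapping layout, and the
"index" values are the ones used by Elasticsearch releases of that period.

```python
import json

# Three ways a string field can be treated. Field names are hypothetical.
properties = {
    # Indexed and analyzed: tokenized by the analyzer, suited to full-text search.
    "description": {"type": "string", "index": "analyzed", "analyzer": "english"},
    # Indexed but not analyzed: kept as one exact term, suited to exact matches.
    "serialNumber": {"type": "string", "index": "not_analyzed"},
    # Not indexed at all: still returned in _source, but not searchable.
    "internalNote": {"type": "string", "index": "no"},
}

mapping = {"Asset": {"properties": properties}}
print(json.dumps(mapping, indent=2, sort_keys=True))
```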

3. Is there a global flag I can set to not analyse fields and then set
   individual fields to be analysed?

Look into dynamic templates:
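
As a rough sketch of that idea, a dynamic template can make every newly
seen string field not analyzed, while fields listed explicitly keep their
own settings. The template name "strings_not_analyzed" is arbitrary, and
treating className as the one analyzed field is an assumption made for the
example.

```python
import json

mapping = {
    "Asset": {
        "dynamic_templates": [
            {
                "strings_not_analyzed": {  # template name is arbitrary
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "string", "index": "not_analyzed"},
                }
            }
        ],
        "properties": {
            # Explicitly mapped fields are not affected by the template.
            "className": {"type": "string", "index": "analyzed"},
        },
    }
}

print(json.dumps(mapping, indent=2))
```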

Cheers,

Ivan

Hi,
So in this simple mapping example below:
{
  "Asset" : {
    "_all" : {
      "analyzer" : "english"
    },
    "assetCategoryId" : {
      "type" : "long", indexed : false
    },
    "className" : {
      "type" : "string"
    }
  }
}

I have added an 'indexed' property set to false for the field
'assetCategoryId'.
Is what I have done correct? From now on 'assetCategoryId' should not be
indexed, as I never search on that field.
Regards,
Janusz

The setting to have a field not indexed is "index": "no".

"assetCategoryId" : { "type" : "long", "index": "no" }

The className attribute is using the default settings, so you can even
leave it out if you want. It depends on whether you prefer explicit or
concise mappings (I prefer the former, just like you have it now).
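
Put together, the corrected mapping from the question might look like the
sketch below. Nesting the fields under "properties" is an assumption about
the intended structure and is not something stated in this thread, so treat
it as a starting point and check it against the mapping documentation.

```python
import json

# Corrected version of the mapping from the question: the unquoted
# indexed : false becomes "index": "no". The "properties" wrapper is an
# assumption added for this sketch.
asset_mapping = {
    "Asset": {
        "_all": {"analyzer": "english"},
        "properties": {
            "assetCategoryId": {"type": "long", "index": "no"},
            "className": {"type": "string"},
        },
    }
}

body = json.dumps(asset_mapping, indent=2)
print(body)
```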

Cheers,

Ivan

Hi,

Thank you Ivan very much for all the info you have provided in this thread.

Just one more thing: is there documentation on this syntax? I have tried
to find out what types of fields I can put in a mapping file and what the
valid values are, but I couldn't.

Regards,

Janusz

I meant to include some links, but I forgot.

The core type documentation information on the settings:

Overall analysis:

Elasticsearch is built on top of Lucene, so all of the analysis concepts
such as analyzers, tokenizers and filters are also explained in Lucene
documentation. That said, Lucene's documentation is not very good. Solr is
also built on top of Lucene, and has decent docs:

Cheers,

Ivan
