We are building a project that will handle approximately 10 million
documents per month.
Each document contains four main keys:
- Title (contains specific data that shouldn't be analyzed as separate
words, so I plan to use the Keyword analyzer) (String)
- Date (DateOptionalTime)
- Time, in seconds (Integer)
- UserID (Integer)
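For illustration, the four keys above could be mapped roughly as in the
sketch below. This is not the mapping from the gist linked further down.
The type name "doc" and the lowercase field names (title, date, time,
user_id) are assumptions made for the example, and the syntax is the
string/analyzer style of the Elasticsearch versions that still offer
Statistical Facets.

```python
import json


def build_mapping(type_name="doc"):
    """Build a mapping body for the four keys described above.

    The type name and the field names are hypothetical; only the field
    kinds (keyword-analyzed string, date, two integers) come from the post.
    """
    return {
        type_name: {
            "properties": {
                # Keyword analyzer: the whole title is indexed as one token.
                "title": {"type": "string", "analyzer": "keyword"},
                # dateOptionalTime accepts a date with an optional time part.
                "date": {"type": "date", "format": "dateOptionalTime"},
                # Time in seconds, the value to be summed later.
                "time": {"type": "integer"},
                "user_id": {"type": "integer"},
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the put-mapping API.
    print(json.dumps(build_mapping(), indent=2))
```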
The aim of the project is to collect specific user data and let users of
our application define filters using simple "Begins with" and "Contains"
rules, combined with a date range on the Date field and a UserID. Based on
these rules we must query the data and return the aggregated sum of the
Time field.
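To make the intended query concrete, here is a minimal sketch of a request
body that combines such rules with a Statistical Facet on the time field.
It assumes the rules apply to the title field, that the fields are named
title, date, time and user_id, and that the facet is called time_stats;
all of these names are hypothetical. "Begins with" is expressed as a
prefix filter and "Contains" as a wildcard query wrapped in a query
filter, which is only one possible way to do it.

```python
import json


def build_query(user_id, date_from, date_to, begins_with=None, contains=None):
    """Build a search body that filters documents and sums the time field.

    Field names and the facet name are assumptions made for this sketch.
    """
    filters = [
        {"term": {"user_id": user_id}},
        {"range": {"date": {"from": date_from, "to": date_to}}},
    ]
    if begins_with is not None:
        # Prefix filters are not analyzed, which matches a keyword field.
        filters.append({"prefix": {"title": begins_with}})
    if contains is not None:
        # A leading wildcard has to scan many terms and can be slow.
        # Any '*' or '?' in the user's input would act as a wildcard too.
        pattern = "*" + contains + "*"
        filters.append({"query": {"wildcard": {"title": pattern}}})
    return {
        "size": 0,  # only the facet result is needed, not the hits
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"and": filters},
            }
        },
        # The statistical facet reports count, total (the sum), min, max, mean.
        "facets": {"time_stats": {"statistical": {"field": "time"}}},
    }


if __name__ == "__main__":
    body = build_query(42, "2012-05-01", "2012-05-31", begins_with="foo")
    print(json.dumps(body, indent=2))
```

The sum would then be read from the "total" value of the time_stats facet
in the response.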
Some questions regarding the mapping:
- Is it good practice, and is it worthwhile, to use short field names to
save some storage space? For example title -> t, date -> d, time -> tm,
and so on.
- I don't fully understand the *store=yes* mapping parameter. The docs
say:
"Set to yes to store actual field in the index, no to not store it.
Defaults to no"
However, I don't understand the performance advantages and disadvantages
of this setting. In our case, if I want to aggregate the sum of Time using
Statistical Facets, should I use store=yes so that the aggregation is
faster and the value is taken from the index and not from the store, or
does it make no difference?
- What other advice would you give for improving performance, keeping in
mind that aggregation will be done with Statistical Facets? Each document
is quite small: just 5-6 keys, and the Title field will have a maximum of
300 characters.
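To put a number on the short-names question, the sketch below estimates
how many raw bytes the shorter key names would remove from the stored JSON
per month at 10 million documents. It only counts the characters of the
key names themselves, ignores any compression and the index structures,
and covers only four keys. The full names and the short name "u" for the
user ID are assumptions.

```python
# Hypothetical full field names mapped to the proposed short names.
RENAMES = {"title": "t", "date": "d", "time": "tm", "user_id": "u"}
DOCS_PER_MONTH = 10_000_000


def bytes_saved_per_doc(renames):
    """Bytes removed from one JSON document by shortening its key names."""
    return sum(
        len(full.encode("utf-8")) - len(short.encode("utf-8"))
        for full, short in renames.items()
    )


def bytes_saved_per_month(renames, docs_per_month):
    """Raw, uncompressed saving for one month of documents."""
    return bytes_saved_per_doc(renames) * docs_per_month


if __name__ == "__main__":
    per_doc = bytes_saved_per_doc(RENAMES)
    per_month = bytes_saved_per_month(RENAMES, DOCS_PER_MONTH)
    print("saved per document: %d bytes" % per_doc)
    print("saved per month: %.1f MB" % (per_month / 1e6))
```

Under these assumptions the saving is 15 bytes per document, or about
150 MB of raw JSON per month.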
Current index config and mapping is here: https://gist.github.com/2757570
We are currently working on the mapping, so for development we are using
a dev server with no replication at all.