According to http://exploringelasticsearch.com/advanced_techniques.html#advanced-internals:
"The smallest individual unit of data in elasticsearch is a field...Documents are collections of fields, and comprise the base unit of storage in elasticsearch. The reason a document is considered the base unit of storage is because, peculiar to Lucene, all field updates fully rewrite a given document to storage (while preserving unmodified fields). So, while from an API perspective the field is the smallest single unit, the document is the smallest unit from a storage perspective."
Since I use a specific schema (see below) when populating the index with Logstash from a CSV file, I suppose I need another schema containing the same data in order to display it the way I want in Kibana.
Is there a way to tell ES: "this field has already been analyzed for this schema (_type) in this index; I will reuse the same field with the same content in another type or index, so don't do the job again"?
Here's my use case:
A company has an internal support team: each employee's phone call is traced, sent to a Logstash client and automatically inserted into ES.
The employees' data and the support menu data are also in ES.
One ES index with three document types (EMP and MENU hold almost static data):
_type : EMP
    _id : idHR
    idHR : X1200
    fname : "Bob"
    lname : "PALMER"
    telnum : 59.92
    dept : DIR/MKG (Direction / Marketing)

_type : MENU
    _id : selmenu
    selmenu : 1
    libmenu : Win7 support
    suppteamidRH : [ X1132,... ]

_type : SUPPORT
    _id : phoneCall id (sequential ID)
    callDate : 2014-09-30 11:15:34
    telnum : 59.92 (internal tel number)
    teldur : 72 (call duration: 72 s)
    selmenu : 1
I want to produce these statistics in Kibana:
Display the phone call count (by duration / by selmenu) over a selected time period => with _type SUPPORT alone, nothing more is needed; Kibana does that easily.
Display phone calls over a selected time period by employee name / dept => more difficult, because the only field linking the SUPPORT and EMP mappings is the phone number (telnum).
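Conceptually, the second statistic needs a join on telnum: a query over the SUPPORT type alone cannot see dept, because dept only exists in EMP. The Python sketch below does that join client-side, only to make the relation explicit. It assumes telnum is stored as the same string in both types; apart from Bob's record and the first call, the sample data is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Illustrative data shaped like the EMP and SUPPORT types above. Only Bob's record
# and the first call come from the question; the rest is hypothetical.
EMP = [
    {"idHR": "X1200", "fname": "Bob", "lname": "PALMER", "telnum": "59.92", "dept": "DIR/MKG"},
    {"idHR": "X9999", "fname": "Ann", "lname": "MARTIN", "telnum": "59.10", "dept": "DIR/FIN"},
]
SUPPORT = [
    {"callDate": "2014-09-30 11:15:34", "telnum": "59.92", "teldur": 72, "selmenu": 1},
    {"callDate": "2014-09-30 14:02:10", "telnum": "59.10", "teldur": 30, "selmenu": 1},
    {"callDate": "2014-10-02 09:00:00", "telnum": "59.92", "teldur": 15, "selmenu": 1},
]


def calls_per_dept(emps, calls, start, end):
    """Count calls with start <= callDate < end, per department.

    The department is only stored in EMP, so each call is joined to its
    caller on telnum; calls from unknown numbers are counted as "UNKNOWN".
    """
    dept_by_telnum = {e["telnum"]: e["dept"] for e in emps}
    counts = Counter()
    for call in calls:
        when = datetime.strptime(call["callDate"], "%Y-%m-%d %H:%M:%S")
        if start <= when < end:
            counts[dept_by_telnum.get(call["telnum"], "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    print(calls_per_dept(EMP, SUPPORT, datetime(2014, 9, 30), datetime(2014, 10, 1)))
```

Doing this join on every dashboard refresh is exactly what Kibana cannot do by itself, which is why the options below move dept into the call documents.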
I can see several ways to achieve this:
a) add 2 fields to the SUPPORT mapping and populate them with EMP data => can I do that, and what is the easiest way (with Logstash or ES)?
b) create a new index with a new mapping; same question => how can I populate it easily?
(The blog post http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/ says: "pull the documents in from your old index, using a scrolled search and index them into the new index using the bulk API. Note: make sure that you include search_type=scan in your search request. This disables sorting and makes “deep paging” efficient." But that means writing a Perl or Python script.)
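Options a) and b) rely on the same step: copying the caller's EMP fields into each SUPPORT document, matched on telnum. The Python sketch below shows that lookup plus the raw body the bulk API expects, so the output of a scrolled search could be re-indexed with the extra fields. It is not a full script: in elasticsearch-py, helpers.scan (scrolled search) and helpers.bulk (bulk indexing, which takes action dicts rather than a raw body) can do the fetching and sending. Assumptions: telnum is stored as the same string in EMP and SUPPORT, one employee per number, and the copied field list and the target index name support_v2 are hypothetical. The _type in each action line matches the ES 1.x-era setup in the question; recent Elasticsearch versions have removed mapping types.

```python
import json

# Fields copied from each EMP document into the SUPPORT documents of the new index.
# Hypothetical choice: adjust to whatever Kibana should be able to split on.
EMP_FIELDS = ("idHR", "fname", "lname", "dept")


def build_lookup(emp_docs):
    """Map telnum -> the EMP fields to copy (assumes one employee per number)."""
    return {e["telnum"]: {f: e[f] for f in EMP_FIELDS if f in e} for e in emp_docs}


def enrich(call, lookup):
    """Return a copy of a SUPPORT document with the caller's EMP fields added.

    Calls from unknown numbers are kept unchanged rather than dropped.
    """
    enriched = dict(call)
    enriched.update(lookup.get(call.get("telnum"), {}))
    return enriched


def bulk_body(hits, new_index, lookup):
    """Turn search hits (e.g. one page of a scrolled search) into a _bulk body.

    Each document becomes two newline-delimited JSON lines: an "index" action
    naming the target index, type and id, then the enriched source. The bulk
    API requires every line, including the last one, to end with a newline.
    """
    lines = []
    for hit in hits:
        action = {"index": {"_index": new_index, "_type": hit["_type"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(enrich(hit["_source"], lookup)))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    lookup = build_lookup([{"idHR": "X1200", "fname": "Bob", "lname": "PALMER",
                            "telnum": "59.92", "dept": "DIR/MKG"}])
    hits = [{"_id": "1", "_type": "SUPPORT",
             "_source": {"callDate": "2014-09-30 11:15:34", "telnum": "59.92",
                         "teldur": 72, "selmenu": 1}}]
    print(bulk_body(hits, "support_v2", lookup), end="")
```

Denormalizing like this lets Kibana split calls by dept directly, at the cost of duplicating employee data in every call. It also freezes history: if an employee changes department, calls already indexed keep the old value.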
With option b), my underlying question is the one from the beginning of my post: is it efficient to do that, given that the same data would be analyzed and indexed a second time?
Would it be better to give each employee a nested document holding their SUPPORT data? If so, the process would be:
Each time I receive SUPPORT data in Logstash, I would need to:
a) query Elasticsearch for the employee who has this phone number and get their HR id (idHR)
b) create the nested document from the received SUPPORT data
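The two steps above can be simulated in memory to see where the cost lies. The Python sketch below is not Elasticsearch code: employees is a hypothetical list standing in for EMP documents, calls is an assumed name for the nested field, and the linear search stands in for the step a) query.

```python
def add_call_to_employee(employees, call):
    """Simulate steps a) and b): find the caller by telnum, then append the call.

    Returns the caller's idHR, or None if no employee has that number.
    """
    for emp in employees:
        if emp["telnum"] == call["telnum"]:  # step a): look up the employee
            # Step b): append the call as a nested object. In Elasticsearch this is
            # not an in-place append: the whole employee document, with every call
            # already stored in it, is rewritten on each update.
            emp.setdefault("calls", []).append(call)
            return emp["idHR"]
    return None


if __name__ == "__main__":
    employees = [{"idHR": "X1200", "lname": "PALMER", "telnum": "59.92"}]
    for teldur in (72, 15, 40):
        add_call_to_employee(employees, {"telnum": "59.92", "teldur": teldur})
    print(len(employees[0]["calls"]))  # 3
```

Combined with the quote at the top of the post (an update rewrites the whole document), this means every incoming call rewrites its employee's document together with the whole call history, so write cost grows as calls accumulate. For append-only event data like phone calls, keeping each call as its own SUPPORT document, enriched with the EMP fields at indexing time, is usually the lighter option.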
What's the best solution to achieve my goal?
Thanks for your answers.