What should be the mapping/data model of my documents?


(Jean Wisser) #1

Hello everyone!

I'm currently parsing the text of internal résumés at my company. The goal is to index everything in Elasticsearch so that we can search them.

For now, each résumé becomes a JSON document like the one below. Each coworker has a list of projects, each with its client name:

{
    "name": "Jean Wisser",
    "position": "Junior Developer",
    "age": 24,
    "projects": [
        {
            "client": "SutrixMedia",
            "location": "Ho Chi Minh, Vietnam",
            "dates": "May to September 2015",
            "project": "Website creation",
            "role": "Project Manager",
            "missions": [
                "Responsible for quality, on time and within budget",
                "Writing specs, testing,..."
            ],
            "env": "JIRA/Mantis/Adobe CQ5 (AEM)"
        },
        {
            "client": "Société Générale",
            "location": "Luxembourg, Luxembourg",
            "dates": "May to September 2015",
            "project": "UAT Testing",
            "role": "Business analyst",
            "missions": [
                "Writing test cases and scenarios",
                "UAT"
            ],
            "env": "HP QTP/QC"
        }
    ],
    "comp_tech": ["JAVA/JEE", "Web: PHP, HTML", "QlikSense · QlikView", "HP QTP/QC", "SQL", "MySQL", "Sybase"],
    "comp_fonct": ["Banking", "Pharmaceutical"]
}
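The Elasticsearch documentation describes arrays of plain (non-nested) objects as being flattened into one value array per sub-field. The sketch below is a rough pure-Python model of that behavior (not how Lucene actually stores data), using a trimmed copy of the résumé; the helper names are hypothetical. It shows why, with the default object mapping, a "client X used technology Y" check can match a document even though no single project satisfies both conditions.

```python
from typing import Any


def flatten_object_array(doc: dict[str, Any], path: str) -> dict[str, list[Any]]:
    """Rough model of how a plain (non-nested) object array is indexed:
    one value list per sub-field, so the per-project pairing is lost."""
    flat: dict[str, list[Any]] = {}
    for obj in doc.get(path, []):
        for key, value in obj.items():
            values = value if isinstance(value, list) else [value]
            flat.setdefault(f"{path}.{key}", []).extend(values)
    return flat


def matches_flattened(flat: dict[str, list[Any]], client: str, tech: str) -> bool:
    # Each condition is checked against its whole value list independently.
    client_ok = client in flat.get("projects.client", [])
    tech_ok = any(tech in env for env in flat.get("projects.env", []))
    return client_ok and tech_ok


def matches_per_project(doc: dict[str, Any], client: str, tech: str) -> bool:
    # Both conditions must hold for the same project object.
    return any(p["client"] == client and tech in p["env"] for p in doc["projects"])


resume = {
    "name": "Jean Wisser",
    "projects": [
        {"client": "SutrixMedia", "env": "JIRA/Mantis/Adobe CQ5 (AEM)"},
        {"client": "Société Générale", "env": "HP QTP/QC"},
    ],
}

flat = flatten_object_array(resume, "projects")
print(flat["projects.client"])  # ['SutrixMedia', 'Société Générale']
print(matches_flattened(flat, "SutrixMedia", "QTP"))  # True: a false match
print(matches_per_project(resume, "SutrixMedia", "QTP"))  # False
```

QTP was only used at Société Générale, yet the flattened view "finds" it at SutrixMedia. That cross-project mix-up is what a `nested` mapping is meant to prevent.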

The two main questions we would like to answer are:

  1. Which coworkers have already worked for a given client company?
  2. Which clients use a given technology?
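To pin down what each question should return, here is a plain in-memory version over hypothetical sample data (the second résumé and its "Confluence" entry are made up for illustration). One assumption worth noting: only the per-project "env" field ties a technology to a specific client; "comp_tech" is a per-person list, so on its own it cannot say which client a technology was used for.

```python
from collections import defaultdict
from typing import Any

# Hypothetical sample: two trimmed résumés in the shape shown above.
RESUMES: list[dict[str, Any]] = [
    {
        "name": "Jean Wisser",
        "projects": [
            {"client": "SutrixMedia", "env": "JIRA/Mantis/Adobe CQ5 (AEM)"},
            {"client": "Société Générale", "env": "HP QTP/QC"},
        ],
    },
    {
        "name": "Another Coworker",  # hypothetical second résumé
        "projects": [{"client": "Société Générale", "env": "JIRA/Confluence"}],
    },
]


def coworkers_for_client(resumes: list[dict[str, Any]], client: str) -> list[str]:
    """Question 1: names of coworkers with at least one project for `client`."""
    return [
        r["name"]
        for r in resumes
        if any(p["client"] == client for p in r.get("projects", []))
    ]


def clients_using(resumes: list[dict[str, Any]], tech: str) -> dict[str, int]:
    """Question 2: clients whose projects mention `tech` in "env", with the
    number of such projects (case-insensitive substring match)."""
    counts: dict[str, int] = defaultdict(int)
    for r in resumes:
        for p in r.get("projects", []):
            if tech.lower() in p.get("env", "").lower():
                counts[p["client"]] += 1
    return dict(counts)


print(coworkers_for_client(RESUMES, "Société Générale"))
# ['Jean Wisser', 'Another Coworker']
print(clients_using(RESUMES, "jira"))
# {'SutrixMedia': 1, 'Société Générale': 1}
```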

The first question is easy: for example, a query on projects.client = "SutrixMedia" returns the right résumé.
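As a search request body, that first query might look like the sketch below. It assumes Elasticsearch 5.x or later with the default dynamic mapping, which indexes strings as `text` plus a `keyword` sub-field, so `projects.client.keyword` gives an exact match; the function name is hypothetical. If `projects` is later mapped as `nested`, the same clause has to be wrapped in a `nested` query, which the `nested` flag shows.

```python
import json
from typing import Any


def coworkers_by_client_query(client: str, nested: bool = False) -> dict[str, Any]:
    """Build a search body for question 1 (which coworkers worked for `client`).

    Assumes the default dynamic mapping, where `projects.client.keyword`
    holds the exact, unanalyzed client name.
    """
    clause: dict[str, Any] = {"term": {"projects.client.keyword": client}}
    if nested:
        # Once `projects` is mapped as `nested`, the clause has to be wrapped.
        clause = {"nested": {"path": "projects", "query": clause}}
    # Only return the fields needed to identify the coworker.
    return {"query": clause, "_source": ["name", "position"]}


print(json.dumps(coworkers_by_client_query("SutrixMedia"), indent=2))
```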

But how can I answer the second one?

Should I define a nested mapping for the project elements, or should I create three different document types (coworker, project and client) linked by foreign keys? What would you advise?
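One common option is to keep a single résumé document and map `projects` as `nested`, so each project's `client` and `env` are indexed together. The sketch below only builds request bodies as Python dicts (it does not call a cluster): a hypothetical index mapping in Elasticsearch 7+ typeless style, and a nested aggregation that filters projects by technology and then lists their clients. The field types are guesses, not a recommendation tuned to this data. Separate coworker/project/client documents would also work, but Elasticsearch does not offer relational, foreign-key style joins, so most of that linking would have to happen in application code.

```python
import json
from typing import Any

# Full-text field plus a `keyword` sub-field for exact matches and aggregations.
TEXT_WITH_KEYWORD = {"type": "text", "fields": {"keyword": {"type": "keyword"}}}

# Hypothetical index mapping (Elasticsearch 7+ style, no mapping type).
RESUME_MAPPING: dict[str, Any] = {
    "mappings": {
        "properties": {
            "name": TEXT_WITH_KEYWORD,
            "position": TEXT_WITH_KEYWORD,
            "age": {"type": "integer"},
            "comp_tech": TEXT_WITH_KEYWORD,
            "comp_fonct": TEXT_WITH_KEYWORD,
            "projects": {
                # `nested` keeps each project's fields together, so client
                # and env from the same project can be matched as a pair.
                "type": "nested",
                "properties": {
                    "client": TEXT_WITH_KEYWORD,
                    "location": {"type": "text"},
                    "dates": {"type": "text"},
                    "project": {"type": "text"},
                    "role": TEXT_WITH_KEYWORD,
                    "missions": {"type": "text"},
                    "env": {"type": "text"},
                },
            },
        }
    }
}


def clients_using_tech_body(tech: str, max_clients: int = 50) -> dict[str, Any]:
    """Question 2: list client names over projects whose env matches `tech`."""
    return {
        "size": 0,  # only the aggregation is needed, not the résumé hits
        "aggs": {
            "projects": {
                "nested": {"path": "projects"},
                "aggs": {
                    "using_tech": {
                        # Evaluated per project, so client and env stay paired.
                        "filter": {"match": {"projects.env": tech}},
                        "aggs": {
                            "clients": {
                                "terms": {
                                    "field": "projects.client.keyword",
                                    "size": max_clients,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(clients_using_tech_body("JIRA"), indent=2))
```

In the response, each bucket under `clients` would be a client name, and its `doc_count` would be the number of matching projects (nested documents), not the number of résumés.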

Thanks a lot.

