Boundary_chars not working


(dagda1) #1

I am trying to limit what is highlighted by setting boundary_chars and
boundary_max_scan.

I created my index through a JDBC river with the following statement:

curl -XPUT 'localhost:9200/_river/email/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "strategy" : "simple",
    "poll" : "10",
    "driver" : "org.postgresql.Driver",
    "url" : "jdbc:postgresql://localhost:5432/api_development",
    "username" : "paulcowan",
    "password" : "",
    "sql" : "SELECT distinct e.id as \"_id\", ep.user_id as \"user_id\", folder, subject, body, personal, sent_at, read_by, account_id, sender_user_id, sender_contact_id, html, draft FROM emails e inner join email_participants ep on e.id = ep.email_id where ep.user_id is not null"
  },
  "index" : {
    "index" : "email",
    "type" : "jdbc",
    "type_mapping" : "{\"email\" : {\"properties\" : {\"folder\":{\"type\":\"string\",\"index\":\"not_analyzed\"},\"subject\":{\"type\":\"string\",\"term_vector\":\"with_positions_offsets\"},\"html\":{\"type\":\"string\",\"term_vector\":\"with_positions_offsets\"},\"body\":{\"type\":\"string\",\"term_vector\":\"with_positions_offsets\"}}}}"
  }
}'
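
One way to avoid hand-escaping the quotes nested inside the "sql" and "type_mapping" strings is to build the request body programmatically. The following is a minimal Python sketch, assuming type_mapping is passed as a JSON-encoded string as in the statement above. The SQL is shortened for illustration and the variable names are my own, not part of the river configuration.

```python
import json

# Mapping as a normal Python dict; field settings copied from the statement above.
type_mapping = {
    "email": {
        "properties": {
            "folder": {"type": "string", "index": "not_analyzed"},
            "subject": {"type": "string", "term_vector": "with_positions_offsets"},
            "html": {"type": "string", "term_vector": "with_positions_offsets"},
            "body": {"type": "string", "term_vector": "with_positions_offsets"},
        }
    }
}

# Shortened SQL (illustrative only); note the double quotes around the aliases.
sql = 'SELECT distinct e.id as "_id", ep.user_id as "user_id", subject FROM emails e'

river_meta = {
    "type": "jdbc",
    "jdbc": {"strategy": "simple", "poll": "10", "sql": sql},
    "index": {
        "index": "email",
        "type": "jdbc",
        # Assumption: the river expects the mapping as a string, so encode it first.
        "type_mapping": json.dumps(type_mapping),
    },
}

# json.dumps escapes every nested double quote as \" in the final body.
body = json.dumps(river_meta)
print(body)
```

The printed body can then be sent with curl -d @file or any HTTP client, instead of being typed inline.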

If I list my mappings they are:

{
  "jdbc" : {
    "properties" : {
      "account_id" : {
        "type" : "long"
      },
      "body" : {
        "type" : "string",
        "term_vector" : "with_positions_offsets"
      },
      "draft" : {
        "type" : "boolean"
      },
      "folder" : {
        "type" : "string",
        "index" : "not_analyzed",
        "omit_norms" : true,
        "index_options" : "docs"
      },
      "html" : {
        "type" : "string",
        "term_vector" : "with_positions_offsets"
      },
      "personal" : {
        "type" : "boolean"
      },
      "read_by" : {
        "type" : "string"
      },
      "sender_contact_id" : {
        "type" : "long"
      },
      "sender_user_id" : {
        "type" : "long"
      },
      "sent_at" : {
        "type" : "date",
        "format" : "dateOptionalTime"
      },
      "subject" : {
        "type" : "string",
        "term_vector" : "with_positions_offsets"
      },
      "user_id" : {
        "type" : "long"
      }
    }
  }
}

If I issue the following search:

curl -XGET 'http://localhost:9200/email/jdbc/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "query": "fullcontact*",
      "default_operator": "AND"
    }
  },
  "highlight" : {
    "pre_tags": ["<em class='highlight'>"],
    "post_tags": ["</em>"],
    "fields" : {
      "subject" : {
        "boundary_chars": ".,!? \t\n",
        "boundary_max_scan": 0
      },
      "body": {
        "boundary_chars": ".,!? \t\n",
        "boundary_max_scan": 0
      },
      "html": {
        "boundary_chars": ".,!? \t\n",
        "boundary_max_scan": 0
      }
    }
  },
  "filter": {
    "bool": {
      "must": [
        {"term": {"account_id": 1}},
        {"term": {"folder": "INBOX"}},
        {"term": {"user_id": 3}}
      ]
    }
  }
}'

Then I get back results like this:

"highlight" : {
    "body" : [
      "The <em class=highlight>FullContact</em> Playbook\nLife is a contact sport. Play it well.\n\nOctober 4, 2013\n\n==================",
      "ic/?utm_source=fullcontact-weekly-playbook&utm_medium=email&utm_campaign=fullcontact-weekly-playbook)",
      "me/?utm_source=fullcontact-weekly-playbook&utm_medium=email&utm_campaign=fullcontact-weekly-playbook)",
      "ng/?utm_source=fullcontact-weekly-playbook&utm_medium=email&utm_campaign=fullcontact-weekly-playbook)",
      "ct/?utm_source=fullcontact-weekly-playbook&utm_medium=email&utm_campaign=fullcontact-weekly-playbook)"
    ],
    "subject" : [ "<em class=highlight>FullContact</em> - Weekly Playbook" ],
    "html" : [
      "name=\"viewport\" content=\"width=device-width\">\n\tFullContact - Weekly Playbook\n\n\t\n\t\n\t<!--FIXES FOR",
      "160);font-size: 13px;text-align: center;\">The FullContact Playbook – Life is a contact sport. Play",
      "#009ebb; text-decoration:none;\" href=\"http://fullcontact.us2.list-manage1.com/track/click?u=92c1a2b7e3b",
      "width=\"698\" alt=\"The FullContact Playbook\" src=\"https://s3.amazonaws.com/fullcontact-static/images/emai",
      "Play it well.\" src=\"https://s3.amazonaws.com/fullcontact-static/images/emails/newsletter/tagline.jpg\""
    ]
  }
} ]
}
}

The boundary_chars setting does not seem to have any effect: the term
fullcontact is still highlighted when it follows an equals sign, or really
any other character, e.g. ?utm_source=fullcontact

I cannot find an online example of how to set these values correctly. Can
anybody help?
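
For reference, as far as I understand it, boundary_chars and boundary_max_scan only influence where the fast vector highlighter starts and ends a fragment. They do not decide which terms are matched, so a hit after "=" would still be highlighted. With boundary_max_scan set to 0 no scanning happens at all (the documented default is 20), which would also explain fragments that begin mid-word such as "ic/?utm_source=...". The sketch below is a simplified Python model of that leftward scan, loosely patterned on Lucene's simple boundary scanner. The function name and sample text are illustrative assumptions, not Elasticsearch code.

```python
BOUNDARY_CHARS = ".,!? \t\n"  # same characters as in the search request above


def find_start_offset(text, start, boundary_chars, max_scan):
    """Scan left from `start` for at most `max_scan` characters.

    Returns the offset just after the nearest boundary character,
    0 if the scan reaches the beginning of the text,
    or `start` unchanged if no boundary is found within the limit.
    """
    if start < 1 or start > len(text):
        return start
    offset = start
    remaining = max_scan
    while offset > 0 and remaining > 0:
        if text[offset - 1] in boundary_chars:
            return offset
        offset -= 1
        remaining -= 1
    if offset == 0:
        return 0
    return start


# Hypothetical text; the raw fragment starts in the middle of "topic".
text = "Read the topic/?utm_source=fullcontact-weekly"
raw_start = text.index("ic/?")

no_scan = find_start_offset(text, raw_start, BOUNDARY_CHARS, 0)
with_scan = find_start_offset(text, raw_start, BOUNDARY_CHARS, 20)

print(repr(text[no_scan:]))    # fragment still starts mid-word
print(repr(text[with_scan:]))  # fragment snapped back to the preceding space
```

Under this model, a non-zero boundary_max_scan moves the fragment start back to the nearest boundary character, while 0 leaves the raw offset untouched.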


