I am trying to limit what gets highlighted by setting boundary_chars and
boundary_max_scan.
I created my index with the following statement:
curl -XPUT 'localhost:9200/_river/email/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"strategy":"simple",
"poll":"10",
"driver" : "org.postgresql.Driver",
"url" : "jdbc:postgresql://localhost:5432/api_development",
"username" : "paulcowan",
"password" : "",
"sql" : "SELECT distinct e.id as "_id", ep.user_id as "user_id",
folder, subject, body, personal, sent_at, read_by, account_id,
sender_user_id, sender_contact_id, html, draft FROM emails e inner join
email_participants ep on e.id = ep.email_id where ep.user_id is not null"
},
"index" : {
"index" : "email",
"type" : "jdbc",
"type_mapping": "{"email" : {"properties" :
{"folder":{"type":"string","index":"not_analyzed"},"subject":{"type":"string","term_vector":"with_positions_offsets"},"html":{"type":"string","term_vector":"with_positions_offsets"},"body":{"type":"string","term_vector":"with_positions_offsets"}}}}"
}
}'
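The nested quoting in this statement is easy to get wrong by hand, because type_mapping is itself a JSON document stored as a string inside the outer JSON. Here is a minimal sketch of one way to generate the payload instead, assuming Python is available. The helper name build_river_meta is made up for illustration, and all values are copied from the statement above:

```python
import json

# SQL copied from the river definition; the column aliases need double quotes.
SQL = (
    'SELECT distinct e.id as "_id", ep.user_id as "user_id", '
    "folder, subject, body, personal, sent_at, read_by, account_id, "
    "sender_user_id, sender_contact_id, html, draft FROM emails e inner join "
    "email_participants ep on e.id = ep.email_id where ep.user_id is not null"
)

# Field mapping copied from the type_mapping value.
EMAIL_MAPPING = {
    "email": {
        "properties": {
            "folder": {"type": "string", "index": "not_analyzed"},
            "subject": {"type": "string", "term_vector": "with_positions_offsets"},
            "html": {"type": "string", "term_vector": "with_positions_offsets"},
            "body": {"type": "string", "term_vector": "with_positions_offsets"},
        }
    }
}


def build_river_meta(sql, mapping):
    """Hypothetical helper: serialize the river _meta document.

    The mapping is dumped to a string first, so the outer json.dumps
    escapes its inner double quotes automatically.
    """
    meta = {
        "type": "jdbc",
        "jdbc": {
            "strategy": "simple",
            "poll": "10",
            "driver": "org.postgresql.Driver",
            "url": "jdbc:postgresql://localhost:5432/api_development",
            "username": "paulcowan",
            "password": "",
            "sql": sql,
        },
        "index": {
            "index": "email",
            "type": "jdbc",
            "type_mapping": json.dumps(mapping),
        },
    }
    return json.dumps(meta, indent=2)
```

Printing build_river_meta(SQL, EMAIL_MAPPING) gives a document in which every inner double quote is escaped as \", so it can be saved to a file and sent with curl rather than pasted inline.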
If I list the mappings, I get:
{
"jdbc" : {
"properties" : {
"account_id" : {
"type" : "long"
},
"body" : {
"type" : "string",
"term_vector" : "with_positions_offsets"
},
"draft" : {
"type" : "boolean"
},
"folder" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"index_options" : "docs"
},
"html" : {
"type" : "string",
"term_vector" : "with_positions_offsets"
},
"personal" : {
"type" : "boolean"
},
"read_by" : {
"type" : "string"
},
"sender_contact_id" : {
"type" : "long"
},
"sender_user_id" : {
"type" : "long"
},
"sent_at" : {
"type" : "date",
"format" : "dateOptionalTime"
},
"subject" : {
"type" : "string",
"term_vector" : "with_positions_offsets"
},
"user_id" : {
"type" : "long"
}
}
}
}
If I issue the following search:
curl -XGET 'http://localhost:9200/email/jdbc/_search?pretty=true' -d '{
"query": {
"query_string": {
"query": "fullcontact*",
"default_operator": "AND"
}
},
"highlight" : {
"pre_tags": ["<em class='highlight'>"],
"post_tags": ["</em>"],
"fields" : {
"subject" : {
"boundary_chars": ".,!? \t\n",
"boundary_max_scan": 0
},
"body": {
"boundary_chars": ".,!? \t\n",
"boundary_max_scan": 0
},
"html": {
"boundary_chars": ".,!? \t\n",
"boundary_max_scan": 0
}
}
},
"filter": {
"bool": {
"must": [
{"term": {"account_id": 1}},
{"term": {"folder": "INBOX"}},
{"term": { "user_id": 3}}
]
}
}
}'
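One detail worth checking in the request above: the whole body is wrapped in single quotes for the shell, so the single quotes around highlight in pre_tags are consumed by the shell before curl sees them. That would be consistent with the class=highlight (no quotes) that comes back in the results. Here is a minimal sketch of building the same body in Python so that no shell quoting is involved. The helper name build_search_body is hypothetical, and the values are copied from the request:

```python
import json

# Boundary characters from the search request. The \t and \n are a real tab
# and newline here; json.dumps writes them back out as the escapes \t and \n.
BOUNDARY_CHARS = ".,!? \t\n"


def build_search_body(query, fields, boundary_max_scan=0):
    """Hypothetical helper: serialize the search request shown above."""
    highlight_fields = {
        name: {
            "boundary_chars": BOUNDARY_CHARS,
            "boundary_max_scan": boundary_max_scan,
        }
        for name in fields
    }
    body = {
        "query": {
            "query_string": {"query": query, "default_operator": "AND"}
        },
        "highlight": {
            "pre_tags": ["<em class='highlight'>"],
            "post_tags": ["</em>"],
            "fields": highlight_fields,
        },
        "filter": {
            "bool": {
                "must": [
                    {"term": {"account_id": 1}},
                    {"term": {"folder": "INBOX"}},
                    {"term": {"user_id": 3}},
                ]
            }
        },
    }
    return json.dumps(body, indent=2)
```

The output of build_search_body("fullcontact*", ["subject", "body", "html"]) can be written to a file (say search.json, a name chosen here only as an example) and sent with curl's --data-binary @search.json option, which passes the file contents through unchanged.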
Then I get back results like this:
"highlight" : {
"body" : [ "The <em class=highlight>FullContact</em> Playbook\nLife
is a contact sport. Play it well.\n\nOctober 4, 2013\n\n==================", "ic/?utm_source=fullcontact-weekly-playbook&utm_medium=email&utm_campaign=fullcontact-weekly-playbook)", "me/?utm_source=fullcontact-weekly-playbook&utm_medium=email&utm_campaign=fullcontact-weekly-playbook)", "ng/?utm_source=fullcontact-weekly-playbook&utm_medium=email&utm_campaign=fullcontact-weekly-playbook)", "ct/?utm_source=fullcontact-weekly-playbook&utm_medium=email&utm_campaign=fullcontact-weekly-playbook)" ],
"subject" : [ "<em class=highlight>FullContact</em> - Weekly
Playbook" ],
"html" : [ "name=\"viewport\"
content="width=device-width">\n\tFullContact - Weekly
Playbook\n\n\t\n\t\n\t<!--FIXES FOR", "160);font-size:
13px;text-align: center;">The FullContact
Playbook – Life is a contact sport. Play", "#009ebb;
text-decoration:none;" href="http://fullcontact.us2.list-manage1.com/track/click?u=92c1a2b7e3b"
, "width="698" alt="The FullContact Playbook"
src="https://s3.amazonaws.com/fullcontact-static/images/emai", "Play it well."
src="https://s3.amazonaws.com/fullcontact-static/images/emails/newsletter/tagline.jpg""
]
}
} ]
}
}
The boundary_chars setting does not appear to be working: the term
fullcontact is still highlighted when it follows an equals sign, or really
any other character, e.g. ?utm_source=fullcontact
I cannot find an example online of how to set these values correctly. Can
anybody help?