I have documents with duplicate sentences in the "notes" field. I was able to tokenize this field so that only the original (first-occurrence) sentences and their offsets are kept.
When the user views the "notes" field, I would like to highlight these original sentences. It seems like this should be possible since the offsets are stored, but I can't figure out how to implement it.
Any input on this matter is greatly appreciated. Thank you.
// PUT mimic_dat
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "tokenizer": {
        "mimic_tokenizer": {
          "type": "pattern",
          "pattern": """(\.\s|\n+)""",
          "group": -1
        }
      },
      "filter": {
        "unique_mimic": {
          "type": "unique",
          "only_on_same_position": false
        }
      },
      "analyzer": {
        "mimic_hash_analyzer": {
          "type": "custom",
          "tokenizer": "mimic_tokenizer",
          "filter": [
            "unique_mimic"
          ]
        }
      }
    }
  },
  "mappings": {
    "mimic_type": {
      "properties": {
        "subject_id": {
          "type": "keyword"
        },
        "notes": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "my_hash": {
              "type": "text",
              "analyzer": "mimic_hash_analyzer",
              "fielddata": true,
              "term_vector": "with_positions_offsets",
              "store": true
            }
          }
        }
      }
    }
  }
}
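To make it easier to see what the analyzer above should produce, here is a rough Python approximation of it. This is only a sketch, not Elasticsearch code: `sentence_tokens` and `unique_first` are hypothetical helper names, the split logic assumes the pattern tokenizer with `"group": -1` emits the non-empty spans between matches, and Python offsets count code points, which only matches Elasticsearch offsets for text like the ASCII sample used here.

```python
import re

# Same regex as "mimic_tokenizer"; group -1 is taken to mean "split on matches".
SPLIT_PATTERN = re.compile(r"(\.\s|\n+)")


def sentence_tokens(text):
    """Yield (start, end, token) for each non-empty span between separators."""
    cursor = 0
    for match in SPLIT_PATTERN.finditer(text):
        if match.start() > cursor:
            yield cursor, match.start(), text[cursor:match.start()]
        cursor = match.end()
    if cursor < len(text):
        yield cursor, len(text), text[cursor:]


def unique_first(tokens):
    """Keep only the first occurrence of each token text (like the unique filter)."""
    seen = set()
    kept = []
    for start, end, token in tokens:
        if token not in seen:
            seen.add(token)
            kept.append((start, end, token))
    return kept


# Sample text modelled on the "notes" value from the post.
NOTES = (
    "\nPast History: Chronic xx which lead to; Ca.\n"
    "Review of systems: Cardiac, SR.\n"
    "O2: sats on room air 100%.\n"
    "ID: No active issues, temp 99.3 PO.\n"
    "Review of systems: Cardiac, SR.\n"
    "ID: No active issues, temp 99.3 PO.\n"
)

ORIGINALS = unique_first(sentence_tokens(NOTES))

if __name__ == "__main__":
    for start, end, token in ORIGINALS:
        print(start, end, repr(token))
```

In this approximation the six sentences collapse to four, and each surviving entry keeps the offsets of its first occurrence, which is the information a highlighter would need.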
// PUT mimic_dat/mimic_type/4
{
  "notes": """
Past History: Chronic xx which lead to; Ca.
Review of systems: Cardiac, SR.
O2: sats on room air 100%.
ID: No active issues, temp 99.3 PO.
Review of systems: Cardiac, SR.
ID: No active issues, temp 99.3 PO.
"""
}
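One possible direction, sketched here under stated assumptions: fetch the stored offsets for `notes.my_hash` (for example with the term vectors API) and wrap those character ranges in highlight tags on the client side. The helper names `spans_from_termvectors` and `highlight` are hypothetical, and the response shape used below (`term_vectors` → field → `terms` → `tokens` with `start_offset`/`end_offset`) is what I would expect from a term vectors response; it should be checked against the Elasticsearch version in use.

```python
from html import escape


def spans_from_termvectors(response, field):
    """Collect sorted (start, end) offsets from a term-vectors-style dict.

    Assumes each term carries a "tokens" list with start_offset/end_offset.
    """
    terms = response["term_vectors"][field]["terms"]
    spans = [
        (token["start_offset"], token["end_offset"])
        for info in terms.values()
        for token in info["tokens"]
    ]
    return sorted(spans)


def highlight(text, spans, pre="<em>", post="</em>"):
    """Wrap each (start, end) range of text in pre/post tags, escaping HTML."""
    parts = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            continue  # skip ranges that overlap one already emitted
        parts.append(escape(text[cursor:start]))
        parts.append(pre + escape(text[start:end]) + post)
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)


# Tiny hand-written example; the response dict is illustrative, not real output.
SAMPLE_TEXT = "A b.\nC d.\nA b.\n"
SAMPLE_RESPONSE = {
    "term_vectors": {
        "notes.my_hash": {
            "terms": {
                "A b": {"tokens": [{"start_offset": 0, "end_offset": 3}]},
                "C d": {"tokens": [{"start_offset": 5, "end_offset": 8}]},
            }
        }
    }
}

SAMPLE_SPANS = spans_from_termvectors(SAMPLE_RESPONSE, "notes.my_hash")
SAMPLE_HTML = highlight(SAMPLE_TEXT, SAMPLE_SPANS)
```

Here only the first "A b" is wrapped, while the repeated one is left plain. Note that Elasticsearch offsets may not line up with Python string indices for text outside the Basic Multilingual Plane, so this sketch is safest on plain ASCII notes like the sample document.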