Highlight the result of tokenization when viewing full text


(Summer K Rankin) #1

I have documents with duplicate sentences in the "notes" field. Using a custom analyzer (a pattern tokenizer plus a unique filter), I was able to tokenize this field so that only the first occurrence of each sentence is kept, along with its offsets.

When the user views the full "notes" field, I would like to highlight these original sentences. It seems this should be possible since the offsets are stored, but I can't figure out how to implement it.

Any input on this matter is greatly appreciated. Thank you.

// PUT mimic_dat
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "tokenizer": {
        "mimic_tokenizer": {
          "type": "pattern",
          "pattern": """(\.\s|\n+)""",
          "group": -1
        }
      },
      "filter": {
        "unique_mimic": {
          "type": "unique",
          "only_on_same_position": false
        }
      },
      "analyzer": {
        "mimic_hash_analyzer": {
          "type": "custom",
          "tokenizer": "mimic_tokenizer",
          "filter": [
            "unique_mimic"
          ]
        }
      }
    }
  },
  "mappings": {
    "mimic_type": {
      "properties": {
        "subject_id": {
          "type": "keyword"
        },
        "notes": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "my_hash": {
              "type": "text",
              "analyzer": "mimic_hash_analyzer",
              "fielddata": true,
              "term_vector": "with_positions_offsets",
              "store": true
            }
          }
        }
      }
    }
  }
}

// PUT mimic_dat/mimic_type/4
{
  "notes": """
Past History: Chronic xx which lead to; Ca.

Review of systems:    Cardiac,   SR.
O2: sats on room air 100%.  

ID:  No active issues, temp 99.3 PO.

Review of systems:    Cardiac,   SR.

ID:  No active issues, temp 99.3 PO. 
"""
}
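
To make the goal concrete, here is a minimal Python sketch of the desired highlighting, done client-side rather than inside Elasticsearch. It is an illustration under assumptions, not a confirmed Elasticsearch solution: it re-applies the same split pattern as mimic_tokenizer to the raw text, keeps the first occurrence of each piece (roughly what the unique filter does), and wraps those spans in tags. The function names unique_sentence_offsets and highlight and the <em> tags are hypothetical. The sketch also skips whitespace-only pieces, which the Elasticsearch analyzer may still emit.

```python
import re
from html import escape

# Same split pattern as the "mimic_tokenizer" above: a period followed by
# whitespace, or one or more newlines.
SPLIT_PATTERN = re.compile(r"(\.\s|\n+)")


def unique_sentence_offsets(text):
    """Return (start, end) offsets of the first occurrence of each sentence.

    Hypothetical helper: approximates the pattern tokenizer (group -1, i.e.
    split mode) followed by the unique filter.
    """
    seen = set()
    offsets = []

    def collect(start, end):
        token = text[start:end]
        # Skip empty or whitespace-only pieces (a choice made for this sketch).
        if not token.strip():
            return
        if token not in seen:
            seen.add(token)
            offsets.append((start, end))

    cursor = 0
    for match in SPLIT_PATTERN.finditer(text):
        collect(cursor, match.start())
        cursor = match.end()
    collect(cursor, len(text))
    return offsets


def highlight(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) span in pre/post tags, HTML-escaping the text."""
    parts = []
    cursor = 0
    for start, end in sorted(offsets):
        parts.append(escape(text[cursor:start]))
        parts.append(pre + escape(text[start:end]) + post)
        cursor = end
    parts.append(escape(text[cursor:]))
    return "".join(parts)


if __name__ == "__main__":
    sample = "A b. C d.\nA b. E"
    print(highlight(sample, unique_sentence_offsets(sample)))
```

If the offsets come from Elasticsearch instead (for example, the start_offset and end_offset values that the _analyze API returns for each token), they could be passed straight to highlight, assuming they index into the same original string.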
