Search in HL7 message with elasticsearch

Sheel_Shah · August 26, 2015, 11:53am

We are using Elasticsearch to search HL7 messages stored in MongoDB via GridFS. With the mongodb-river plugin we can already search by keyword. The following is our working setup.

Creating the index and connecting MongoDB to Elasticsearch:

curl -XPUT 'http://localhost:9200/_river/demoindex/_meta' -d '
{
	"type": "mongodb",
	"mongodb": { 
		"servers": [ 
			{ "host": "localhost", "port": "27017" } 
		],
		"options": { "secondary_read_preference": "true" },
		"db": "demodb",
		"gridfs":"true",
		"collection": "democollection"
	},
	"index": { 
		"name": "demoindex",
		"type": "files"
	}
}'

An example HL7 message:

MSH|^~\&|LAB|767543|ADT|767543|199003141304-0500||ACK^^ACK|XX3657|P|2.4
MSA|AR|ZZ9380|059805^^^MCH^MR~000000339016^^^MCH^EE~508625465^^^MCH^SS
ERR|PID^1^16^103&Table value not found&HL70357
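To make the target concrete: HL7 v2 addresses a value as SEGMENT-n, where segments are separated by carriage returns and fields by '|'. Below is a minimal sketch, assuming the default '|' field separator; the helper name `hl7_field` is hypothetical. One quirk worth knowing is that MSH-1 is the field separator itself, so MSH fields are offset by one compared with other segments.

```python
from typing import List

# Example message from the post; HL7 v2 separates segments with carriage returns (\r).
MESSAGE = (
    "MSH|^~\\&|LAB|767543|ADT|767543|199003141304-0500||ACK^^ACK|XX3657|P|2.4\r"
    "MSA|AR|ZZ9380|059805^^^MCH^MR~000000339016^^^MCH^EE~508625465^^^MCH^SS\r"
    "ERR|PID^1^16^103&Table value not found&HL70357"
)


def hl7_field(message: str, segment_id: str, position: int) -> List[str]:
    """Return SEGMENT-position (e.g. MSH-7) for every segment with that ID.

    Hypothetical helper; assumes the default '|' field separator.
    """
    values = []
    # Accept \r\n or \n line endings too; the resulting empty pieces are skipped.
    for segment in message.replace("\n", "\r").split("\r"):
        if not segment:
            continue
        fields = segment.split("|")
        if fields[0] != segment_id:
            continue
        if segment_id == "MSH":
            # MSH-1 is the '|' separator itself, so re-insert it to keep numbering aligned.
            fields = ["MSH", "|"] + fields[1:]
        if position < len(fields):
            values.append(fields[position])
    return values


print(hl7_field(MESSAGE, "MSH", 7))  # ['199003141304-0500']
```

With this counting, 508625465 in the example sits in MSA-3 (`hl7_field(MESSAGE, "MSA", 3)`), since MSA-1 is AR and MSA-2 is ZZ9380.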

The standard analyzer/tokenizer produces these terms:

msh lab 767543 adt 767543 199003141304 0500 ack ack xx3657 p 2.4 msa ar zz9380 059805 mch mr 000000339016 mch ee 508625465 mch ss err pid 1 16 103 table value not found hl70357
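The term list above can be approximated in a few lines, which makes the problem visible: segment and field boundaries are discarded, and '141304' never appears as a term on its own because it is only part of '199003141304'. This is a rough stand-in, not Lucene's standard tokenizer (which follows Unicode UAX #29 word-boundary rules); the regex and the function name are assumptions chosen to reproduce this particular output.

```python
import re
from typing import List

MESSAGE = (
    "MSH|^~\\&|LAB|767543|ADT|767543|199003141304-0500||ACK^^ACK|XX3657|P|2.4\r"
    "MSA|AR|ZZ9380|059805^^^MCH^MR~000000339016^^^MCH^EE~508625465^^^MCH^SS\r"
    "ERR|PID^1^16^103&Table value not found&HL70357"
)

# Rough stand-in for the standard analyzer, NOT Lucene's real UAX #29 rules:
# lowercase the text, then keep runs of letters/digits, allowing inner dots ("2.4").
APPROX_TOKEN = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")


def approx_standard_tokens(message: str) -> List[str]:
    return APPROX_TOKEN.findall(message.lower())


tokens = approx_standard_tokens(MESSAGE)
print(" ".join(tokens))
print("141304" in tokens)  # False: it is only part of the term 199003141304
```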

This makes keyword search across messages easy. Our actual requirement, however, is segment-wise search: for example, a user searches for '141304' in MSH-7, or for '508625' in MSA-4. That is not possible with the standard tokenizer, and we also tried a regexp filter without success, because the whole message has already been broken into tokens. So we decided to tokenize in stages: first split on '\r' (carriage return), then on '|' (pipe), then on '^' (caret), then on '~' (tilde), and finally apply the standard tokenizer. I tried a basic regex tokenizer, but when I save a message into MongoDB it is still tokenized by the standard tokenizer.
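The staged split described above can be emulated with a single delimiter regex, which is roughly what an Elasticsearch `pattern` tokenizer does when it splits on a pattern (`group: -1`). Note that an Elasticsearch analyzer takes exactly one tokenizer, so a "then the standard tokenizer" step cannot be chained after a pattern tokenizer inside the same analyzer. A sketch, assuming the default HL7 delimiters; the pattern `[\r\n|^~&]+` (which also includes the '&' sub-component separator) and the function name are assumptions, not something verified against Elasticsearch.

```python
import re
from typing import List

MESSAGE = (
    "MSH|^~\\&|LAB|767543|ADT|767543|199003141304-0500||ACK^^ACK|XX3657|P|2.4\r"
    "MSA|AR|ZZ9380|059805^^^MCH^MR~000000339016^^^MCH^EE~508625465^^^MCH^SS\r"
    "ERR|PID^1^16^103&Table value not found&HL70357"
)

# Assumed delimiter set: segment (\r, plus \n), field '|', component '^',
# repetition '~' and sub-component '&'. A run of delimiters counts as one split.
HL7_DELIMITERS = re.compile(r"[\r\n|^~&]+")


def hl7_pattern_tokens(message: str) -> List[str]:
    """Split on HL7 delimiters, drop empty pieces, lowercase the rest."""
    return [piece.lower() for piece in HL7_DELIMITERS.split(message) if piece]


tokens = hl7_pattern_tokens(MESSAGE)
print("199003141304-0500" in tokens)  # True: MSH-7 survives as a single token
print("141304" in tokens)  # False: still only a substring of that token
```

Each component now becomes one token, but the tokens still carry no record of which field they came from, and '141304' remains a substring of a larger term, so matching it would still need something like a wildcard query rather than a plain term match.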

New index settings

curl -XPUT localhost:9200/demoindex/ -d '
{
	"settings": {
		"analysis": {
			"analyzer": {
				"my_pattern": {
					"type": "pattern",
					"lowercase": true,
					"pattern": "[\\d ]+"
				}
			},
			"tokenizer": {
				"my_tokens": {
					"type": "pattern",
					"pattern": "[\\d ]+",
					"flags": "",
					"group": -1
				}
			}
		}
	}
}
'
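A quick way to see what `"pattern": "[\\d ]+"` does: with `group: -1` (the default), the pattern tokenizer treats every match as a separator, so this pattern throws away digits and spaces and keeps only what lies between them. The sketch below mimics that split with Python's `re`, which is close to, but not identical to, the Java regex engine Elasticsearch uses; the function name is hypothetical.

```python
import re
from typing import List

# The pattern from the settings above; "[\\d ]+" in JSON is the regex [\d ]+.
SETTINGS_PATTERN = re.compile(r"[\d ]+")


def split_like_settings(text: str) -> List[str]:
    """Treat every match as a separator (group -1 behaviour) and drop empty pieces."""
    return [piece for piece in SETTINGS_PATTERN.split(text) if piece]


print(split_like_settings("199003141304-0500||ACK^^ACK"))  # ['-', '||ACK^^ACK']
```

So even once the analyzer is applied, a value such as '141304' could never become a term with this pattern.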

Output of http://localhost:9200/demoindex/_settings

{
	"demoindex":{
		"settings":{
			"index":{
				"creation_date":"1440586817485",
				"uuid":"MiooEsf0T1qFdH_9DqbdoA",
				"analysis":{
					"analyzer":{
						"my_pattern":{
							"type":"pattern",
							"pattern":"[\\d ]+",
							"lowercase":"true"
						}
					},
					"tokenizer":{
						"my_tokens":{
							"flags":"",
							"pattern":"[\\d ]+",
							"group":"-1",
							"type":"pattern"
						}
					}
				},
				"number_of_replicas":"1",
				"number_of_shards":"5",
				"version":{
					"created":"1040299"
				}
			}
		}
	}
}

After applying this, messages still get tokenized the standard way. Where are we going wrong? Or is there a better approach to achieve this?

Sarwar · August 26, 2015, 12:17pm

This might sound pretty basic, but did you delete your index/data after changing the mapping, and then reindex?

Sheel_Shah · August 26, 2015, 12:50pm

@Sarwar Yes, we did. That's why this is a bit surprising.

Sarwar · August 26, 2015, 1:26pm

Sorry, I just looked at your settings again, and the analyzer section seems to be wrong. It needs to look something like this: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.htm

In your analyzer you are defining something called my_pattern, which seems to be just another filter. It needs to have "type" set to "custom" and then values for the keys "tokenizer", "filter", and "char_filter".
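Along those lines, here is a sketch of an index body with a `custom` analyzer wired to a pattern tokenizer. One detail matters as much as the analyzer definition: an analyzer declared under `settings.analysis` is not used unless a field mapping references it (or it is named `default`, which makes it the index-wide default). All names here are hypothetical: `hl7_tokenizer`, `hl7_analyzer`, and especially the `content` field, since the field mongodb-river actually fills for GridFS files may differ. It uses the 1.x-era `string` field type, matching the index shown above (created by 1.4.x).

```python
import json

# Hypothetical names throughout; "content" is a placeholder for whichever field
# the river populates with the message text. Mapping type "files" matches the river config.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "hl7_tokenizer": {
                    "type": "pattern",
                    # Regex seen by the engine: [\r\n|^~&]+ (split on HL7 delimiters).
                    "pattern": "[\\r\\n|^~&]+",
                }
            },
            "analyzer": {
                "hl7_analyzer": {
                    "type": "custom",
                    "tokenizer": "hl7_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "files": {
            "properties": {
                # Without this reference the analyzer is defined but never applied.
                "content": {"type": "string", "analyzer": "hl7_analyzer"}
            }
        }
    },
}

if __name__ == "__main__":
    # Save as body.json, then: curl -XPUT localhost:9200/demoindex -d @body.json
    print(json.dumps(INDEX_BODY, indent=2))
```

As with any analysis change, the index has to be deleted, recreated with the new body, and the data reindexed before existing documents pick it up.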

Topic	Category	Replies	Views	Activity
Elasticsearch mongodb river with GridFS attached to DBObject problem	Elasticsearch	1	681	July 6, 2017
DSL Query to search in message field for some value and it must not contain something else	Elasticsearch	2	3155	October 16, 2021
Elastic Search using Message Content Parse with Logstash	Logstash	6	976	January 26, 2022
Elastic search With MongoDB : Searching PDFs	Elasticsearch	16	2732	July 6, 2017
Zero results retrieved with ES	Elasticsearch	4	370	July 6, 2017
