Change (Disable) Text Analysis at Index Time


(Kevin Stone) #1

How does one change (or disable) the text analyzer for certain fields at index time? I am injecting reasonably structured data and want to exclude most of it from text analysis.

In particular, the "_", "-" and "." characters in my object names wreak havoc with the analysis of this data.

I want to analyze the data in Kibana, but because values like gwmon-01.groundworkopensource.com are getting tokenized into "gwmon", "01", "groundworkopensource" and "com", I am not getting the behavior I want.

Looking at the data, I really only want the text analyzer active on the "textMessage" field below.

FWIW, I'm using Elasticsearch.pm 1.05 for indexing and retrieval, with plans to use Kibana in the near future.

I index through the Elasticsearch.pm bulk API; the equivalent single-document request is:

curl -XPUT 'http://localhost:9200/groundwork-2014.03.06/foundation_logmessage/755161' -d '{
  "priority" : "Lowest Priority in a scale from 1 -10",
  "operationStatus" : "ACCEPTED",
  "lastInsertDate" : "2014-03-06T15:59:02.000-0800",
  "monitorStatus" : "WARNING",
  "appType" : "NAGIOS",
  "id" : "755161",
  "severity" : "WARNING",
  "@timestamp" : "2014-03-06T15:59:02.000-0800",
  "typeRule" : "UNDEFINED",
  "firstInsertDate" : "2014-03-06T15:59:02.000-0800",
  "origin" : "gwmon-01.groundworkopensource.com",
  "applicationSeverity" : "WARNING",
  "device" : "ps-70-esearch-connector-dev",
  "service" : "ssh_cpu_perl",
  "properties" : {
    "SubComponent" : "ps-70-esearch-connector-dev:ssh_cpu_perl",
    "ErrorType" : "SERVICE ALERT"
  },
  "host" : "ps-70-esearch-connector-dev",
  "textMessage" : "WARNING - total %CPU for process perl : 28.5",
  "monitorServer" : "localhost",
  "component" : "WARNING",
  "msgCount" : 1,
  "reportDate" : "2014-03-06T15:59:03.531-0800"
}'

curl -XGET 'http://localhost:9200/_search?q=_id:755161&pretty=true'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 15,
    "successful" : 15,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "groundwork-2014.03.06",
      "_type" : "foundation_logmessage",
      "_id" : "755161",
      "_score" : 1.0,
      "_source" : {
        "priority" : "Lowest Priority in a scale from 1 -10",
        "operationStatus" : "ACCEPTED",
        "lastInsertDate" : "2014-03-06T15:59:02.000-0800",
        "monitorStatus" : "WARNING",
        "appType" : "NAGIOS",
        "id" : "755161",
        "severity" : "WARNING",
        "@timestamp" : "2014-03-06T15:59:02.000-0800",
        "typeRule" : "UNDEFINED",
        "firstInsertDate" : "2014-03-06T15:59:02.000-0800",
        "origin" : "gwmon-01.groundworkopensource.com",
        "applicationSeverity" : "WARNING",
        "device" : "ps-70-esearch-connector-dev",
        "service" : "ssh_cpu_perl",
        "properties" : {
          "SubComponent" : "ps-70-esearch-connector-dev:ssh_cpu_perl",
          "ErrorType" : "SERVICE ALERT"
        },
        "host" : "ps-70-esearch-connector-dev",
        "textMessage" : "WARNING - total %CPU for process perl : 28.5",
        "monitorServer" : "localhost",
        "component" : "WARNING",
        "msgCount" : 1,
        "reportDate" : "2014-03-06T15:59:03.531-0800"
      }
    } ]
  }
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/00a794bf-26e8-429b-888b-ec053798d0f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

I'm not sure I fully understand what you are trying to do, but this may answer your question. "Change (disable) Text Analysis at Index Time" suggests to me that you don't know in advance whether you want to analyze a field, and that you want to make that decision at index time.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-analyzer-field.html
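
For that decide-at-index-time case, the page linked above describes the `_analyzer` mapping. A minimal sketch, assuming Elasticsearch 1.x and a hypothetical document field named `analyzer_name` that holds the analyzer to use:

```json
{
  "foundation_logmessage": {
    "_analyzer": {
      "path": "analyzer_name"
    }
  }
}
```

Each document would then carry the name of a registered analyzer in `analyzer_name` (for example `"keyword"` to index values whole, or `"standard"`), and that analyzer is used for every field of the document that does not set its own analyzer in the mapping.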

If you know in advance which fields don't require analysis, just set "index": "not_analyzed" on them in the mapping. That is much easier, though I don't know if it's what you are after.
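
Applied to the data in the first message, a minimal sketch of such a mapping, assuming Elasticsearch 1.x and daily indices matching `groundwork-*`. It is written as an index template so that each new daily index picks it up: a dynamic template maps every string field as `not_analyzed`, while `textMessage` is declared explicitly and keeps the default (standard) analyzer. The template name `groundwork`, the dynamic template name `strings_not_analyzed` and the file name `groundwork-template.json` are hypothetical; the body could be registered with `curl -XPUT 'http://localhost:9200/_template/groundwork' -d @groundwork-template.json`.

```json
{
  "template": "groundwork-*",
  "mappings": {
    "foundation_logmessage": {
      "dynamic_templates": [
        {
          "strings_not_analyzed": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ],
      "properties": {
        "textMessage": {
          "type": "string"
        }
      }
    }
  }
}
```

Templates are applied only when an index is created, and an existing field cannot be switched from analyzed to not_analyzed in place, so documents already in `groundwork-2014.03.06` would need to be reindexed into a new index for the change to take effect.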

HTH

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to Kevin Stone (kevindstone@gmail.com), 7 March 2014 at 01:15.



(Kevin Stone) #3

Mapping is what I needed.

Off to go figure out how to do that.

Thanks
-Kevin


