Which analyzer for network device syslogs?


We are currently indexing some UTM syslogs with the default analyzer.
Example log entry:

time=18:10:48 log_id=20000010 msg_id=000000020218
device_id=FVVM020000043660 vd="root"
timezone="(GMT+9:00)Osaka,Sapporo,Tokyo,Seoul" type=attack
subtype="waf_signature_detection" pri=alert trigger_policy=""
severity_level=Low proto=tcp service=http action=Alert_Deny
policy="POL_Web-02" src= src_port=55645 dst=
dst_port=80 http_method=get http_url="/DoNotDelete/healthcheck.txt"
http_host="www3.arrowgate.com" http_agent="Wget/1.12 (linux-gnu)"
http_session_id=none msg="[Signatures name: Product_sig] [main class
name: Bad Robot]: 110000003" signature_subclass="Bad Robot"
signature_id="110000003" srccountry="United States"
content_switch_name="none" server_pool_name="Web-02_11.22.99.99"
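
For reference, the entry above is a flat run of key=value pairs, some of them quoted and some empty. A minimal Python sketch of that structure, assuming quoted values follow shell-like quoting (`parse_kv` is a hypothetical helper, not part of any logging or search library):

```python
import shlex

def parse_kv(line):
    """Split a key=value log line into a dict, keeping quoted values whole."""
    fields = {}
    # shlex.split honours double quotes, so a quoted value stays in one token
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not a key=value pair
            fields[key] = value
    return fields

# Shortened excerpt of the log entry above
sample = ('type=attack subtype="waf_signature_detection" src= '
          'dst_port=80 http_agent="Wget/1.12 (linux-gnu)"')
fields = parse_kv(sample)
print(fields["http_agent"])
```

Empty values such as `src=` come back as empty strings, so they can be told apart from keys that are missing altogether.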

Field mapping:
"rawlog": {
  "type": "string",
  "store": "no",
  "index": "analyzed",
  "index_options": "offsets",
  "doc_values": false
}

I read that this may not be the most efficient one for this kind of usage.

Is there a more suitable analyzer that we should use?


Read where?

To add to Torsten Engelbrecht's answer, the default analyzer might be
part of the problem. It does no stemming, so every form of each word is
indexed as a separate token, meaning that a single verb in a language
with complex conjugation can be indexed a dozen times. That also
degrades the quality of the search results. The same applies if your
documents contain formatting information (HTML markup?).
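
As a rough illustration of the point about word forms, here is a small Python sketch. It uses a deliberately simplified stand-in tokenizer (lowercase, split on non-alphanumerics, no stemming); it is not Elasticsearch's actual analyzer, and `simple_tokens` is a hypothetical helper.

```python
import re

def simple_tokens(text):
    """Lowercase the text and split on non-alphanumeric characters.

    No stemming is applied, so each inflected form stays a distinct token.
    """
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

# Four forms of the same verb
tokens = simple_tokens("deny denied denies denying")
distinct = sorted(set(tokens))
print(len(distinct), distinct)
```

Without stemming, all four forms end up as separate index terms, and a search for one of them will not match the other three.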