Search request not returning documents anymore

I'm no longer able to find some documents in my indices.
Some documents have this field and value: "source" : "spectre_supEvt".
But a query in Elasticsearch filtering on that field and value now returns nothing. The same query used to return these documents, and the documents are still in the indices.
This problem is driving me mad! :woozy_face: :sob:

See the details of the problem below, with the requests and the inconsistent responses I'm getting.

I'm using Elasticsearch version 5.6.3, build 1a2f265/2017-10-06T20:33:39.012Z, with JVM 1.8.0_172.

Do you have an idea of the origin of the problem, and how I can solve it?


Here are the request and the erroneous response:

curl -X GET http://ip:9200/graylog_5/_search?pretty=true -d '
{
    "query": {
		"term": { "source": "spectre_supEvt" }
    },
    "size": 1
}'

returns:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

We can check that matching documents do exist by querying on another field.
The response below contains the field and value "source" : "spectre_supEvt":

curl -X GET http://ip:9200/graylog_5/_search?pretty=true -d '
{
    "query": {
		"term": { "sysSourceId": "MDC" }
    },
    "size": 1
}'

returns:

{
  "took" : 679,
  "timed_out" : false,
  "_shards" : {
    "total" : 88,
    "successful" : 88,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 31959030,
    "max_score" : 0.34491536,
    "hits" : [
      {
        "_index" : "graylog_5",
        "_type" : "message",
        "_id" : "0673b150-caca-11e8-8f2c-0600ea37eac6",
        "_score" : 0.34491536,
        "_source" : {
          "sysSourceId" : "MDC",
          "source" : "spectre_supEvt",
          ...
        }
      }
    ]
  }
}

We can check that the problem is not caused by the "source" field name itself, by using another value in the request:

curl -X GET http://ip:9200/graylog_5/_search?pretty=true -d '
{
    "query": {
		"term": { "source": "spectre" }
    },
    "size": 1
}'

returns:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 341563,
    "max_score" : 4.075432,
    "hits" : [
      {
        "_index" : "graylog_5",
        "_type" : "message",
        "_id" : "5593a6d0-cb67-11e8-8f2c-0600ea37eac6",
        "_score" : 4.075432,
        "_source" : {
          "source" : "spectre",
          ...
        }
      }
    ]
  }
}
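
A complementary check: unlike term, a match query runs the query text through the field's search analyzer before matching, so it can help distinguish an analysis issue from genuinely missing data. A minimal sketch against the same index:

# hypothetical check: does an analyzed query find the documents the term query misses?
curl -X GET http://ip:9200/graylog_5/_search?pretty=true -d '
{
    "query": {
        "match": { "source": "spectre_supEvt" }
    },
    "size": 1
}'

If the match query finds documents while the term query above finds none, the token stored in the index differs from the literal value, for example because it was lowercased at index time.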

What is the index mapping and the analysis setup for the graylog_5 index?

Those mappings and settings are managed by my Graylog server, which relies on Elasticsearch:

curl -X GET http://ip:9200/graylog_5/_mapping?pretty=true

returns:

{
  "graylog_5" : {
    "mappings" : {
      "message" : {
        "dynamic_templates" : [
          {
            "internal_fields" : {
              "match" : "gl2_*",
              "mapping" : {
                "type" : "keyword"
              }
            }
          },
          {
            "store_generic" : {
              "match" : "*",
              "mapping" : {
                "index" : "not_analyzed"
              }
            }
          }
        ],
        "properties" : {
          "actOfManagementType" : {
            "type" : "keyword"
          },
          "application" : {
            "type" : "keyword"
          },
          "commId" : {
            "type" : "keyword"
          },
          "commOutputIds" : {
            "type" : "keyword"
          },
          "commOutputIdsNb" : {
            "type" : "long"
          },
          "communication_code" : {
            "type" : "keyword"
          },
          "connection_id" : {
            "type" : "long"
          },
          "connection_requests" : {
            "type" : "long"
          },
          "creationDate" : {
            "type" : "date"
          },
          "customer_id" : {
            "type" : "long"
          },
          "ehub_errors" : {
            "type" : "keyword"
          },
          "ehub_status" : {
            "type" : "keyword"
          },
          "ehub_vector_status" : {
            "type" : "keyword"
          },
          "errors" : {
            "type" : "keyword"
          },
          "eventDate" : {
            "type" : "date"
          },
          "eventStatus" : {
            "type" : "keyword"
          },
          "facility" : {
            "type" : "keyword"
          },
          "from_gelf" : {
            "type" : "keyword"
          },
          "full_message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "gl2_remote_ip" : {
            "type" : "keyword"
          },
          "gl2_remote_port" : {
            "type" : "keyword"
          },
          "gl2_source_input" : {
            "type" : "keyword"
          },
          "gl2_source_node" : {
            "type" : "keyword"
          },
          "http_referer" : {
            "type" : "keyword"
          },
          "http_user_agent" : {
            "type" : "keyword"
          },
          "http_version" : {
            "type" : "keyword"
          },
          "idCom" : {
            "type" : "keyword"
          },
          "idEvt" : {
            "type" : "keyword"
          },
          "level" : {
            "type" : "long"
          },
          "message" : {
            "type" : "text",
            "analyzer" : "standard"
          },
          "millis" : {
            "type" : "float"
          },
          "personId" : {
            "type" : "long"
          },
          "postman_errors" : {
            "type" : "keyword"
          },
          "postman_order_status" : {
            "type" : "keyword"
          },
          "postman_output_date" : {
            "type" : "date"
          },
          "postman_status" : {
            "type" : "keyword"
          },
          "remote_addr" : {
            "type" : "keyword"
          },
          "remote_user" : {
            "type" : "keyword"
          },
          "request_path" : {
            "type" : "keyword"
          },
          "request_verb" : {
            "type" : "keyword"
          },
          "response_bytes" : {
            "type" : "long"
          },
          "response_status" : {
            "type" : "long"
          },
          "sendWay" : {
            "type" : "keyword"
          },
          "source" : {
            "type" : "text",
            "analyzer" : "analyzer_keyword",
            "fielddata" : true
          },
          "streams" : {
            "type" : "keyword"
          },
          "sysSourceId" : {
            "type" : "keyword"
          },
          "systeme_envoi" : {
            "type" : "keyword"
          },
          "timestamp" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss.SSS"
          },
          "vector" : {
            "type" : "keyword"
          }
        }
      }
    }
  }
}

curl -X GET http://ip:9200/graylog_5/_settings?pretty=true

returns:

{
  "graylog_5" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "4",
        "blocks" : {
          "write" : "true",
          "metadata" : "false",
          "read" : "false"
        },
        "provided_name" : "graylog_5",
        "creation_date" : "1538978509168",
        "analysis" : {
          "analyzer" : {
            "analyzer_keyword" : {
              "filter" : "lowercase",
              "tokenizer" : "keyword"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "3b3AkldoS3muym2BzRle0A",
        "version" : {
          "created" : "5060399"
        }
      }
    }
  }
}
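
Worth noting: the analyzer_keyword analyzer defined above combines the keyword tokenizer with a lowercase filter, so the whole "source" value is indexed as a single lowercased token. The _analyze API can show the exact token; a minimal sketch against the same index:

# hypothetical check: see which token analyzer_keyword produces for this value
curl -X GET http://ip:9200/graylog_5/_analyze?pretty=true -d '
{
    "analyzer": "analyzer_keyword",
    "text": "spectre_supEvt"
}'

If this returns the token "spectre_supevt", a term query for the mixed-case value "spectre_supEvt" cannot match it, since term queries are not analyzed.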

Did you update the version, or what does "used to" mean in this context?

I did not update the Elasticsearch version.

I was using this request to get the maximum creationDate of the documents having "source" : "spectre_supEvt":

curl -X GET http://ip:9200/graylog_5/_search?pretty=true -d '
{
    "size": 0,
    "aggs": {
        "spectre": {
            "filter": { "term": { "source": "spectre_supEvt" } },
            "aggs": {
                "max_creationDate": {
                    "max": { "field": "creationDate" }
                }
            }
        }
    }
}'

It worked for about a month, but stopped working a week ago.

I'm now getting this response, with "value" : null:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 88,
    "successful" : 88,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 401382186,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "spectre" : {
      "doc_count" : 0,
      "max_creationDate" : {
        "value" : null
      }
    }
  }
}

To be precise, I was running the request against a set of indices, but the principle is the same:

curl -X GET http://ip:9200/graylog_9,graylog_10,graylog_2,graylog_16,graylog_15,graylog_21,graylog_0,graylog_14,graylog_20,graylog_17,graylog_5,graylog_4,graylog_3,graylog_6,graylog_11,graylog_13,graylog_7,graylog_18,graylog_12,graylog_1,graylog_19,graylog_8/_search?pretty=true -d '
{
    "size": 0,
    "aggs": {
        "spectre": {
            "filter": { "term": { "source": "spectre_supEvt" } },
            "aggs": {
                "max_creationDate": {
                    "max": { "field": "creationDate" }
                }
            }
        }
    }
}'

This set of indices you use in the request, are those time-based indices where new ones get added frequently? Which of the indices have been added since the behaviour change you observed? And what is their mapping for the field in question (the "source" field)?
I'm asking because involuntary mapping changes (e.g. changes in templates, dynamic mappings, etc.) in one of several indices can sometimes cause the rest to behave differently. I'm not sure this is the case, but since the change in behaviour seems to have started without any other changes to the system, this might be a possible explanation to narrow down on.
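
One way to run that check quickly is the field mapping API, which returns the mapping of a single field across many indices at once. A minimal sketch, assuming all of your indices match the graylog_* pattern and use the message type:

# hypothetical check: compare the "source" mapping across all graylog indices
curl -X GET 'http://ip:9200/graylog_*/_mapping/message/field/source?pretty=true'

An index whose "source" mapping differs from the others (e.g. keyword instead of text with analyzer_keyword) would be a strong hint that a template or dynamic mapping changed at some point.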
