Hello!
I have the following JSON file, and I send it to an Amazon Elasticsearch Service domain using Filebeat, but Elasticsearch can't index it properly.
{
  "scan": {
    "id_scan": "2019-01-22 19:00:35",
    "files": [
      {
        "filename": "data_jan/data.txt",
        "findings": [
          {"quote": "Aurora Ramírez", "info_type": "PERSON_NAME", "likelihood": "4"},
          {"quote": "Aurora", "info_type": "FIRST_NAME", "likelihood": "4"},
          {"quote": "Ramírez", "info_type": "LAST_NAME", "likelihood": "4"},
          {"quote": "Aurora Ramírez", "info_type": "FEMALE_NAME", "likelihood": "4"},
          {"quote": "+34 629811498", "info_type": "PHONE_NUMBER", "likelihood": "3"},
          {"quote": "48027218K", "info_type": "SPAIN_NIF_NUMBER", "likelihood": "4"},
          {"quote": "Joan Maragall 11B", "info_type": "PERSON_NAME", "likelihood": "4"},
          {"quote": "Joan", "info_type": "FIRST_NAME", "likelihood": "4"},
          {"quote": "Joan Maragall 11B", "info_type": "FEMALE_NAME", "likelihood": "4"},
          {"quote": "Carrer Joan Maragall 11B, Barcelona, Spain", "info_type": "LOCATION", "likelihood": "3"},
          {"quote": "192.0.13.1", "info_type": "IP_ADDRESS", "likelihood": "4"},
          {"quote": "27-10-2018", "info_type": "DATE", "likelihood": "4"},
          {"quote": "27/10/18", "info_type": "DATE_OF_BIRTH", "likelihood": "5"},
          {"quote": "04/03/1967", "info_type": "DATE_OF_BIRTH", "likelihood": "5"},
          {"quote": "robertlangdon@security.com", "info_type": "EMAIL_ADDRESS", "likelihood": "4"},
          {"quote": "security.com", "info_type": "DOMAIN_NAME", "likelihood": "4"},
          {"quote": "Male", "info_type": "GENDER", "likelihood": "3"},
          {"quote": "ca", "info_type": "LOCATION", "likelihood": "3"},
          {"quote": "10:00pm", "info_type": "TIME", "likelihood": "4"},
          {"quote": "15:34:04", "info_type": "TIME", "likelihood": "4"},
          {"quote": "https://www.hello.com", "info_type": "URL", "likelihood": "4"},
          {"quote": "www.hello.com", "info_type": "DOMAIN_NAME", "likelihood": "4"}
        ]
      }
    ]
  }
}
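For context, my understanding is that Filebeat's json.* options expect NDJSON: one complete JSON object per line. A minimal Python sketch of the pre-processing step I could run before shipping the file (the sample payload here is a shortened stand-in for my actual file):

```python
import json

# Pretty-printed JSON like my scan file spans many lines, but Filebeat's
# json.* options decode line by line. One option: parse the file once and
# re-serialize it without indentation, so the whole object is one line.
pretty = """\
{
  "scan": {
    "id_scan": "2019-01-22 19:00:35",
    "files": [
      {"filename": "data_jan/data.txt",
       "findings": [
         {"quote": "Aurora", "info_type": "FIRST_NAME", "likelihood": "4"}
       ]}
    ]
  }
}
"""

doc = json.loads(pretty)
# ensure_ascii=False keeps accented characters (e.g. "Ramírez") readable.
ndjson_line = json.dumps(doc, ensure_ascii=False)  # single line = one event
```

The resulting line could then be appended to the file Filebeat harvests.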
This is the multiline configuration in filebeat.yml, with the input type set to log:
### Multiline options
# Multiline can be used for log messages spanning multiple lines. This is common
# for Java Stack Traces or C-Line Continuation
# The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
#multiline.pattern: ^\[
# Defines if the pattern set under pattern should be negated or not. Default is false.
#multiline.negate: false
# Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
# that was (not) matched before or after, or as long as a pattern is not matched, based on negate.
# Note: After is the equivalent to previous and before is the equivalent to next in Logstash
#multiline.match: after
json.keys_under_root: true
json.add_error_key: true
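With the commented multiline options above, one way I can think of to join the pretty-printed object into a single event is to treat every line that is not a lone opening brace as a continuation. A sketch, assuming the object always starts with { alone on a line, as in my file:

```yaml
multiline.pattern: '^\{\s*$'
multiline.negate: true
multiline.match: after
```

However, as far as I can tell from the Filebeat docs, the log input applies json.* decoding per line before multiline joining, so this alone would not make json.keys_under_root parse the joined object.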
Kibana only recognises the fields "quote", "info_type" and "likelihood", and it shows them as '?' (unknown type) fields, not as keywords. The other fields are not indexed at all. I think that's because Filebeat expects one JSON object per line and doesn't handle objects nested within objects.
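If Logstash is an option, my understanding is that its json and split filters could both parse the object and denormalize it into one event per finding. A sketch, assuming the whole object reaches Logstash as a single message field:

```conf
filter {
  # Parse the raw JSON string in "message" into structured fields.
  json {
    source => "message"
  }
  # Emit one event per element of the nested arrays.
  split {
    field => "[scan][files]"
  }
  split {
    field => "[scan][files][findings]"
  }
}
```

I haven't confirmed whether this is the intended approach for my case.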
I've also tried to specify the fields manually, so that Kibana knows about them through the index template:
GET filebeat-6.5.4-2019.01.16/_mapping/field/scan*
Output:
{
  "filebeat-6.5.4-2019.01.16": {
    "mappings": {
      "doc": {
        "scan.id_scan": {
          "full_name": "scan.id_scan",
          "mapping": {
            "id_scan": {
              "type": "date"
            }
          }
        },
        "scan.files.findings.likelihood": {
          "full_name": "scan.files.findings.likelihood",
          "mapping": {
            "likelihood": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.filename": {
          "full_name": "scan.files.filename",
          "mapping": {
            "filename": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.findings.quote": {
          "full_name": "scan.files.findings.quote",
          "mapping": {
            "quote": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        },
        "scan.files.findings.info_type": {
          "full_name": "scan.files.findings.info_type",
          "mapping": {
            "info_type": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        }
      }
    }
  }
}
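I also wonder whether the "findings" array needs an explicit nested mapping, since Elasticsearch otherwise flattens arrays of objects and loses the association between the fields of each finding. Something along these lines, where the template name and index pattern are just illustrative:

```json
PUT _template/filebeat-scan
{
  "index_patterns": ["filebeat-*"],
  "mappings": {
    "doc": {
      "properties": {
        "scan": {
          "properties": {
            "files": {
              "type": "nested",
              "properties": {
                "findings": {"type": "nested"}
              }
            }
          }
        }
      }
    }
  }
}
```

I'm not sure whether this is needed on top of fixing the one-object-per-line issue, or instead of it.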
Any idea of what I'm doing wrong? Is this even possible to achieve, with or without Logstash?
Many thanks in advance!