I managed to write requests in the console, and could hence create more indexes with the defined mapping and do things like that just from the console.
Earlier on, I also managed to compress the .vhdx disk using the wslcompact utility.
My ingestion pipeline stops on its own even though the Logstash container is still running: I can still open a prompt in it, and no error messages are raised. I only notice the stoppage when the number of documents in the index stops increasing (even though ingestion hasn't reached the end of the month yet).
Timezone is now a strange issue I would like to understand better.
In order to get @timestamp right, I had to add this to my pipeline config:
"date": {
  "field": "sourcetime",
  "timezone": "America/Los_Angeles",
  "formats": [
    "yyyy-MM-dd HH:mm:ss.SSS",
    "yyyy-MM-dd HH:mm:ss.SS",
    "yyyy-MM-dd HH:mm:ss",
    "yyyy-MM-dd HH:mm:ss.S"
  ]
}
This then gives me @timestamp matching the correct, original sourcetime.
But the sourcetime itself is now 8 hours ahead.
Why can't @timestamp and sourcetime match?
Please just post the first 10 Lines of one of your CSV files including the header line.
When I get to it, I'll provide you my solution; I may not get to it for the next couple of days.
I will tell you that trying to parse it in both Logstash and the ingest pipeline is probably not going to work well for you.
You can make the timestamps come out however you want. Just remember they're stored in UTC and then displayed in your local time zone.
After this @Ethan777100 I think you're going to need to work on this yourself. I have to move on to other topics
I think if you had just slowed down and read the docs first you would have had much better success; you need to understand the concepts that we have discussed.
Actually, I do think my understanding has improved over time. Pushing through so much trial and error on an active thread was an intense but remarkably productive process in a short timespan.
Now I find myself setting and forgetting the ingestion process, because the past week was spent learning best practices and the peripheral context/troubleshooting that documentation doesn't necessarily tell you.
In theory I could just read the documentation, but I'd fail to figure things out, or absorb only 5% of it, because I learn from pictorial and visual examples specific to my use case.
I may well encounter new issues down the road; I'll see how I can address them.
Your help has really gone a long way. Appreciate it.
id uniqueid alarm eventtype system subsystem sourcetime operator alarmvalue value equipment location severity description state mmsstate zone graphicelement
7133402 unique1 alarm1 event1 system1 subsystem1 00:00.3 null 0 >=1 OPEN equipment1 location1 0 description1 5 0 -1 -1
7133403 unique2 alarm2 event1 system1 subsystem1 00:00.3 null 0 NOT CLOSED & LOC equipment2 location1 0 description2 5 0 -1 -1
7133404 unique3 alarm3 event1 system1 subsystem1 00:00.5 null 0 CLOSED & LOCKED equipment3 location1 0 description3 5 0 -1 -1
7133405 unique4 alarm4 event1 system1 subsystem1 00:02.1 null 0 NO DOORS OPEN equipment4 location1 0 description4 5 0 -1 -1
7133406 unique5 alarm5 event1 system1 subsystem1 00:02.1 null 0 CLOSED & LOCKED equipment5 location1 0 description5 5 0 -1 -1
7133407 unique6 alarm6 event1 system2 subsystem2 00:02.3 TCO 0 equipment6 location2 0 description6 5 0 -1 -1
7133408 unique7 alarm7 event1 system2 subsystem2 00:02.3 TCO 0 equipment7 location2 0 description7 5 0 2 -1
7133409 unique8 alarm8 event1 system2 subsystem2 00:03.6 null 0 equipment8 location2 0 description8 5 0 -1 -1
7133410 unique9 alarm9 event1 system2 subsystem2 00:04.5 TCO 0 equipment9 location2 0 description9 5 0 -1 -1
do you want the source time and timestamp to be the same?
Explain to me how this is the source time
sourcetime
00:00.3
and your date processor looks like
"field": "sourcetime",
"timezone": "America/Los_Angeles",
"formats": [
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SS",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mm:ss.S"
and why does your csv definition say separated by comma, when the example above is not?
If I was not clear: I want the first 10 rows of the RAW csv.
I did not say you have no understanding... anyways... give me a sample of the raw data and I will take a look later...
BTW source time and timestamp can be the same you just need to treat them the same and add the timezone.
you can run sourcetime through the same date processor logic but set
"target_field" : "sourcetime"
The date format is a little peculiar for the csv files; it's populated in this manner. File Upload happened to detect several formats of the date within 1 day's worth of data.
For the date processor, I first used "timezone": "Asia/Singapore" (UTC+8), but I changed it to "timezone": "America/Los_Angeles" (UTC-8), because I realised Elastic adds 8 hours to each @timestamp and sourcetime.
Since @timestamp is the basis for visualisations, especially TSVB, I realised my timezone setting had to be UTC-8 in order to negate the offset.
Using "timezone": "Asia/Singapore" (UTC+8):
@timestamp: 08:00:00
sourcetime: 16:00:00
Using "timezone": "America/Los_Angeles" gives me the original sourcetime value parsed into @timestamp:
@timestamp: 00:00:00
sourcetime: 08:00:00
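A minimal sketch of what is actually going on (using Python's zoneinfo rather than anything from the Elastic stack, purely to illustrate): if the string is parsed with the timezone it was actually recorded in, the stored UTC instant converts back to the original wall time on display, so no negative-offset workaround should be needed.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# A sourcetime string exactly as it appears in the CSV (local wall time).
raw = "2023-06-01 08:00:00.000"

# Parse it with the timezone the data was recorded in (Asia/Singapore,
# UTC+8), as the date processor does with "timezone": "Asia/Singapore".
local = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f").replace(
    tzinfo=ZoneInfo("Asia/Singapore")
)

# Elasticsearch stores the instant in UTC...
utc = local.astimezone(ZoneInfo("UTC"))
print(utc.isoformat())  # 2023-06-01T00:00:00+00:00

# ...and Kibana converts it back to the browser's timezone for display,
# so the original wall time reappears.
print(utc.astimezone(ZoneInfo("Asia/Singapore")).isoformat())
# 2023-06-01T08:00:00+08:00
```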
Copying out of the data grid in Excel is not the raw CSV.
Open the file in Notepad, copy the first 10 lines, and post them; that is the raw data, and that is what you need to understand to load the data... not what shows in the Excel grid.
If there is no yyyy-MM-dd, the day will be assumed to be today... Is that what you want?
Above you show June 07 2023..., then you show 1/6/2023.
I would prefer if you were consistent with your posts and data...
Don't worry about the timezones... easy fix
Exactly which timezone is correct for the Data?
And do you want the timestamp equal to the source Time..
AND your Kibana will always show the local time zone... unless you manually set it.
If you get me raw data. I will help... otherwise I can not...
Now I understand. Sorry about that. I provided 01-06-2023 here.
Whether 1/6 or 7/6, the data is in about the same format. No worries.
id,uniqueid,alarm,eventtype,system,subsystem,sourcetime,operator,alarmvalue,value,equipment,location,severity,description,state,mmsstate,zone,graphicelement
3432479,CCC_3432479,CCC_0,DIAG_IAllDoorModeStatus[2],RRR,DDDD,2023-06-01 00:00:00.035,null,0,NO DOORS OPEN,xxx800/RRR/DDD/ALL,xxx,0,Summary of,5,0,-1,-1
3432480,CCC_3432480,CCC_0,DIAG_IAllDoorModeStatus[2],RRR,DDD,2023-06-01 00:00:00.439,null,0,NO DOORS OPEN,xx030/RRR/DDD/ALL,xx030,0,Summary of Status with Open,5,0,-1,-1
3432481,CCC_3432481,CCC_0,DIAG_IAllDoorModeStatus[1],RRR,DDD,2023-06-01 00:00:04.108,null,0,CLOSED & LOCKED,xx140/RRR/DDD/ALL,xx140,0,Summary ofStatus with Closed & Locked,5,0,-1,-1
3432482,CCC_3432482,CCC_0,DIAG_IHndovr1Radio,SSS,ATC,2023-06-01 00:00:04.614,null,0,ALARM,xx022/SSS/ATC/AAA_SYS,xx022,0,Auto Norm: performed handover with one radio only,5,0,-1,-1
3432483,CCC_3432483,CCC_0,COMMAND_RECEIVED_CCC,SSS,AAA,2023-06-01 00:00:04.923,null,0,,GGG/SSS/AAA/CCCAAA_SVR,GGG,0,Command SET ROUTE Received Status with Closed & Locked,5,0,-1,-1
3432485,CCC_3432485,CCC_0,DIAG_IAllDoorModeStatus[1],RRR,DDD,2023-06-01 00:00:06.143,null,0,CLOSED & LOCKED,xx030/RRR/DDD/ALL,xx030,0,Summary of Status with Closed & Locked,5,0,-1,-1
3432486,CCC_3432486,CCC_0,DIAG_IAllDoorModeStatus[2],RRR,DDD,2023-06-01 00:00:06.642,null,0,>=1 OPEN,xx210/RRR/DDD/ALL,xx210,0,Summary of Status with Open,5,0,-1,-1
3432487,CCC_3432487,CCC_0,DIAG_IAllDoorModeStatus[1],RRR,DDD,2023-06-01 00:00:06.644,null,0,NOT CLOSED & LOC,xx210/RRR/DDD/ALL,xx210,0,Summary of Status with Closed & Locked,5,0,-1,-1
Exactly which timezone is correct for the Data?
Asia/Singapore
And do you want the timestamp equal to the source Time..
Preferably yes
AND your Kibana will always show the local time zone... unless you manually set it.
I don't think I did. I just dug through my Settings. I didn't know you could do that, actually.
Ok, THAT'S more like it... I will try later tonight or tomorrow... I should have just asked for this in the beginning.
Here is my code that works.
This works from my directories etc.; you will need to adjust the paths in the logstash.conf.
I made some subtle changes in the mappings and ingest pipeline for the dates... see if you can spot them.
Also, you need a carriage return after the last line in the csv, or the last line will not be read.
You will get 1 parsing error per file as it reads the header; you should just ignore that.
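As a quick sanity check for the missing final carriage return (a minimal sketch; the filename is just the example used in this thread), you can test whether the last byte of the file is a newline and append one if it is missing:

```shell
# tail -c 1 prints the file's last byte; command substitution strips a
# trailing newline, so the result is empty only when the file already
# ends with one. If not, append a newline so the last row gets read.
f=events2023-06-01.csv
if [ -n "$(tail -c 1 "$f")" ]; then
  echo "" >> "$f"
fi
```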
You can run these commands
This is for the template which will set the mappings for indices that match the pattern
And the ingest pipeline.
DELETE _index_template/ats-event-template
PUT _index_template/ats-event-template
{
"index_patterns": [
"ats-events-*"
],
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date",
"format": "strict_date_optional_time||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss.SS||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ss.S"
},
"alarm": {
"type": "keyword"
},
"alarmvalue": {
"type": "long"
},
"description": {
"type": "text"
},
"equipment": {
"type": "keyword"
},
"eventtype": {
"type": "keyword"
},
"graphicelement": {
"type": "long"
},
"id": {
"type": "long"
},
"location": {
"type": "keyword"
},
"mmsstate": {
"type": "long"
},
"operator": {
"type": "keyword"
},
"severity": {
"type": "long"
},
"sourcetime": {
"type": "date",
"format": "strict_date_optional_time||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss.SS||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ss.S"
},
"state": {
"type": "long"
},
"subsystem": {
"type": "keyword"
},
"system": {
"type": "keyword"
},
"uniqueid": {
"type": "keyword"
},
"value": {
"type": "keyword"
},
"zone": {
"type": "long"
}
}
}
}
}
DELETE _ingest/pipeline/ats-events-pipeline
PUT _ingest/pipeline/ats-events-pipeline
{
"description": "Ingest pipeline created by text structure finder",
"processors": [
{
"csv": {
"field": "message",
"target_fields": [
"id",
"uniqueid",
"alarm",
"eventtype",
"system",
"subsystem",
"sourcetime",
"operator",
"alarmvalue",
"value",
"equipment",
"location",
"severity",
"description",
"state",
"mmsstate",
"zone",
"graphicelement"
],
"ignore_missing": false
}
},
{
"date": {
"field": "sourcetime",
"timezone": "Asia/Singapore",
"formats": [
"strict_date_optional_time",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SS",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mm:ss.S"
]
}
},
{
"date": {
"field": "sourcetime",
"timezone": "Asia/Singapore",
"target_field": "sourcetime",
"formats": [
"strict_date_optional_time",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SS",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mm:ss.S"
]
}
},
{
"convert": {
"field": "alarmvalue",
"type": "long",
"ignore_missing": true
}
},
{
"convert": {
"field": "graphicelement",
"type": "long",
"ignore_missing": true
}
},
{
"convert": {
"field": "id",
"type": "long",
"ignore_missing": true
}
},
{
"convert": {
"field": "mmsstate",
"type": "long",
"ignore_missing": true
}
},
{
"convert": {
"field": "severity",
"type": "long",
"ignore_missing": true
}
},
{
"convert": {
"field": "state",
"type": "long",
"ignore_missing": true
}
},
{
"convert": {
"field": "zone",
"type": "long",
"ignore_missing": true
}
},
{
"remove": {
"field": "message"
}
}
]
}
The logstash.conf: note there is NO filter, and it does set the pipeline.
input {
file {
path => "/usr/share/logstash/csv_files/events2023-06-01.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
output {
elasticsearch {
index => "ats-events-2023-06"
hosts => ["https://es01:9200"]
user => "elastic"
password => "mypassword"
pipeline => "ats-events-pipeline"
ssl_verification_mode=> "none"
}
# stdout{ codec => "rubydebug"}
}
Then these are the results with the data you provided (plus a carriage return at the end):
GET ats-events-2023-06
GET ats-events-2023-06/_search
# Results
# GET ats-events-2023-06 200 OK
{
"ats-events-2023-06": {
"aliases": {},
"mappings": {
"properties": {
"@timestamp": {
"type": "date",
"format": "strict_date_optional_time||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss.SS||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ss.S"
},
"@version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"alarm": {
"type": "keyword"
},
"alarmvalue": {
"type": "long"
},
"description": {
"type": "text"
},
"equipment": {
"type": "keyword"
},
"event": {
"properties": {
"original": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"eventtype": {
"type": "keyword"
},
"graphicelement": {
"type": "long"
},
"host": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"id": {
"type": "long"
},
"location": {
"type": "keyword"
},
"log": {
"properties": {
"file": {
"properties": {
"path": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
},
"mmsstate": {
"type": "long"
},
"operator": {
"type": "keyword"
},
"severity": {
"type": "long"
},
"sourcetime": {
"type": "date",
"format": "strict_date_optional_time||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd HH:mm:ss.SS||yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ss.S"
},
"state": {
"type": "long"
},
"subsystem": {
"type": "keyword"
},
"system": {
"type": "keyword"
},
"uniqueid": {
"type": "keyword"
},
"value": {
"type": "keyword"
},
"zone": {
"type": "long"
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"provided_name": "ats-events-2023-06",
"creation_date": "1700621578430",
"number_of_replicas": "1",
"uuid": "PPJNj8bTR4SevVlbYY23JQ",
"version": {
"created": "8500003"
}
}
}
}
}
# GET ats-events-2023-06/_search 200 OK
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 8,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "ats-events-2023-06",
"_id": "V-7y9IsBNa_axXWiVeWb",
"_score": 1,
"_source": {
"severity": 0,
"log": {
"file": {
"path": "/usr/share/logstash/csv_files/events2023-06-01.csv"
}
},
"graphicelement": -1,
"sourcetime": "2023-06-01T00:00:06.644+08:00",
"subsystem": "DDD",
"equipment": "xx210/RRR/DDD/ALL",
"description": "Summary of Status with Closed & Locked",
"operator": "null",
"mmsstate": 0,
"@timestamp": "2023-06-01T00:00:06.644+08:00",
"system": "RRR",
"zone": -1,
"alarmvalue": 0,
"@version": "1",
"host": {
"name": "6e71e0dd4966"
},
"alarm": "CCC_0",
"eventtype": "DIAG_IAllDoorModeStatus[1]",
"location": "xx210",
"id": 3432487,
"state": 5,
"event": {
"original": "3432487,CCC_3432487,CCC_0,DIAG_IAllDoorModeStatus[1],RRR,DDD,2023-06-01 00:00:06.644,null,0,NOT CLOSED & LOC,xx210/RRR/DDD/ALL,xx210,0,Summary of Status with Closed & Locked,5,0,-1,-1"
},
"value": "NOT CLOSED & LOC",
"uniqueid": "CCC_3432487"
}
},
{
"_index": "ats-events-2023-06",
"_id": "UO7x9IsBNa_axXWi9OVZ",
"_score": 1,
"_source": {
"severity": 0,
"log": {
"file": {
"path": "/usr/share/logstash/csv_files/events2023-06-01.csv"
}
},
"graphicelement": -1,
"sourcetime": "2023-06-01T00:00:04.923+08:00",
"subsystem": "AAA",
"equipment": "GGG/SSS/AAA/CCCAAA_SVR",
"description": "Command SET ROUTE Received Status with Closed & Locked",
"operator": "null",
"mmsstate": 0,
"@timestamp": "2023-06-01T00:00:04.923+08:00",
"system": "SSS",
"zone": -1,
"alarmvalue": 0,
"@version": "1",
"host": {
"name": "6e71e0dd4966"
},
"alarm": "CCC_0",
"eventtype": "COMMAND_RECEIVED_CCC",
"location": "GGG",
"id": 3432483,
"state": 5,
"event": {
"original": "3432483,CCC_3432483,CCC_0,COMMAND_RECEIVED_CCC,SSS,AAA,2023-06-01 00:00:04.923,null,0,,GGG/SSS/AAA/CCCAAA_SVR,GGG,0,Command SET ROUTE Received Status with Closed & Locked,5,0,-1,-1"
},
"uniqueid": "CCC_3432483"
}
},
{
"_index": "ats-events-2023-06",
"_id": "Ue7x9IsBNa_axXWi9OVY",
"_score": 1,
"_source": {
"severity": 0,
"log": {
"file": {
"path": "/usr/share/logstash/csv_files/events2023-06-01.csv"
}
},
"graphicelement": -1,
"sourcetime": "2023-06-01T00:00:06.642+08:00",
"subsystem": "DDD",
"equipment": "xx210/RRR/DDD/ALL",
"description": "Summary of Status with Open",
"operator": "null",
"mmsstate": 0,
"@timestamp": "2023-06-01T00:00:06.642+08:00",
"system": "RRR",
"zone": -1,
"alarmvalue": 0,
"@version": "1",
"host": {
"name": "6e71e0dd4966"
},
"alarm": "CCC_0",
"eventtype": "DIAG_IAllDoorModeStatus[2]",
"location": "xx210",
"id": 3432486,
"state": 5,
"event": {
"original": "3432486,CCC_3432486,CCC_0,DIAG_IAllDoorModeStatus[2],RRR,DDD,2023-06-01 00:00:06.642,null,0,>=1 OPEN,xx210/RRR/DDD/ALL,xx210,0,Summary of Status with Open,5,0,-1,-1"
},
"value": ">=1 OPEN",
"uniqueid": "CCC_3432486"
}
},
{
"_index": "ats-events-2023-06",
"_id": "Vu7x9IsBNa_axXWi9OVZ",
"_score": 1,
"_source": {
"severity": 0,
"log": {
"file": {
"path": "/usr/share/logstash/csv_files/events2023-06-01.csv"
}
},
"graphicelement": -1,
"sourcetime": "2023-06-01T00:00:04.108+08:00",
"subsystem": "DDD",
"equipment": "xx140/RRR/DDD/ALL",
"description": "Summary ofStatus with Closed & Locked",
"operator": "null",
"mmsstate": 0,
"@timestamp": "2023-06-01T00:00:04.108+08:00",
"system": "RRR",
"zone": -1,
"alarmvalue": 0,
"@version": "1",
"host": {
"name": "6e71e0dd4966"
},
"alarm": "CCC_0",
"eventtype": "DIAG_IAllDoorModeStatus[1]",
"location": "xx140",
"id": 3432481,
"state": 5,
"event": {
"original": "3432481,CCC_3432481,CCC_0,DIAG_IAllDoorModeStatus[1],RRR,DDD,2023-06-01 00:00:04.108,null,0,CLOSED & LOCKED,xx140/RRR/DDD/ALL,xx140,0,Summary ofStatus with Closed & Locked,5,0,-1,-1"
},
"value": "CLOSED & LOCKED",
"uniqueid": "CCC_3432481"
}
},
{
"_index": "ats-events-2023-06",
"_id": "Uu7x9IsBNa_axXWi9OVZ",
"_score": 1,
"_source": {
"severity": 0,
"log": {
"file": {
"path": "/usr/share/logstash/csv_files/events2023-06-01.csv"
}
},
"graphicelement": -1,
"sourcetime": "2023-06-01T00:00:06.143+08:00",
"subsystem": "DDD",
"equipment": "xx030/RRR/DDD/ALL",
"description": "Summary of Status with Closed & Locked",
"operator": "null",
"mmsstate": 0,
"@timestamp": "2023-06-01T00:00:06.143+08:00",
"system": "RRR",
"zone": -1,
"alarmvalue": 0,
"@version": "1",
"host": {
"name": "6e71e0dd4966"
},
"alarm": "CCC_0",
"eventtype": "DIAG_IAllDoorModeStatus[1]",
"location": "xx030",
"id": 3432485,
"state": 5,
"event": {
"original": "3432485,CCC_3432485,CCC_0,DIAG_IAllDoorModeStatus[1],RRR,DDD,2023-06-01 00:00:06.143,null,0,CLOSED & LOCKED,xx030/RRR/DDD/ALL,xx030,0,Summary of Status with Closed & Locked,5,0,-1,-1"
},
"value": "CLOSED & LOCKED",
"uniqueid": "CCC_3432485"
}
},
{
"_index": "ats-events-2023-06",
"_id": "Ve7x9IsBNa_axXWi9OVZ",
"_score": 1,
"_source": {
"severity": 0,
"log": {
"file": {
"path": "/usr/share/logstash/csv_files/events2023-06-01.csv"
}
},
"graphicelement": -1,
"sourcetime": "2023-06-01T00:00:04.614+08:00",
"subsystem": "ATC",
"equipment": "xx022/SSS/ATC/AAA_SYS",
"description": "Auto Norm: performed handover with one radio only",
"operator": "null",
"mmsstate": 0,
"@timestamp": "2023-06-01T00:00:04.614+08:00",
"system": "SSS",
"zone": -1,
"alarmvalue": 0,
"@version": "1",
"host": {
"name": "6e71e0dd4966"
},
"alarm": "CCC_0",
"eventtype": "DIAG_IHndovr1Radio",
"location": "xx022",
"id": 3432482,
"state": 5,
"event": {
"original": "3432482,CCC_3432482,CCC_0,DIAG_IHndovr1Radio,SSS,ATC,2023-06-01 00:00:04.614,null,0,ALARM,xx022/SSS/ATC/AAA_SYS,xx022,0,Auto Norm: performed handover with one radio only,5,0,-1,-1"
},
"value": "ALARM",
"uniqueid": "CCC_3432482"
}
},
{
"_index": "ats-events-2023-06",
"_id": "U-7x9IsBNa_axXWi9OVZ",
"_score": 1,
"_source": {
"severity": 0,
"log": {
"file": {
"path": "/usr/share/logstash/csv_files/events2023-06-01.csv"
}
},
"graphicelement": -1,
"sourcetime": "2023-06-01T00:00:00.439+08:00",
"subsystem": "DDD",
"equipment": "xx030/RRR/DDD/ALL",
"description": "Summary of Status with Open",
"operator": "null",
"mmsstate": 0,
"@timestamp": "2023-06-01T00:00:00.439+08:00",
"system": "RRR",
"zone": -1,
"alarmvalue": 0,
"@version": "1",
"host": {
"name": "6e71e0dd4966"
},
"alarm": "CCC_0",
"eventtype": "DIAG_IAllDoorModeStatus[2]",
"location": "xx030",
"id": 3432480,
"state": 5,
"event": {
"original": "3432480,CCC_3432480,CCC_0,DIAG_IAllDoorModeStatus[2],RRR,DDD,2023-06-01 00:00:00.439,null,0,NO DOORS OPEN,xx030/RRR/DDD/ALL,xx030,0,Summary of Status with Open,5,0,-1,-1"
},
"value": "NO DOORS OPEN",
"uniqueid": "CCC_3432480"
}
},
{
"_index": "ats-events-2023-06",
"_id": "VO7x9IsBNa_axXWi9OVZ",
"_score": 1,
"_source": {
"severity": 0,
"log": {
"file": {
"path": "/usr/share/logstash/csv_files/events2023-06-01.csv"
}
},
"graphicelement": -1,
"sourcetime": "2023-06-01T00:00:00.035+08:00",
"subsystem": "DDDD",
"equipment": "xxx800/RRR/DDD/ALL",
"description": "Summary of",
"operator": "null",
"mmsstate": 0,
"@timestamp": "2023-06-01T00:00:00.035+08:00",
"system": "RRR",
"zone": -1,
"alarmvalue": 0,
"@version": "1",
"host": {
"name": "6e71e0dd4966"
},
"alarm": "CCC_0",
"eventtype": "DIAG_IAllDoorModeStatus[2]",
"location": "xxx",
"id": 3432479,
"state": 5,
"event": {
"original": "3432479,CCC_3432479,CCC_0,DIAG_IAllDoorModeStatus[2],RRR,DDDD,2023-06-01 00:00:00.035,null,0,NO DOORS OPEN,xxx800/RRR/DDD/ALL,xxx,0,Summary of,5,0,-1,-1"
},
"value": "NO DOORS OPEN",
"uniqueid": "CCC_3432479"
}
}
]
}
}
Studying your code now.
I see the difference: one JSON object has target_field and the other doesn't.
"date": {
"field": "sourcetime",
"timezone": "Asia/Singapore",
"formats": [
"strict_date_optional_time",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SS",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mm:ss.S"
]
}
},
{
"date": {
"field": "sourcetime",
"timezone": "Asia/Singapore",
"target_field": "sourcetime",
"formats": [
"strict_date_optional_time",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SS",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mm:ss.S"
]
Look closely, they are not duplicates... Read the docs...
Right, so no target_field defaults to @timestamp...
Or you can explicitly set it so it's clear...
| Name | Required | Default | Description |
|---|---|---|---|
| field | yes | - | The field to get the date from. |
| target_field | no | @timestamp | The field that will hold the parsed date. |
I saw this under Date processor | Elasticsearch Guide [8.11] | Elastic
Broadly, I see having 2 of the same objects as a means to offset the time difference? It's an interesting technique you used, if I understood correctly.
I use the date processor to set the timezones for both sourcetime and @timestamp... what you asked for... It's just code to do that... If this is clearer:
"date": {
"field": "sourcetime",
"target_field": "@timestamp",
"timezone": "Asia/Singapore",
"formats": [
"strict_date_optional_time",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SS",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mm:ss.S"
]
}
},
{
"date": {
"field": "sourcetime",
"timezone": "Asia/Singapore",
"target_field": "sourcetime",
"formats": [
"strict_date_optional_time",
"yyyy-MM-dd HH:mm:ss.SSS",
"yyyy-MM-dd HH:mm:ss.SS",
"yyyy-MM-dd HH:mm:ss",
"yyyy-MM-dd HH:mm:ss.S"
]
Don't overthink it... processors are functions, not objects... I'm just applying the code / functions.
Made A New Topic
Sorry, this is going to be a little stupid, but now I get a parsing error in my csv files for an unknown reason:
Illegal character inside unquoted field at 197?
I was doing some data-cleaning work on my files this evening and ran them through Python utility scripts. However, the files can still be opened normally.
Using Elastic's File Upload didn't raise any errors either.
[2023-11-22T14:57:58,035][WARN ][logstash.outputs.elasticsearch][main][b657bd83de5d7d68e3f96ca1f75d1568be3cf9df7470bb420f842091e1681f72] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"ats-mainline-logs-2023-01", :routing=>nil, :pipeline=>"ats-mainline-logs-pipeline"}, {"@timestamp"=>2023-11-22T14:57:56.352619127Z, "event"=>{"original"=>"54150475,CCC_54150475,CCC_0,DIAG_IAllDoorModeStatus[1],CSC,DRS,2023-01-01 00:31:44.733,null,0,NOT CLOSED & LOC,x0030/RSC/DRS/ALL,x0030,0,Summary of xxxxx Doors Status with Closed & Locked,5,0,-1,-1\r"}, "message"=>"54150475,CCC_54150475,CCC_0,DIAG_IAllDoorModeStatus[1],RRR,DRS,2023-01-01 00:31:44.733,null,0,NOT CLOSED & LOC,x00x0/RSC/DRS/ALL,900x0,0,Summary of xxxxx Doors Status with Closed & Locked,5,0,-1,-1\r", "host"=>{"name"=>"677b632d35c5"}, "@version"=>"1", "log"=>{"file"=>{"path"=>"/usr/share/logstash/ats-logs-mainline/2023/01-Jan-23/events2023-01-01.csv"}}}], :response=>{"index"=>{"status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Illegal character inside unquoted field at 197"}}}}
My own Logstash pipeline is now broken, and I'm not sure what I broke.
I managed to capture a console entry:
{
"@timestamp" => 2023-11-22T15:43:21.000370288Z,
"log" => {
"file" => {
"path" => "/usr/share/logstash/ats-logs-mainline/2023/01-Jan-23/events2023-01-01.csv"
}
},
"@version" => "1",
"message" => "54146161,OCC_54146161,OCC_0,DIAG_IAllDoorModeStatus[2],RSC,DRS,2023-01-01 00:00:34.672,null,0,>=1 OPEN,90390/RSC/DRS/ALL,90390,0,Summary of Train Doors Status with Open,5,0,-1,-1\r",
"host" => {
"name" => "41d44a8cb286"
},
"event" => {
"original" => "54146161,OCC_54146161,OCC_0,DIAG_IAllDoorModeStatus[2],RSC,DRS,2023-01-01 00:00:34.672,null,0,>=1 OPEN,90390/RSC/DRS/ALL,90390,0,Summary of Train Doors Status with Open,5,0,-1,-1\r"
}
}
Is the trailing \r causing this?
UPDATE: This still occurs even if I pipe in my original csv files. Hence the issue is not related to my data-cleaning work...
Everything looks so flawless and correct I don't even know what's wrong.
This is why I'm scared to break code and try new things in uncharted territory.
Yup, you can easily see this with _simulate:
POST _ingest/pipeline/ats-events-pipeline/_simulate
{
"docs": [
{
"_source": {
"message": "54146161,OCC_54146161,OCC_0,DIAG_IAllDoorModeStatus[2],RSC,DRS,2023-01-01 00:00:34.672,null,0,>=1 OPEN,90390/RSC/DRS/ALL,90390,0,Summary of Train Doors Status with Open,5,0,-1,-1\r"
}
},
{
"_source": {
"message": "54146161,OCC_54146161,OCC_0,DIAG_IAllDoorModeStatus[2],RSC,DRS,2023-01-01 00:00:34.672,null,0,>=1 OPEN,90390/RSC/DRS/ALL,90390,0,Summary of Train Doors Status with Open,5,0,-1,-1"
}
}
]
}
# results: the first one (with \r) fails, the second does not
{
"docs": [
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Illegal character inside unquoted field at 178"
}
],
"type": "illegal_argument_exception",
"reason": "Illegal character inside unquoted field at 178"
}
},
{
"doc": {
"_index": "_index",
"_version": "-3",
"_id": "_id",
"_source": {
"severity": 0,
"graphicelement": -1,
"sourcetime": "2023-01-01T00:00:34.672+08:00",
"subsystem": "DRS",
"equipment": "90390/RSC/DRS/ALL",
"description": "Summary of Train Doors Status with Open",
"operator": "null",
"mmsstate": 0,
"system": "RSC",
"@timestamp": "2023-01-01T00:00:34.672+08:00",
"zone": -1,
"alarmvalue": 0,
"alarm": "OCC_0",
"eventtype": "DIAG_IAllDoorModeStatus[2]",
"location": "90390",
"id": 54146161,
"state": 5,
"value": ">=1 OPEN",
"uniqueid": "OCC_54146161"
},
"_ingest": {
"timestamp": "2023-11-22T18:30:41.586762844Z"
}
}
}
]
}
This is scary.
It's apparently still happening with my original (uncleaned) csv files that worked earlier today.
Let me investigate further.
I suspect you "saved" your data with Excel or Notepad++ or something, and it changed the carriage returns or line endings...
Man you are running into everything!!!
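For what it's worth, one way to make the pipeline tolerant of a trailing \r (a sketch, not part of the solution posted above) is to strip it before the csv processor runs, by putting a gsub processor at the front of the processors array in ats-events-pipeline:

```json
{
  "gsub": {
    "field": "message",
    "pattern": "\\r$",
    "replacement": ""
  }
}
```

With that in place, lines ending in \r parse the same as clean lines, which you can confirm with the same _simulate call as above.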