Remote_addr showing 127.0.0.1

Hi, I am able to see only 127.0.0.1 in http_x_forwarded_for. Can someone please help me with the changes required?

Welcome to our community! :smiley:

It's not really clear what you are referring to here, can you please add some more information.

Hi,

There is a config file of logstash where the above-mentioned field is showing the local host IP address, I want to make a change in the config in such a way that it should not take the local IP address, it should show me some client IP or some valid IP where the request is coming from. Please find the below config.

input {
  # Concierge request log lines, shipped as JSON over TCP.
  tcp {
    port  => 5546
    codec => "json"
    type  => "concierge"
  }
  # Hagrid request log lines, shipped as JSON over TCP.
  tcp {
    port  => 5547
    codec => "json"
    type  => "hagrid"
  }
}

filter {

  if [type] in ["concierge", "hagrid"] {

    # TODO Remove fix after concierge deployment in scope of BELL-4880 - OpsLogV1R1 : LogRotate and shipping the logs to Kibana.
    # Copy the raw payload into 'rsyslog_message' so the original 'message'
    # stays untouched while grok parses the copy.
    if [message] {
      mutate {
        add_field => { "rsyslog_message" => "%{message}" }
      }
    }

    # Concierge request format is constructed by DropWizard7/Jetty according to the logic: github.com/eclipse/jetty.project/blob/jetty-9.0.7.v20131107/jetty-server/src/main/java/org/eclipse/jetty/server/AbstractNCSARequestLog.java#L118:L138
    # http://stackoverflow.com/questions/29098340/meaning-of-each-field-in-default-format-of-http-request-log-in-dropwizard
    # Parse the NCSA-style request line. The FIRST token of the raw line is
    # captured into 'http_x_forwarded_for' — so whatever address the upstream
    # (nginx/rsyslog) wrote there is exactly what ends up in that field.
    grok {
      match => ["rsyslog_message", '%{DATA:http_x_forwarded_for} %{SPACE}- %{SPACE}(%{USER:remote_user}|%{SPACE}-%{SPACE}) %{SPACE}\[%{HTTPDATE:log_timestamp}\] %{SPACE}"%{WORD:method} %{DATA:path} HTTP/%{NUMBER:http_version}" %{SPACE}%{NUMBER:status} %{SPACE}(?:%{NUMBER:bytes}|-) %{SPACE}"%{DATA:referrer}" %{SPACE}"%{DATA:agent}" %{SPACE}(?:%{NUMBER:request_time})']
      tag_on_failure => ["_grokparsefailure", "_grok_rsyslog_message"]
    }

    # No need to index Health checks and other stuff.
    if [path] =~ /(HOSTINFO\.txt|VERSION\.txt|\/manager\/jmxproxy\/|\/portal|\/favicon\.ico|\/robots\.txt|\/healthcheck|\/elbHealthcheck)/ {
      drop {}
    }

    # Decode the path URL.
    urldecode { field => "path" }

    # Split '@source' (e.g. bazaar-concierge-0f38075696cc5c5d4) into
    # universe / service / instance_id.
    grok {
      match => ["@source", "%{WORD:universe}-%{WORD:service}-%{WORD:instance_id}"]
      tag_on_failure => ["_grokparsefailure", "_grok_source"]
    }

    # 'collector_id' is templated at deploy time (ERB).
    # NOTE(review): if the grok above failed, 'instance_id' does not exist and
    # 'replace' creates it with the literal text "i-%{instance_id}" — confirm
    # that is acceptable downstream.
    mutate {
      add_field => { "collector_id" => "<%= @hostname %>" }
      replace   => { "instance_id" => "i-%{instance_id}" }
    }

    # Override default @timestamp field (automatically created by logstash) with the parsed date value from
    # 'log_timestamp' field which contains the timestamp when a request had been processed by an application instance.
    date {
      match => ['log_timestamp', 'dd/MMM/yyyy:HH:mm:ss Z']
      target => "@timestamp"
    }

    # Create path copy field and parse it leaving path itself unmodified.
    # NOTE(review): the three mutates below are intentionally separate blocks —
    # a single mutate applies its operations in the plugin's fixed internal
    # order, not config order (verify against the mutate filter docs), so
    # merging them would break add_field -> lowercase -> rename sequencing.
    mutate {
      add_field => { "path_copy" => "%{path}" }
    }

    # Lowercase 'path_copy' field for regex pattern matching.
    mutate {
      lowercase => [ "path_copy" ]
    }

    # Rename 'type' field value which serves for defining the application (Concierge or Hagrid) by the incoming port
    # and for separating opslog from requestlog.
    mutate {
      rename => { "type" => "application" }
    }


    # Process fields specific for api paths.
    if [path_copy] =~ /^\/data\// {

      # Store the endpoint (reviews, questions, etc.) and the format (xml or json).
      grok {
        match => ["path_copy", "/data/%{WORD:endpoint}(?:\.%{WORD:response_format})?%{DATA}"]
        tag_on_failure => ["_grokparsefailure", "_grok_path_copy"]
      }

      if [endpoint] == "batch" {
        # Remove everything between keyword name and "=" symbol to process keywords.
        mutate {
          gsub => [ "path_copy", "(\.[^&?=]*=)", "=" ]
        }
      }
      else {
        # NOTE(review): updating 'endpoint' with its own value is effectively a
        # no-op; presumably kept for branch symmetry — confirm before removing.
        mutate {
          update => { "endpoint" => "%{endpoint}" }
        }
      }

      # Split parameters into key value pairs.
      # Ignore lowercase 'passkey' field in favor of the unmodified version.
      # Exclude certain fields from being overwritten by the fields with same names passed in 'path_copy'.
      # Note: keep this list updated along with whitelisted fields list.
      # 'exclude_keys' protects fields already extracted above from being
      # clobbered by identically-named URL query parameters.
      kv {
        source       => "path_copy"
        field_split  => "?&"
        exclude_keys => [
            'passkey',
            'agent',
            'application',
            'bytes',
            'cdn_addr',
            'collector_id',
            'endpoint',
            'filter_type',
            'geoip_city',
            'geoip_country',
            'geoip_location',
            'http_version',
            'http_x_forwarded_for',
            'instance_id',
            'limit_type',
            'log_timestamp',
            'method',
            'path',
            'region',
            'referrer',
            'remote_addr',
            'remote_user',
            'request_time',
            'response_format',
            'rsyslog_message',
            'search_type',
            'sort_type',
            '@source',
            'status',
            'tags',
            '@timestamp',
            'universe'
        ]
      }

      # Have separate 'offset' field to find clients that are "bulk scraping" (BELL-3613)
      if [offset] {
        # Flatten a possibly multi-valued 'offset' and keep the maximum.
        # NOTE(review): '.max' here compares STRINGS lexicographically
        # ("9" > "10") since join/split yield strings — confirm whether a
        # numeric maximum was intended.
        ruby {
          code => "begin; if !event.get('offset').nil?;
                   event.set('offset', event.get('offset').join(',').split(',').reject(&:empty?).max);
                   end; rescue; end;"
        }
        mutate {
          convert   => { "offset" => "integer" }
          add_field => { "offset_max" => "%{offset}" }
        }
      }

      # Process Hagrid specific 'keyproperty' field, e.g. "...&keyproperty=syndication,unapproved&..."
      if [keyproperty] {
        mutate {
          split => { "keyproperty" => "," }
        }
      }

      # Add 'underscore' tag for requests that contain _ as a parameter.
      if [_] {
        mutate {
          add_tag => [ "underscore" ]
        }
      }

      # Rename 'apiversion' field to 'api_version' to be in sync with naming of the Apigee ELK
      mutate {
        rename => { "apiversion" => "api_version" }
      }

      # Extract case-sensitive passkey from an unmodified 'path' field into 'passkey' field.
      # tag_on_failure is emptied on purpose: requests without a passkey are normal.
      grok {
        match => ["path", "(?i:passkey\=)(?<passkey>[0-9a-zA-Z]+)"]
        tag_on_failure => []
      }

      # Get matching data for the passkey from the mapping file.
      translate {
        dictionary_path => "<%= @config_dir %>/mapping/apikey_mapping.yaml"
        field           => "passkey"            # The field to match.
        destination     => "key_data_field"     # The field to store raw text of json mapping data in.
      }

      # Process raw text as json with appropriate fields.
      # NOTE(review): 'icebreaker[client]' / '%{icebreaker[client]}' use a legacy
      # field-reference spelling; on current Logstash this would be
      # '[icebreaker][client]' — verify against the Logstash version in use.
      json {
        source       => "key_data_field"                            # Text field with raw json data.
        target       => "icebreaker"                                # The upper level field to store json data in.
        add_field    => { "client" => "%{icebreaker[client]}" }     # For compatibility, store as 'client' field instead of 'icebreaker.client'.
        remove_field => [ "icebreaker[client]" ]
      }

      # The value of the clientname url param should be stored in the client field if all of the following are true:
      # 0. clientname exists
      # 1. a passkey was not specified
      # 2. the request was a hagrid request (application==hagrid)
      if [clientname] and not ([passkey]) and [application] == "hagrid" {
        mutate { add_field => { "client" => "%{clientname}" } }
      }

      # The following parsing rules rely on API documentation
      # https://developer.bazaarvoice.com/docs/read/conversations_api/getting_started/display_fundamentals
      # https://developer.bazaarvoice.com/docs/read/conversations
      # https://developer.bazaarvoice.com/docs/read/conversations_api/reference

      ##### Add filter types to 'filter_type' field and their whole values after '=' sign (attributes) to 'filter_attribute' field.
      # Each block below has the same shape: merge the raw parameter value into
      # 'filter_attribute', then append the parameter NAME to the 'filter_type'
      # array via ruby. Array(...) tolerates a missing field; reject(&:empty?)
      # drops empty seed strings. Block order determines array element order —
      # keep the blocks in this order.
      if [filter] {
        mutate {
          merge => ["filter_attribute", "filter"] # Filter=Attribute1:operator:Value&Filter=Attribute2:operator:Value1,Value2,ValueN
        }
        ruby {
          code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter'))"
        }
      }
      if [filter_reviews] {
        mutate {
          merge => ["filter_attribute", "filter_reviews"]
        }
        ruby {
          code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_reviews'))"
        }
      }
      if [filter_answers] {
        mutate {
          merge => ["filter_attribute", "filter_answers"]
        }
        ruby {
          code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_answers'))"
        }
      }
      if [filter_comments] {
        mutate {
          merge => ["filter_attribute", "filter_comments"]
        }
        ruby {
          code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_comments'))"
        }
      }
      if [filter_products] {
        mutate {
           merge => ["filter_attribute", "filter_products"]
        }
        ruby {
           code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_products'))"
        }
      }
      if [filter_authors] {
        mutate {
           merge => ["filter_attribute", "filter_authors"]
        }
        ruby {
           code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_authors'))"
        }
      }
      if [filter_questions] {
        mutate {
           merge => ["filter_attribute", "filter_questions"]
        }
        ruby {
           code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_questions'))"
        }
      }
      if [filter_stories] {
        mutate {
           merge => ["filter_attribute", "filter_stories"]
        }
        ruby {
           code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_stories'))"
        }
      }
      if [filter_categories] {
        mutate {
           merge => ["filter_attribute", "filter_categories"]
        }
        ruby {
           code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_categories'))"
        }
      }
      if [filter_reviewcomments] {
        mutate {
           merge => ["filter_attribute", "filter_reviewcomments"]
        }
        ruby {
           code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_reviewcomments'))"
        }
      }
      if [filter_storycomments] {
        mutate {
           merge => ["filter_attribute", "filter_storycomments"]
        }
        ruby {
           code => "event.set('filter_type', Array(event.get('filter_type')).reject(&:empty?) + Array('filter_storycomments'))"
        }
      }


      ##### Add sort types to 'sort_type' field and their whole values after '=' sign (attributes) to 'sort_attribute' field.
      # Same accumulation pattern as the filter_* section above: merge the raw
      # value into 'sort_attribute', append the parameter name to the
      # 'sort_type' array. Plain 'sort' is additionally split on commas first.
      # Block order determines array element order — keep as-is.
      if [sort] {
        mutate {
          split => ["sort", ","]  # Sort=Attribute1:direction,Attribute2:direction
          merge => ["sort_attribute", "sort"]
        }
        ruby {
          code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort'))"
        }
      }
      if [sort_answers] {
        mutate {
           merge => ["sort_attribute", "sort_answers"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_answers'))"
        }
      }
      if [sort_reviews] {
        mutate {
           merge => ["sort_attribute", "sort_reviews"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_reviews'))"
        }
      }
      if [sort_comments] {
        mutate {
           merge => ["sort_attribute", "sort_comments"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_comments'))"
        }
      }
      if [sort_products] {
        mutate {
           merge => ["sort_attribute", "sort_products"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_products'))"
        }
      }
      if [sort_authors] {
        mutate {
           merge => ["sort_attribute", "sort_authors"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_authors'))"
        }
      }
      if [sort_questions] {
        mutate {
           merge => ["sort_attribute", "sort_questions"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_questions'))"
        }
      }
      if [sort_stories] {
        mutate {
           merge => ["sort_attribute", "sort_stories"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_stories'))"
        }
      }
      if [sort_categories] {
        mutate {
           merge => ["sort_attribute", "sort_categories"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_categories'))"
        }
      }
      if [sort_reviewcomments] {
        mutate {
           merge => ["sort_attribute", "sort_reviewcomments"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_reviewcomments'))"
        }
      }
      if [sort_storycomments] {
        mutate {
           merge => ["sort_attribute", "sort_storycomments"]
        }
        ruby {
           code => "event.set('sort_type', Array(event.get('sort_type')).reject(&:empty?) + Array('sort_storycomments'))"
        }
      }

      ##### Add search types to 'search_type' field and their whole values after '=' sign (attributes) to 'search_attribute' field.
      # Same accumulation pattern as filter_*/sort_* above. Note the asymmetry:
      # the plain 'Search' parameter (full-text query string) is merged into
      # 'search_value', while the typed search_* parameters go to
      # 'search_attribute' — both fields are whitelisted in the prune below.
      if [search] {
        mutate {
          merge => ["search_value", "search"]  # Full-text search string used to find UGC, e.g. Search="dish soap"
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search'))"
        }
      }
      if [search_answers] {
        mutate {
          merge => ["search_attribute", "search_answers"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_answers'))"
        }
      }
      if [search_reviews] {
        mutate {
          merge => ["search_attribute", "search_reviews"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_reviews'))"
        }
      }
      if [search_comments] {
        mutate {
          merge => ["search_attribute", "search_comments"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_comments'))"
        }
      }
      if [search_products] {
        mutate {
          merge => ["search_attribute", "search_products"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_products'))"
        }
      }
      if [search_authors] {
        mutate {
          merge => ["search_attribute", "search_authors"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_authors'))"
        }
      }
      if [search_questions] {
        mutate {
          merge => ["search_attribute", "search_questions"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_questions'))"
        }
      }
      if [search_stories] {
        mutate {
          merge => ["search_attribute", "search_stories"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_stories'))"
        }
      }
      if [search_categories] {
        mutate {
          merge => ["search_attribute", "search_categories"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_categories'))"
        }
      }
      if [search_reviewcomments] {
        mutate {
          merge => ["search_attribute", "search_reviewcomments"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_reviewcomments'))"
        }
      }
      if [search_storycomments] {
        mutate {
          merge => ["search_attribute", "search_storycomments"]
        }
        ruby {
          code => "event.set('search_type', Array(event.get('search_type')).reject(&:empty?) + Array('search_storycomments'))"
        }
      }

      ##### Add limit types to 'limit_type' field.
      if [limit] {
        ruby {
          code => "event.set('limit_type', Array(event.get('limit_type')).reject(&:empty?) + Array('limit'))"
        }
      }
      if [limit_answers] {
        ruby {
          code => "event.set('limit_type', Array(event.get('limit_type')).reject(&:empty?) + Array('limit_answers'))"
        }
      }
      if [limit_reviews] {
        ruby {
          code => "event.set('limit_type', Array(event.get('limit_type')).reject(&:empty?) + Array('limit_reviews'))"
        }
      }
      if [limit_comments] {
        ruby {
          code => "event.set('limit_type', Array(event.get('limit_type')).reject(&:empty?) + Array('limit_comments'))"
        }
      }
      # NOTE(review): the config paste appears truncated/garbled here. The lines
      # below are the tail of a ruby block (apparently deduplicating a
      # comma-separated 'stats' field via event.set('sta...ts', ...)) whose
      # opening lines — and possibly further limit_* blocks — are missing.
      # Recover the original file before changing anything in this region.
   ts', event.get('stats').join(',').split(',').uniq.reject(&:empty?));
                 end; rescue; end;"
      }
      # Deduplicate comma-separated 'filteredstats' values (best-effort: errors ignored).
      ruby {
        code => "begin; if !event.get('filteredstats').nil?;
                 event.set('filteredstats', event.get('filteredstats').join(',').split(',').uniq.reject(&:empty?));
                 end; rescue; end;"
      }
    }

    # Exclude healthchecks from processing "http_x_forwarded_for" field and geoip matching.
    if !([app] in ["route53_healthcheck", "pingdom_healthcheck"]) {

      # Work on a copy so the raw header value survives in 'http_x_forwarded_for'.
      mutate {
        add_field => { "http_x_forwarded_for_copy" => "%{http_x_forwarded_for}" }
      }

      # Normalize http_x_forwarded_for string for geoip matching:
      # 1. Remove spaces.
      # 2. http_x_forwarded_for can sometimes contain 'unknown,' in "http_x_forwarded_for". Strip it as Logstash
      # does extra enormous logging ("IP Field contained invalid IP address or hostname").
      mutate {
        gsub => ["http_x_forwarded_for_copy", " ", ""]
        gsub => ["http_x_forwarded_for_copy", "unknown,+", ""]
      }

      if [application] == "hagrid" {
        # Hagrid: first hop is the client, the remainder are balancer addresses.
        grok {
          match => ["http_x_forwarded_for_copy", '%{IPORHOST:remote_addr},(?<balancer_addr>(%{IPORHOST},?)*)']
          tag_on_failure => ["_grokparsefailure", "_grok_http_x_forwarded_for_copy"]
        }
      }
      else {
        # Concierge: leading hops are client addresses, then CDN hop(s), last one is the balancer.
        grok {
          match => ["http_x_forwarded_for_copy", '(?<remote_addr>(%{IPORHOST},)*)(?<cdn_addr>(%{IPORHOST},)){1,}%{IPORHOST:balancer_addr}{1,}']
          tag_on_failure => ["_grokparsefailure", "_grok_http_x_forwarded_for_copy"]
        }
      }

      # Split into array and remove commas.
      mutate {
        split => ["remote_addr", ","]
        split => ["cdn_addr",    ","]
      }

      # Use the first external IP in 'remote_addr' array to match its geo location. IPv6 addresses also work.
      # FIX: the previous pattern /(127.|10.|192.168|172.1[6-9].|...).*/ was
      # unanchored and used unescaped dots, so public addresses containing
      # "10." anywhere (e.g. 210.5.5.5) or arbitrary characters after the
      # prefix (e.g. 127x, 192.168 without a dot) were wrongly treated as
      # private and skipped geoip matching. The pattern below is anchored and
      # matches only real loopback/RFC1918 prefixes.
      if [remote_addr][0] !~ /^(127\.|10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/ {
        geoip {
          source    => "[remote_addr][0]"
          target    => "geoip"
          database  => "<%= @config_dir %>/mapping/<%= @geoip_db_file %>"
          add_field => {
            "geoip_location" => [ "%{[geoip][longitude]}", "%{[geoip][latitude]}" ]
            "geoip_city"     => "%{[geoip][city_name]}"
            "geoip_country"  => "%{[geoip][country_name]}"
          }
        }

        mutate {
          convert => [ "geoip_location", "float"]
        }

        # Remove the field if a city is not matched and it gets polluted by "%{[geoip][city_name]}" text.
        if not ([geoip][city_name]) or [geoip_city] == "%{[geoip][city_name]}" {
          mutate {remove_field => [ "geoip_city" ]}
        }
      }
    }

    # Leave message field in case a field is misparsed by grok.
    if "_grokparsefailure" not in [tags] {
      prune {remove_field => [ "rsyslog_message" ]}
    }

    # Leave only fields of interest.
    # Note: keep this list updated along with the excluded fields list.
    # NOTE(review): '^icebreaker' is deliberately not anchored at the end so the
    # icebreaker object survives pruning.
    # NOTE(review): '^filter_value$' is whitelisted but nothing above creates a
    # 'filter_value' field (the filter_* section builds 'filter_attribute') —
    # possibly a stale entry; confirm before removing.
    prune {
      whitelist_names => [
        '^agent$',
        '^api_version$',
        '^app$',
        '^appkey$',
        '^application$',
        '^bytes$',
        '^cdn_addr$',
        '^client$',
        '^collector_id$',
        '^displaycode$',
        '^endpoint$',
        '^filter_attribute$',
        '^filter_value$',
        '^filter_type$',
        '^filteredstats$',
        '^geoip_city$',
        '^geoip_country$',
        '^geoip_location$',
        '^http_version$',
        '^http_x_forwarded_for$',
        '^icebreaker',
        '^include$',
        '^instance_id$',
        '^keyproperty$',
        '^limit_type$',
        '^method$',
        '^offset_max$',
        '^passkey$',
        '^path$',
        '^region$',
        '^referrer$',
        '^remote_addr$',
        '^remote_user$',
        '^request_time$',
        '^resource$',
        '^response_format$',
        '^rsyslog_message$',
        '^search_attribute$',
        '^search_value$',
        '^search_type$',
        '^sort_attribute$',
        '^sort_type$',
        '@source',
        '^stats$',
        '^status$',
        '^tags$',
        '@timestamp',
        '^universe$'
      ]
    }
    # collector_id - id of the logstash collector instance that parsed a message
    # @source - Concierge application instance's name, e.g. bazaar-concierge-0f38075696cc5c5d4
    # instance_id - Concierge/Hagrid application instance's id
    # remote_addr - the 1st leftmost IP from the X-Forwarded-For header. Only external ones are matched to geoip location
    # cdn_addr - one of the CDN provider's IPs
    # app - application identifier passed in a request, e.g. &app=curations
    # application - application which processed the request ("concierge" or "hagrid")
    # _type - 'logs', by default. If 'type' field exists it overrides default value of the '_type' field.
    # _type is deprecated starting ES 6.x and should not be used for storing valuable data, except only for filtering purposes.
    # region - passed in application server's rsyslog message. Effectively this is the region where requests had been
    # actually processed, despite the traffic initial origin (dus1, deu1).
  }
}

output {
  # In requestlog "type" field transforms into "application" to determine it and does not exist on this step unlike opslog
  # where "application" field is already present in the input data and [type] == "opslog" is true only for opslog
  if [type] != "opslog" {
    # Gzipped JSON-lines copy for S3 export, partitioned by application/universe/day.
    file {
      codec => "json_lines"
      gzip  => true
      path  => "<%= @s3_export_dir %>/<%= @tag_team_name %>/%{application}/<%= @hostname %>.%{application}.%{universe}.%{+YYYY-MM-dd}.log.gz"
    }
    # Daily requestlog index on the local Elasticsearch node; the index
    # template is (re-)applied on pipeline start.
    elasticsearch {
      hosts              => ["localhost:9200"]
      index              => "requestlog-<%= @tag_region %>-%{+YYYY.MM.dd}"
      template           => "<%= @config_dir %>/template/requestlog-template.json"
      template_overwrite => true
    }
  }
}

Hi @AkankshaSS

Please format your code going forward; most people will not respond if they can't read your post... simply select the code and click </> (I did that for you).

Plus seems your link is invalid.

Also, you would need to provide sample inputs; otherwise we are just guessing.

Also have you just looked at the raw incoming messages to see what the value is?

Hi, @stephenb Let me share the screenshot with you. What is the output coming and what is the expectation from this config file?


Expectation


http_x_forwarded_for is giving the localhost IP 127.0.0.1, but we are expecting this to be the IP where the request is coming from. I hope that's clear now.

Apologies but no not really much clearer.

If you want help, you're going to have to show what the raw messages look like. Otherwise it could just be that your source is not providing the correct source IP / message... We have no clue what's wrong unless we can see what the source looks like...

And Compare it to the parsing.

It looks like you're getting the IP address of the local host that the rsyslog is running on.

You might want to look up how to maintain source IP over rsyslog

Do a search on that. I'm not at rsyslog expert...

you mean where the logs are coming from?

@stephenb Did you get a chance to look logstash conf file?

Looks like there is an issue here.

Not sure why you think that. That's just the geoip part.

Yes, I looked at the logstash but without a sample of the source it doesn't matter... No way to debug.

So in the end, if you cannot provide a sample, I probably will not be able to help... Perhaps someone else will.

My suspicion is your rsyslog is replacing the remote IP with the local IP 127.0.0.1

In other words, the message that's coming into logstash already has the 127.0.0.1.

That's just a guess since we can't see.

You can check that by just disabling all the logic and output what the raw messages look like. Then we might be able to help... Or you might find out that your remote IP was already replaced by the 127.0.0.1

It's most likely right here... this is where it's doing the match:
match => ["rsyslog_message", '%{DATA:http_x_forwarded_for} ......

if the raw "rsyslog_message" has 127.0.0.1 that is your problem...

You could see the raw input with the following...

input {
  tcp {
    codec => "json"
    port  => 5546
    type  => "concierge"
  }
  tcp {
    codec => "json"
    port  => 5547
    type  => "hagrid"
  }
}

#### Comment this filter out to see the raw ... leave it in to see what ends up in http_x_forwarded_for
filter {

  if [type] in ["concierge", "hagrid"] {

    if [message] {
      mutate {
        add_field => { "rsyslog_message" => "%{message}" }
      }
    }

    # Parse the NCSA-style request line; the first token becomes http_x_forwarded_for.
    grok {
      match => ["rsyslog_message", '%{DATA:http_x_forwarded_for} %{SPACE}- %{SPACE}(%{USER:remote_user}|%{SPACE}-%{SPACE}) %{SPACE}\[%{HTTPDATE:log_timestamp}\] %{SPACE}"%{WORD:method} %{DATA:path} HTTP/%{NUMBER:http_version}" %{SPACE}%{NUMBER:status} %{SPACE}(?:%{NUMBER:bytes}|-) %{SPACE}"%{DATA:referrer}" %{SPACE}"%{DATA:agent}" %{SPACE}(?:%{NUMBER:request_time})']
      tag_on_failure => ["_grokparsefailure", "_grok_rsyslog_message"]
    }
  } # FIX: this closing brace for the `if [type]` conditional was missing — the snippet would not load.
}

output {
  stdout {}
}

@stephenb I am thinking of changing this line

< / if [remote_addr][0] !~ /(127.|10.|192.168|172.1[6-9].|172.2[0-9].|172.3[01].).*/ { >

if [remote_addr] = {

Hi @stephenb Please check this message

 "rsyslog_message" => "127.0.0.1 - - [07/Oct/2022:12:36:18 +0000] \"GET /data/answers.json?PassKeY=carzFE7qLlaw1GkpqAeaywrz2xPEsjGwZQCufy3yy5GS4&ApiVersioN=5.4&FilteR=ProductId:eQ:XYZ123-Product-ExternalId-CCC-Eld3S-JFkNZ&IncludE=questions&FilteR=ModerationStatus:APPROVED&LimiT=10&OffseT=40 HTTP/1.1\" 200 34208 \"-\" \"Jersey/2.34 (HttpUrlConnection 1.8.0_342)\" 14",

Hi @stephenb, @warkolm — I have checked the nginx.rb file and it looks like by default it is set to 127.0.0.1. Could you please help with this?

user  <%= scope.lookupvar('nginx::nginx_user') -%>;
worker_processes  auto;
worker_rlimit_nofile    <%= scope.lookupvar('nginx::worker_open_file_limit') -%>;
error_log  <%= scope.lookupvar('nginx::log_path') -%>/error.log;

pid        /var/run/nginx.pid;

events {
    worker_connections  <%= scope.lookupvar('nginx::worker_connections') -%>;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    server_tokens off;
    client_max_body_size 500m;

    <%
        # Buffer size for reading client request header.
        # If a request line or a request header field does not fit into this buffer then larger buffers, configured by the large_client_header_buffers directive, are allocated.
    %>
    client_header_buffer_size 1k;
    <%
        # Sets the maximum number and size of buffers used for reading large client request header.
        # Format: large_client_header_buffers <max number of buffers> <size of each buffer>
        # A request line cannot exceed the size of one buffer, or the 414 (Request-URI Too Large) error is returned to the client.
        # A request header field cannot exceed the size of one buffer as well, or the 400 (Bad Request) error is returned to the client.
        # Buffers are allocated only on demand. By default, the buffer size is equal to 8K bytes.
        # If after the end of request processing a connection is transitioned into the keep-alive state, these buffers are released.
    %>
    large_client_header_buffers 8 32k;
    <%
        # 2020/07/14
        # In production we started seeing "upstream sent too big header while reading response header from upstream" error messages.
        # As a result, we increased the proxy_buffer_size from 4k (i.o.w. one memory page) to 32k which allows us to handle large redirect and response headers.
        # In addition, we increased the size of the individual proxy buffers from 4k to 32k to allow larger responses to be held in memory to reduce NGINX's
        # need to write the responses to disk before returning them to the client.
        # Note: When modifying any of the buffer sizes, make sure the application buffer sizes are updated appropriately.
    %>
    proxy_buffer_size 32k;
    proxy_buffers 8 32k;

    log_format json escape=json
        '{'
        '"time":"$time_local",'                                                         # The time the request completed.
        '"http_x_forwarded_for":"$http_x_forwarded_for",'                               # The requesting server and any intermediary proxies. If the request was handled by our CDN, the rightmost IP address will be one of our CDN's southbound IP addresses.
        '"load_balancer_addr":"$remote_addr",'                                          # The load balancer internal IP address if the request came through our load balancer, otherwise it's the IP address of the requester.
        '"path":"$request_uri",'                                                        # The full original request URI.
        '"method":"$request_method",'                                                   # The method of the request.
        '"lb_request_scheme":"$http_x_forwarded_proto",'                                # The protocol used to connect to the Load Balancer.
        '"request_scheme":"$scheme",'                                                   # The scheme of the request received by NGINX.
        '"request_protocol":"$server_protocol",'                                        # The protocol of the request.
        '"request_length":"$request_length",'                                           # The total length of the request (including request line, header, and request body).
        '"referrer":"$http_referer",'                                                   # The webpage from which the request originated. An empty referrer indicates either server-side or non-webpage based calls.
        '"agent":"$http_user_agent",'                                                   # The software agent making the request on behalf of a user.
        '"response_total_size":"$bytes_sent",'                                          # The total number of bytes sent back to the client.
        '"response_body_size":"$body_bytes_sent",'                                      # The number of bytes sent back to a client, not including the response header.
        '"status":"$status",'                                                           # The response status code sent to a client.
        '"request_time":"$request_time",'                                               # The total time (in seconds with millisecond resolution) it took to process the request from the first byte from the client to the last byte of the response sent to the client.
        '"upstream_status":"$upstream_status",'                                         # The response status codes from each upstream server involved with handling the request.
        '"upstream_connect_time":"$upstream_connect_time",'                             # The times (in seconds with millisecond resolution) it took to establish a connection to each upstream server involved with handling the request.
        '"upstream_header_time":"$upstream_header_time",'                               # The times (in seconds with millisecond resolution) it took for each upstream server involved with handling the request to return a response header.
        '"upstream_response_time":"$upstream_response_time",'                           # The times (in seconds with millisecond resolution) it took for each upstream server involved with handling the request to return a full response.
        '"redirect_location":"$saved_redirect_location",'                               # When Concierge is redirecting a request, this is the full URL of the redirected request. Otherwise, it's "".
        '"is_redirected_request":"$http_x_bazaarvoice_data_redirect",'                  # "TRUE" if the request is a redirected request. Otherwise, it's "".
        '"redirected_request_handled":"$upstream_http_x_bazaarvoice_data_redirected",'  # "TRUE" if the request was redirected and was handled. Otherwise, it's "".
        '"client":"$http_x_bazaarvoice_customer",'                                      # The name of the client making the request if the request passed through our CDN. Otherwise, "".
        '"app_name":"$http_x_bazaarvoice_appname",'                                     # The name of the application that's tied to the passkey if the request passed through our CDN. Otherwise, "".
        '"message_id":"$http_x_bazaarvoice_messageid",'                                 # The unique message ID provided by our CDN so we can track each request. "" if the request did not pass through our CDN.
        '"cdn":"$http_x_bazaarvoice_cdn"'                                               # The name of our CDN which handled the request. "" if the request did not pass through our CDN.
        '}';

    # Logs the new format regardless of the env.
    access_log  <%= scope.lookupvar('nginx::log_path') -%>/access.log  json;

    sendfile          on;
    tcp_nopush        on;
    keepalive_timeout <%= scope.lookupvar('nginx::server_keepalive_timeout') %>;

    # 2020/07/14
    # To better prevent ephemeral port exhaustion, which we encountered previously in BELL-5232, we can leverage that
    # all IPs on the 127.0.0.0/8 subnet refer to the local loopback. This allows us to refer to the upstream application
    # on any local IP address instead of just 127.0.0.1 (localhost). Therefore, we can greatly increase the number of connections
    # to our upstream application as the number of available ports is now N times larger, where N is the number of listed
    # upstream IP addresses. Therefore in our case with 55,295 ephemeral ports, we can now have 16 X 55,295 = 884,720 connections
    # to our upstream application. Note: this is likely far more than we will need based on our current load numbers, but
    # it should prevent us from having to modify this anytime soon.
    # https://www.nginx.com/blog/overcoming-ephemeral-port-exhaustion-nginx-plus/
    # https://tech.freckle.com/2016/01/21/scaling-beyond-65k-reverse-proxy-connections-on-localhost/
    #
    # Also, since each upstream server is the local concierge application, max_fails is being set to 0 so that NGINX doesn't
    # mark each server as unavailable whenever concierge has a performance hiccup.
    upstream concierge_application {
        <%- 1.upto(16) do |i| -%>
        server 127.0.0.<%= i %>:<%= scope.lookupvar('nginx::upstream_port') %>      max_fails=0;
        <%- end -%>
        keepalive 128;
        keepalive_timeout <%= scope.lookupvar('nginx::upstream_keepalive_timeout') %>;
    }

    server {
        server_tokens off;
        # Set a default value for redirect location. In cases where we ARE redirecting, the error_page section will override this value.
        set $saved_redirect_location '';

        listen       <%= scope.lookupvar('nginx::port_listen') -%>;
        server_name  <%= scope.lookupvar('nginx::server_name') -%>;

        resolver     dns1 dns2;

        proxy_socket_keepalive  on;
        proxy_connect_timeout   <%= scope.lookupvar('nginx::proxy_connect_timeout') -%>;
        proxy_send_timeout      <%= scope.lookupvar('nginx::proxy_send_timeout') -%>;
        proxy_read_timeout      <%= scope.lookupvar('nginx::proxy_read_timeout') -%>;

        location / {

            # Conciege only needs to respond to GET requests
            # Hagrid needs to respond to GET and POST (for catalogInfoService) requests
            # Note, that GET here also allows "HEAD"
            #  https://nginx.org/en/docs/http/ngx_http_core_module.html#limit_except
            # Separately we configure Jetty to only support GET and POST as well, but doing it
            #  here is also good.
            limit_except GET POST {
                deny all;
            }

            proxy_set_header        Host            $host;
            proxy_set_header        X-Real-IP       $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header        Connection      "";
            proxy_http_version      1.1;
            proxy_intercept_errors  on;

            proxy_pass http://concierge_application;
        }

        # This block redirects 302 responses from the app server. upstream_http_location is the target endpoint for the redirect.
        error_page   302 = @handle_redirect;
        location @handle_redirect {
            set $saved_redirect_location '$upstream_http_location';
            proxy_pass $saved_redirect_location;
        }
    }

    # Nginx status and base metrics
    server {

        listen      8091;

        location /status {
            stub_status     on;
            access_log      off;
            allow           127.0.0.1;
            deny            all;
        }

    }

}

The 127.0.0.1 is coming from your syslog server; you need to fix it there.

"rsyslog_message" => "127.0.0.1 - - [07/Oct/2022:12:36:18 +0000]

What is this Nginx.rb file? Is this part of some automation you have? It does not seem that your issue is in Logstash, it looks like that your issue is somewhere else.

Hi @leandrojmp, this nginx.rb comes from the repository where the logs are coming from, and I have noticed that the above nginx.rb file is accepting this:

allow           127.0.0.1;
            deny            all;

Please note that this nginx file is accepting the http_x_forwarded_for parameter and taking the default CDN IP:

127.0.0.1

Hey @leandrojmp, we don't have any syslog server running.

Which repository? It is not clear how this issue relates to any tool of the Elastic stack.

Your log is arriving at Logstash with the 127.0.0.1 IP address, you need to change it in the source of the log.

Hey, no — we have a Concierge API; I have enabled the logs from there, and in Logstash there is a file called request-log.conf where I am trying to make the changes. Please check this Kibana URL.