Reindexing Packetbeat data

OK, I have been following this tutorial and have almost completed it, up to the prediction step.

But when I try to reindex the Packetbeat data, I only reindex data that already exists, not the live data.

So how can I capture the Packetbeat data and process it live?

Thanks.

Welcome to our community! :smiley:

It's not clear what you are asking here. Are you talking about reindexing the live data? Or capturing the live data?

Thank you for the welcome.
I am capturing my network data with Packetbeat. After that, I preprocess the domain name to detect whether a packet is malicious. I have already done all of that.
But because I am reindexing, I only get data that already exists.
For example, if I reindex at 3:01 pm, I only get data from 3:01 pm and earlier.
I want to index live data continuously. How can I do that?

Ah ok. How are you doing the processing exactly?

I take the Packetbeat data (specifically the domain name) and extract n-grams from the name with a Painless script. Then I put the script into an ingest pipeline so I can run inference with the model.

This is the code:


    PUT _ingest/pipeline/dga_ngram_expansion_inference
    {
      "description": "Expands a domain into unigrams, bigrams and trigrams and makes a prediction of maliciousness",
      "processors": [
        {
          "script": {
            "id": "ngram-extractor-packetbeat",
            "params": {
              "ngram_count": 1
            }
          }
        },
        {
          "script": {
            "id": "ngram-extractor-packetbeat",
            "params": {
              "ngram_count": 2
            }
          }
        },
        {
          "script": {
            "id": "ngram-extractor-packetbeat",
            "params": {
              "ngram_count": 3
            }
          }
        },
        {
          "inference": {
            "model_id": "tes22-1597491011440",
            "target_field": "predicted_label",
            "field_map": {},
            "inference_config": { "classification": { "num_top_classes": 2 } }
          }
        },
        {
          "script": {
            "id": "ngram-remover-packetbeat",
            "params": {
              "ngram_count": 1
            }
          }
        },
        {
          "script": {
            "id": "ngram-remover-packetbeat",
            "params": {
              "ngram_count": 2
            }
          }
        },
        {
          "script": {
            "id": "ngram-remover-packetbeat",
            "params": {
              "ngram_count": 3
            }
          }
        }
      ]
    }



    PUT _ingest/pipeline/dns_classification_pipeline
    {
      "description": "A pipeline of pipelines for performing DGA detection",
      "version": 1,
      "processors": [
        {
          "pipeline": {
            "if": "ctx.containsKey('dns') && ctx['dns'].containsKey('question')  && ctx['dns']['question'].containsKey('registered_domain') && !ctx['dns']['question']['registered_domain'].empty",
            "name": "dga_ngram_expansion_inference"
          }
        }
      ]
    }

    GET _ingest/pipeline/

    POST _reindex?wait_for_completion=false&refresh=true
    {
      "source": {
        "index": "packetbeat-7.8.1-*",
        "query": {
          "bool": {
            "must": [
              {
                "range": {
                  "@timestamp": {
                    "gte": "now-1h/h",
                    "lt": "now"
                  }
                }
              },
              {
                "match": {
                  "method": "Query"
                }
              }
            ]
          }
        }
      },
      "dest": {
        "index": "dga-detection_with_live_data17",
        "pipeline": "dns_classification_pipeline",
        "op_type": "index"
      }
    }

I have made a workaround: I created a Python script that runs every 5 minutes and reindexes the last 5 minutes of data.
But I know this isn't really a solution, because in production it will have a big impact on the system.
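
A minimal sketch of how such a script could choose its time window, assuming it remembers where the previous window ended (a checkpoint) instead of always asking for the last 5 minutes, so that a run that starts late neither skips nor repeats documents. The function names `next_window` and `build_reindex_body` are hypothetical, and actually sending the request and storing the checkpoint are left out:

```python
import json
from datetime import datetime, timedelta, timezone


def next_window(checkpoint, now, max_span=timedelta(minutes=5)):
    """Return (start, end) for the next run.

    The window starts where the previous one ended and covers at most
    max_span, never reaching past the current time.
    """
    end = min(checkpoint + max_span, now)
    return checkpoint, end


def build_reindex_body(start, end):
    """Build a _reindex request body for DNS queries with start <= @timestamp < end."""
    return {
        "source": {
            "index": "packetbeat-7.8.1-*",
            "query": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "@timestamp": {
                                    # Explicit bounds instead of "now-5m",
                                    # so consecutive windows share no documents.
                                    "gte": start.isoformat(),
                                    "lt": end.isoformat(),
                                }
                            }
                        },
                        {"match": {"method": "Query"}},
                    ]
                }
            },
        },
        "dest": {
            "index": "dga-detection_with_live_data17",
            "pipeline": "dns_classification_pipeline",
            "op_type": "index",
        },
    }


if __name__ == "__main__":
    # Example only: pretend the previous run ended five minutes ago.
    now = datetime.now(timezone.utc)
    start, end = next_window(now - timedelta(minutes=5), now)
    print(json.dumps(build_reindex_body(start, end), indent=2))
```

The returned dictionary would be sent as the body of `POST _reindex`, and the checkpoint would only be moved forward to `end` once that request has succeeded.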

Ok, this is outside the scope of what Packetbeat does at the moment.

However, it's a great idea, and I would encourage you to raise a feature request on GitHub to see if something might be built in in the future.