Reindexing Packetbeat data

OK, I have been following this tutorial and have almost got it working to make predictions.

But when I reindex the Packetbeat data, I only reindex the data that already exists, not the live data.

So how can I capture the Packetbeat data and process it live?

Thanks

Welcome to our community! :smiley:

It's not clear what you are asking here. Are you talking about reindexing the live data? Or capturing the live data?

Thank you for the welcome.
I am capturing my network data with Packetbeat. After that, I preprocess the domain name to detect whether the packet is malicious or not. I have already done all of that.
But because I am reindexing, I only get data that already exists.
For example, if I reindex at 3:01 pm, I only get data from 3:01 pm and earlier.
I want to index the live data continuously. How can I do that?

Ah ok. How are you doing the processing exactly?

I take the Packetbeat data (specifically the domain name), build n-grams of the name with a Painless script, and then run the script in an ingest pipeline so that I can run inference with the model.

This is the code:


    PUT _ingest/pipeline/dga_ngram_expansion_inference
    {
      "description": "Expands a domain into unigrams, bigrams and trigrams and makes a prediction of maliciousness",
      "processors": [
        {
          "script": {
            "id": "ngram-extractor-packetbeat",
            "params": {
              "ngram_count": 1
            }
          }
        },
        {
          "script": {
            "id": "ngram-extractor-packetbeat",
            "params": {
              "ngram_count": 2
            }
          }
        },
        {
          "script": {
            "id": "ngram-extractor-packetbeat",
            "params": {
              "ngram_count": 3
            }
          }
        },
        {
          "inference": {
            "model_id": "tes22-1597491011440",
            "target_field": "predicted_label",
            "field_map": {},
            "inference_config": { "classification": { "num_top_classes": 2 } }
          }
        },
        {
          "script": {
            "id": "ngram-remover-packetbeat",
            "params": {
              "ngram_count": 1
            }
          }
        },
        {
          "script": {
            "id": "ngram-remover-packetbeat",
            "params": {
              "ngram_count": 2
            }
          }
        },
        {
          "script": {
            "id": "ngram-remover-packetbeat",
            "params": {
              "ngram_count": 3
            }
          }
        }
      ]
    }
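The `ngram-extractor-packetbeat` and `ngram-remover-packetbeat` stored scripts are not shown in the thread. Purely to illustrate what "expands a domain into unigrams, bigrams and trigrams" means, here is a Python sketch of character n-gram expansion. It is not the Painless scripts themselves, and the function names and dict-of-lists layout are hypothetical.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return the overlapping character n-grams of `text`, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A text shorter than n yields an empty range, hence no n-grams.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def expand_domain(domain: str, max_n: int = 3) -> dict[int, list[str]]:
    """Unigrams, bigrams and trigrams of a domain, keyed by n (hypothetical layout)."""
    return {n: char_ngrams(domain, n) for n in range(1, max_n + 1)}


if __name__ == "__main__":
    for n, grams in expand_domain("abcd").items():
        print(n, grams)
```

For example, `expand_domain("abcd")` gives `{1: ['a', 'b', 'c', 'd'], 2: ['ab', 'bc', 'cd'], 3: ['abc', 'bcd']}`. How the real scripts name the resulting fields, and whether they strip the suffix from `registered_domain` first, is not shown here.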



    PUT _ingest/pipeline/dns_classification_pipeline
    {
      "description": "A pipeline of pipelines for performing DGA detection",
      "version": 1,
      "processors": [
        {
          "pipeline": {
            "if": "ctx.containsKey('dns') && ctx['dns'].containsKey('question')  && ctx['dns']['question'].containsKey('registered_domain') && !ctx['dns']['question']['registered_domain'].empty",
            "name": "dga_ngram_expansion_inference"
          }
        }
      ]
    }
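The `if` condition in this wrapper pipeline only sends a document to `dga_ngram_expansion_inference` when `dns.question.registered_domain` exists and is a non-empty string, so non-DNS events pass through without inference. The same check written as a Python function over a document represented as a nested dict, as an illustration only. It is slightly more defensive than the Painless version: it returns `False` where Painless would throw, for example when the field is `null`.

```python
from typing import Any, Mapping


def should_run_inference(doc: Mapping[str, Any]) -> bool:
    """Python mirror of the pipeline processor's `if` condition.

    True only when doc["dns"]["question"]["registered_domain"] exists
    and is a non-empty string.
    """
    dns = doc.get("dns")
    if not isinstance(dns, Mapping):
        return False
    question = dns.get("question")
    if not isinstance(question, Mapping):
        return False
    domain = question.get("registered_domain")
    # A missing or null value fails the check instead of raising.
    return isinstance(domain, str) and domain != ""
```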

    GET _ingest/pipeline/

    POST _reindex?wait_for_completion=false&refresh=true
    {
      "source": {
        "index": "packetbeat-7.8.1-*",
        "query": {
          "bool": {
            "must": [
              {
                "range": {
                  "@timestamp": {
                    "gte": "now-1h/h",
                    "lt": "now"
                  }
                }
              },
              {
                "match": {
                  "method": "Query"
                }
              }
            ]
          }
        }
      },
      "dest": {
        "index": "dga-detection_with_live_data17",
        "pipeline": "dns_classification_pipeline",
        "op_type": "index"
      }
    }
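This is why only existing data comes through: `_reindex` is a one-off copy. `now` in the range query is resolved when the request starts, and the copy works from a search over the documents present at that point, so events Packetbeat indexes afterwards are not picked up. As a small illustration, assuming UTC and the documented date-math rounding (for a `gte` bound, `/h` rounds down to the start of the hour), this Python sketch computes the window that `"gte": "now-1h/h", "lt": "now"` covers:

```python
from datetime import datetime, timedelta, timezone


def last_hour_window(now: datetime) -> tuple[datetime, datetime]:
    """Approximate the bounds of "gte": "now-1h/h", "lt": "now".

    "now-1h" subtracts one hour, then "/h" rounds down to the start of
    that hour (rounding down is what applies to a "gte" bound).
    The upper bound is exclusive.
    """
    start = (now - timedelta(hours=1)).replace(minute=0, second=0, microsecond=0)
    return start, now


if __name__ == "__main__":
    run_at = datetime(2020, 8, 15, 15, 1, tzinfo=timezone.utc)
    start, end = last_hour_window(run_at)
    print(start.isoformat(), "->", end.isoformat())
```

So a request started at 3:01 pm copies events from 2:00 pm up to 3:01 pm and then finishes. Nothing after 3:01 pm is included.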

As a workaround, I have created a Python script that runs every 5 minutes and reindexes the last 5 minutes of data.
But I know this isn't really a solution, because in production it will put a heavy load on the system.
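The 5-minute workaround can at least be made gap-free. This is a minimal sketch, assuming the script remembers where the previous run stopped (`last_end`; how that is persisted is left out). Instead of a relative `now-5m` range, it reindexes explicit, adjacent `[start, end)` windows, so a run that starts late neither skips nor duplicates events. The function names are hypothetical. The index, pipeline and query are copied from the request above, and each returned dict would be sent as the body of `POST _reindex`.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def build_reindex_body(start: datetime, end: datetime) -> dict[str, Any]:
    """Body for POST _reindex covering exactly [start, end).

    Explicit timestamps (instead of "now-5m") let consecutive windows
    adjoin without gaps or overlaps.
    """
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "source": {
            "index": "packetbeat-7.8.1-*",
            "query": {
                "bool": {
                    "must": [
                        {"range": {"@timestamp": {
                            "gte": start.isoformat(),
                            "lt": end.isoformat(),
                        }}},
                        {"match": {"method": "Query"}},
                    ]
                }
            },
        },
        "dest": {
            "index": "dga-detection_with_live_data17",
            "pipeline": "dns_classification_pipeline",
            "op_type": "index",
        },
    }


def next_windows(last_end: datetime, now: datetime,
                 step: timedelta) -> list[tuple[datetime, datetime]]:
    """Split [last_end, now) into complete windows of length `step`.

    Only full windows are returned, so a late run catches up on every
    missed interval and the remainder waits for the next run.
    """
    windows = []
    start = last_end
    while start + step <= now:
        windows.append((start, start + step))
        start += step
    return windows
```

One caveat, which is an assumption about the setup rather than something from the thread: events can reach Elasticsearch a little after their `@timestamp`, so passing a `now` that lags the real clock by a small margin reduces the chance of skipping late arrivals. This is still batch polling, so it does not address the load concern.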

OK, this is outside the scope of what Packetbeat does at the moment.

However, it's a great idea, and I would encourage you to raise a feature request on GitHub to see whether something could be built in the future.