Dynamic input (file hashes) for HTTP JSON data collection


I hope you and your loved ones are safe and healthy.

I want to ingest data from an API endpoint that uses JSON for both request and response. (https://developers.virustotal.com/reference)

I am aware that both Filebeat and Logstash can process JSON input and expand it. My problem is issuing the queries dynamically, with the hash value changing for each query.

  1. I have a list of data (file hashes) that I can put in a delimiter-separated format.

  2. Single hash at a time will form part of the API request (URL):

    A. Hash: 4c3499f3cc4a4fdc7e67417e055891c78540282dccc57e37a01167dfe351b244 and API endpoint (files) https://www.virustotal.com/api/v3/files/{id}

    B. {id} will be replaced by the hash, hence the request URL will be: https://www.virustotal.com/api/v3/files/4c3499f3cc4a4fdc7e67417e055891c78540282dccc57e37a01167dfe351b244

  3. Reading the latest Filebeat documentation (HTTP JSON input, Filebeat Reference [7.15]), I see that most of my requirements are met, except for dynamically configuring the {id} component of the URL so that data is fetched for different file hashes. I need to query ~29,000 file hashes and ingest the replies.
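
For illustration, steps 1 and 2 above (a delimiter-separated list of hashes, each substituted for {id}) could be sketched in Python roughly as follows. The helper names `parse_hashes` and `build_url` are hypothetical, and the set of delimiters and the accepted hash lengths (32, 40, or 64 hex characters) are assumptions, not something stated in this thread.

```python
import re

# Endpoint template quoted in the question; {id} is replaced by the hash.
VT_FILES_URL = "https://www.virustotal.com/api/v3/files/{id}"

# Assumption: hashes are hex strings of MD5, SHA-1, or SHA-256 length.
_HEX_HASH = re.compile(r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$")


def parse_hashes(text):
    """Split delimiter-separated text into unique, well-formed hashes.

    Assumption: whitespace, comma, semicolon, and pipe are the delimiters.
    Order of first appearance is preserved; malformed tokens are skipped.
    """
    seen = set()
    hashes = []
    for token in re.split(r"[\s,;|]+", text):
        token = token.lower()
        if _HEX_HASH.match(token) and token not in seen:
            seen.add(token)
            hashes.append(token)
    return hashes


def build_url(file_hash):
    """Return the request URL for one hash."""
    return VT_FILES_URL.format(id=file_hash)
```

Deduplicating up front may be worthwhile here, since with a rate-limited API every repeated hash costs a full request slot.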

Is there a way to configure Filebeat to read a text file, take one hash at a time, query the API, and fetch the data for ingestion? If not, can this be done via a combination of Filebeat and Logstash?

Is your filter section in Logstash very complicated?
You have 29,000 files, and each one has a different hash?

Hi @elasticforme,

  1. I need to store the response from the website and append Geo information.
  2. Hash is a unique identifier for the file which forms the part of the URL (API endpoint).

Here is the process:

  1. A text file containing the hashes is read one hash (one line) at a time. Let's say the file is list.txt and the first line (hash) is 4c3499f3cc4a4fdc7e67417e055891c78540282dccc57e37a01167dfe351b244
  2. Filebeat or Logstash will need to query the following URL:
    https://www.virustotal.com/api/v3/files/4c3499f3cc4a4fdc7e67417e055891c78540282dccc57e37a01167dfe351b244 (the hash forms the end of the URL).
  3. As part of the request, an API key and a few headers will be sent.
  4. Rate limiting will be required (1 query every 15 seconds).
  5. Following is an actual response (I've truncated it):
    {
        "data": {
            "attributes": {
                "type_description": "Win32 DLL",
                "tlsh": "T1D2A4028BB3D485BFE1779B38C5B34A44D772784501319B5E6BA0026A9E737829D38F32",
                "vhash": "145066651d7515155038z45hz1kz4",
                "trid": [
                    {
                        "file_type": "Win64 Executable (generic)",
                        "probability": 48.7
                    },
                    {
                        "file_type": "Win16 NE executable (generic)",
                        "probability": 23.3
                    },
                    {
                        "file_type": "OS/2 Executable (generic)",
                        "probability": 9.3
                    },
                    {
                        "file_type": "Generic Win/DOS Executable",
                        "probability": 9.2
                    },
                    {
                        "file_type": "DOS Executable Generic",
                        "probability": 9.2
                    }
                ],
                "creation_date": 1610158671,
                "names": [],
                "signature_info": {
                    "product": "Microsoft@Windows@Operating System",
                    "internal name": "EFSCORE.DLL",
                    "file version": "10.0.19041.1",
                    "original name": "EFSCORE.DLL",
                    "copyright": "@Microsoft Corperation. All rights reserved.",
                    "description": "EFS Core Library"
                },
                "last_modification_date": 1634072744,
                "type_tag": "pedll",
                "times_submitted": 5,
                "total_votes": {
                    "harmless": 0,
                    "malicious": 5
                },
                "size": 464384,
                "popular_threat_classification": {
                    "suggested_threat_label": "trojan.razy/manuscrypt",
                    "popular_threat_category": [
                        {
                            "count": 32,
                            "value": "trojan"
                        }
                    ],
                    "popular_threat_name": [
                        {
                            "count": 6,
                            "value": "razy"
                        },
                        {
                            "count": 6,
                            "value": "manuscrypt"
                        },
                        {
                            "count": 4,
                            "value": "nukesped"
                        }
                    ]
                },
                "authentihash": "ffe9327c59664331cadc38af99bd32cd30960297b4277c693781661f5356e013",
                "last_submission_date": 1633265692,
                "meaningful_name": "EFSCORE.DLL",
                "sigma_analysis_summary": {
                    "Sigma Integrated Rule Set (GitHub)": {
                        "high": 264,
                        "medium": 8,
                        "critical": 0,
                        "low": 50
                    }
                },
                "sandbox_verdicts": {
                    "C2AE": {
                        "category": "undetected",
                        "sandbox_name": "C2AE",
                        "malware_classification": []
                    },
                    "Yomi Hunter": {
                        "category": "malicious",
                        "sandbox_name": "Yomi Hunter",
                        "malware_classification": []
                    }
                },
                "sha256": "4c3499f3cc4a4fdc7e67417e055891c78540282dccc57e37a01167dfe351b244",
                "type_extension": "dll",
                "tags": [],
                "last_analysis_date": 1634065512,
                "unique_sources": 4,
                "first_submission_date": 1610632578,
                "sha1": "a3060a3efb9ac3da444ef8abc99143293076fe32",
                "ssdeep": "12288:BJWFxerI6z4GrTyPy2ROqz/sC670ahgA6:BJWuIE4Gr2PX5UCs0a",
                "md5": "56018500f73e3f6cf179d3b853c27912",
                "pe_info": {
                    "exports": [],
                    "resource_details": [
                        {
                            "lang": "ENGLISH US",
                            "entropy": 3.471111536026001,
                            "chi2": 66260.8359375,
                            "filetype": "Data",
                            "sha256": "fb30bc703491f40de0bcf149f52a99cefaaefe7a73c1e79b85ed19430949aea6",
                            "type": "RT_VERSION"
                        },
                        {
                            "lang": "ENGLISH US",
                            "entropy": 4.911615371704102,
                            "chi2": 4031.47216796875,
                            "filetype": "application/xml",
                            "sha256": "4bb79dcea0a901f7d9eac5aa05728ae92acb42e0cb22e5dd14134f4421a3d8df",
                            "type": "RT_MANIFEST"
                        }
                    ],
                    "rich_pe_header_hash": "2c105fc4bec9ee0652c3ce1464ba625a",
                    "imphash": "f9a03bf01870765beb9b5b490126a8c7",
                    "compiler_product_versions": [
                        "id: 225, version: 20806 count=38",
                        "id: 224, version: 20806 count=155",
                        "id: 223, version: 20806 count=10",
                        "id: 203, version: 65501 count=7",
                        "[---] Unmarked objects count=110",
                        "id: 229, version: 21005 count=4",
                        "[EXP] VS2013 build 21005 count=1",
                        "[RES] VS2013 build 21005 count=1",
                        "id: 151, version: 0 count=1",
                        "[LNK] VS2013 build 21005 count=1"
                    ],
                    "resource_langs": {
                        "ENGLISH US": 2
                    },
                    "machine_type": 34404,
                    "timestamp": 1610158671,
                    "resource_types": {
                        "RT_MANIFEST": 1,
                        "RT_VERSION": 1
                    },
                    "sections": [
                        {
                            "name": ".text",
                            "chi2": 400271.81,
                            "virtual_address": 4096,
                            "flags": "rx",
                            "raw_size": 60416,
                            "entropy": 6.46,
                            "virtual_size": 60075,
                            "md5": "1b2f70947e9e4d7998ea600376591420"
                        },
                        {
                            "name": ".rdata",
                            "chi2": 1883778.5,
                            "virtual_address": 65536,
                            "flags": "r",
                            "raw_size": 29696,
                            "entropy": 4.28,
                            "virtual_size": 29558,
                            "md5": "4e5a441c7d64d45d6b8eeb964644efc0"
                        },
                        {
                            "name": ".data",
                            "chi2": 15882.2,
                            "virtual_address": 98304,
                            "flags": "rw",
                            "raw_size": 367104,
                            "entropy": 7.98,
                            "virtual_size": 376008,
                            "md5": "2ed9aba09bcd4be38125aff90d7a2b3d"
                        }
                    ]
                }
            },
            "type": "file",
            "id": "4c3499f3cc4a4fdc7e67417e055891c78540282dccc57e37a01167dfe351b244",
            "links": {
                "self": "https://www.virustotal.com/api/v3/files/4c3499f3cc4a4fdc7e67417e055891c78540282dccc57e37a01167dfe351b244"
            }
        }
    }
  6. In some cases the JSON response will contain key-value pairs holding IPv4 addresses, which I need to look up against GeoDB to append Geo information.

  7. Store the information in an index.
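
As a rough sketch of the IPv4 step above: since the thread does not say which keys carry addresses, one option is to walk the whole decoded response and collect every string value that parses as an IPv4 address. The function name `find_ipv4` and the dotted path format are hypothetical, and the GeoDB lookup itself is left out.

```python
import ipaddress


def find_ipv4(node, path=""):
    """Yield (path, address) for every string value that parses as IPv4.

    `node` is a decoded JSON value (dict, list, or scalar). Only values
    are inspected, not keys.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_ipv4(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_ipv4(value, f"{path}[{index}]")
    elif isinstance(node, str):
        try:
            ipaddress.IPv4Address(node)
        except ValueError:
            return  # not an IPv4 address; nothing to yield
        yield path, node
```

Strict parsing matters for responses like the one above: a value such as "10.0.19041.1" (the "file version" field) looks like a dotted quad but is rejected because 19041 is not a valid octet.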

I have to process ~27,000 such hashes

I hope this helps.

I think I understand a little better now.

In this case I believe your best bet is to use Python.

In Python you open the file and read it line by line to get each hash,
put that in a variable, and make the HTTP call. I think it is requests.get(url).
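
A minimal sketch of that loop, under a few assumptions: it uses the standard library's `urllib.request` rather than the `requests` package mentioned above (so nothing needs installing), it assumes the API key is sent in an `x-apikey` header, and it pauses 15 seconds between queries to match the rate limit described earlier. The function names `fetch_report` and `fetch_all` are hypothetical.

```python
import json
import time
import urllib.request

VT_FILES_URL = "https://www.virustotal.com/api/v3/files/{id}"


def fetch_report(file_hash, api_key, opener=urllib.request.urlopen):
    """Fetch the report for one hash and return the decoded JSON document."""
    request = urllib.request.Request(
        VT_FILES_URL.format(id=file_hash),
        # Assumption: the key is passed in the x-apikey header.
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )
    with opener(request, timeout=30) as response:
        return json.load(response)


def fetch_all(lines, api_key, interval=15.0, fetch=fetch_report, sleep=time.sleep):
    """Yield (hash, report) for each non-empty line.

    `lines` is any iterable of strings, for example an open text file.
    A pause of `interval` seconds is inserted between consecutive requests.
    """
    first = True
    for line in lines:
        file_hash = line.strip()
        if not file_hash:
            continue  # skip blank lines
        if not first:
            sleep(interval)
        first = False
        yield file_hash, fetch(file_hash, api_key)
```

Usage would look something like `with open("list.txt") as handle: for file_hash, report in fetch_all(handle, api_key): ...`. Error handling (for example, hashes the API does not know, or quota errors) is deliberately omitted and would be needed for a list of this size.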

Thanks a lot.

I can create one file per hash using Python, but I wanted to know whether Filebeat will ingest a full multi-line JSON file, or whether it only supports JSON on a single line. I will try this nonetheless and let you know.

You can read all of these files via Python and ingest them into Elasticsearch.
Elasticsearch has a Python library which you can use to write to it.
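
As a hedged sketch of that suggestion, the official `elasticsearch` Python package provides `helpers.bulk`, which accepts an iterable of action dictionaries. The example below turns (hash, response) pairs into such actions, using the hash as the document ID so that re-running the job overwrites rather than duplicates. The index name `virustotal-files`, the local cluster URL, and both function names are assumptions made for illustration.

```python
def to_bulk_actions(reports, index="virustotal-files"):
    """Turn (hash, report) pairs into bulk actions.

    Assumption: `index` is a hypothetical index name. Using the hash as
    `_id` makes indexing idempotent across re-runs.
    """
    for file_hash, report in reports:
        yield {"_index": index, "_id": file_hash, "_source": report}


def index_reports(reports, es_url="http://localhost:9200"):
    """Send all reports to Elasticsearch in bulk.

    The client is imported here so the rest of the sketch can be run
    without the `elasticsearch` package installed.
    """
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(es_url)
    # Returns a tuple of (number of successes, list of errors).
    return helpers.bulk(client, to_bulk_actions(reports))
```

Because `to_bulk_actions` is a generator, documents can be streamed to the cluster as they are fetched instead of holding all of the responses in memory.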
