Compare two different indexes and create new ones


(John Ollhorn) #1

Hi people,
I'm just starting to analyze my data with Elasticsearch, so I have practically no idea what I'm doing -_-

I would like to try text mining based on all the data I have stored.

Is the following possible, and if so, what should I search for?

I have two indexes, "Newsfeeds" and "Stopswords".
As a first step, I'd like to:

  • Count how many stopwords appear in the newsfeeds, and which ones.
  • Save the remaining words from the newsfeeds (plus the stopwords) in a new index.

Is this generally possible? Can it be done with ES, Kibana and Logstash alone?


(Mark Walkom) #2

You may want to look at using the percolator here; it would be suitable for matching the stopwords.

However, you would need to do a bit of work in your own code to take the feeds, pass them through the percolator, and then create the other index. That part is not something the stack can do for you automatically.
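
The workflow described above could be sketched roughly as follows. This is only a sketch under several assumptions: each stopword is stored as a percolator query in its own index, every stored document carries a hypothetical `word` field next to its `query`, and the feed text lives in `description`. The helper names (`stopwordQueryDoc`, `percolateBody`, `matchedStopwords`, `splitByStopwords`) are made up for illustration. They only build request bodies and post-process results; sending them to Elasticsearch with a client is left out.

```javascript
// Sketch only: all helper names and the "word" field are hypothetical.

// Document to index once per stopword into the percolator index.
// The percolator index also needs a mapping for "description",
// because stored queries may only reference mapped fields.
function stopwordQueryDoc(word) {
  return { word, query: { match: { description: word } } };
}

// Search body that asks which stored stopword queries match one feed text.
function percolateBody(text) {
  return {
    query: {
      percolate: { field: "query", document: { description: text } }
    }
  };
}

// Turn the hits of a percolate search into the list of matched stopwords.
function matchedStopwords(hits) {
  return hits.map((hit) => hit._source.word);
}

// Count how often each stopword occurs and collect the remaining words.
// Tokenization is a naive lowercase split on non-letter characters,
// which will not match an Elasticsearch analyzer exactly.
function splitByStopwords(text, stopwords) {
  const stop = new Set(stopwords.map((w) => w.toLowerCase()));
  const counts = Object.create(null);
  const remaining = [];
  for (const token of text.toLowerCase().split(/[^\p{L}]+/u)) {
    if (token === "") continue;
    if (stop.has(token)) {
      counts[token] = (counts[token] || 0) + 1;
    } else {
      remaining.push(token);
    }
  }
  return { counts, remaining };
}
```

The percolate search only reports which stored queries matched, so the counting and the list of remaining words are computed in client code here. A search returns 10 hits by default, so `size` would probably need to be raised for a long stopword list. The `remaining` array is what would then be written to the new index.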


(John Ollhorn) #4

Hello, folks,

Can I set a field's type to percolator in the mapping when creating an index? If so, how? :slight_smile:

try {
  await client.indices.create({
    index: 'test',
    mapping: {
      properties: { query: { type: "percolator" } }
    },
    ignore: [400]
  })
} catch (e) {
  throw e
}

That doesn't seem to work.

GET /test/_search
{
    "query" : {
        "percolate" : {
            "field" : "query",
            "document" : {
                "description" : "test"
            }
        }
    }
}

returns:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "field [query] does not exist",
        "index_uuid": "EvFmjhdYQPiBD4cEhoYkiw",
        "index": "test"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "test",
        "node": "QCctHTucSGaVxmsobvKmzQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "field [query] does not exist",
          "index_uuid": "EvFmjhdYQPiBD4cEhoYkiw",
          "index": "test"
        }
      }
    ]
  },
  "status": 400
}

I always have to set the mapping manually via the Dev Tools:

PUT /test/_mapping/feed
{
  "properties": {
    "query": { "type": "percolator" }
  }
}
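
One possible explanation for the failing `create` call above: `indices.create` in the JavaScript clients does not take a top-level `mapping` key. In the 6.x-era clients the index definition is passed under `body`, with the mapping nested as `mappings.<type>`. In addition, `ignore: [400]` hides any 400 response, including the one Elasticsearch returns when the index already exists, so a call that never applied the mapping would go unnoticed. The sketch below assumes a 6.x cluster (the documents in this thread have a `_type` of `feed`) and uses a hypothetical helper, `percolatorIndexParams`, which only builds the parameters.

```javascript
// Sketch only: percolatorIndexParams is a hypothetical helper.
// Assumes Elasticsearch 6.x, where mappings are keyed by a type name.
function percolatorIndexParams(index, type) {
  return {
    index,
    body: {
      mappings: {
        [type]: {
          properties: {
            // Field that holds the stored queries.
            query: { type: "percolator" },
            // Fields referenced by stored queries must be mapped too.
            description: { type: "text" }
          }
        }
      }
    }
  };
}

// Possible usage with an already configured client (not shown here):
//   await client.indices.create(percolatorIndexParams("test", "feed"));
```

Leaving out `ignore: [400]` while debugging should make a rejected request surface as an exception instead of being swallowed.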

(Mark Walkom) #5

What does the document look like?


(John Ollhorn) #6

{
  "_index": "test",
  "_type": "feed",
  "_id": "25419144fd9fec74ab26ce5b86e4fa0059a8c123",
  "_version": 1,
  "_score": null,
  "_source": {
    "title": "Cross-Channel, Data und Mediamix: Das sind die zentralen Herausforderungen für Marketingentscheider",
    "description": "<div class=\"alignleft detailm\">    <img src=\"https://www.horizont.net/news/media/26/Manager-Marketingchef-CMO-253003-detailm.jpeg\" alt=\"Manager Marketingchef CMO\" title=\"Manager Marketingchef CMO\"/>    <br/><span class=\"imgsubtitle\"></span>    <div class='imgcreditbg'><span class='imgcredit'>© Fotolia / kasto</span></div>  </div>In seiner jährlichen Studie \"Getting Media Right\" hat  Kantar Millward Brown die zentralen Branchen-Herausforderungen für Marketingspezialisten analysiert. Eines der Ergebnisse: 45 Prozent halten den Mediamix in ihrem Unternehmen für nicht optimal.",
    "summary": "<div class=\"alignleft detailm\">    <img src=\"https://www.horizont.net/news/media/26/Manager-Marketingchef-CMO-253003-detailm.jpeg\" alt=\"Manager Marketingchef CMO\" title=\"Manager Marketingchef CMO\"/>    <br/><span class=\"imgsubtitle\"></span>    <div class='imgcreditbg'><span class='imgcredit'>© Fotolia / kasto</span></div>  </div>In seiner jährlichen Studie \"Getting Media Right\" hat  Kantar Millward Brown die zentralen Branchen-Herausforderungen für Marketingspezialisten analysiert. Eines der Ergebnisse: 45 Prozent halten den Mediamix in ihrem Unternehmen für nicht optimal.",
    "date": "2018-10-18T20:06:00.000Z",
    "pubdate": "2018-10-18T20:06:00.000Z",
    "pubDate": "2018-10-18T20:06:00.000Z",
    "link": "http://www.horizont.net/marketing/nachrichten/cross-channel-data-und-mediamix-das-sind-die-zentralen-herausforderungen-fuer-marketingentscheider-170469?utm_source=rss&utm_medium=referral&utm_campaign=news-marketing&utm_term=%7Butm_term%7D",
    "guid": "http://www.horizont.net/marketing/nachrichten/cross-channel-data-und-mediamix-das-sind-die-zentralen-herausforderungen-fuer-marketingentscheider-170469?utm_source=rss&utm_medium=referral&utm_campaign=news-marketing&utm_term={utm_term}",
    "author": "Redaktion",
    "comments": null,
    "origlink": null,
    "image": {},
    "source": {},
    "categories": [
      "Nachrichten"
    ],
    "enclosures": [
      {
        "url": "https://www.horizont.net/news/media/26/Manager-Marketingchef-CMO-253003-detailp.jpeg",
        "type": "image/jpeg",
        "length": ""
      }
    ],
    "rss:@": {},
    "rss:title": {
      "@": {},
      "#": "Cross-Channel, Data und Mediamix: Das sind die zentralen Herausforderungen für Marketingentscheider"
    },
    "rss:link": {
      "@": {},
      "#": "http://www.horizont.net/marketing/nachrichten/cross-channel-data-und-mediamix-das-sind-die-zentralen-herausforderungen-fuer-marketingentscheider-170469?utm_source=rss&utm_medium=referral&utm_campaign=news-marketing&utm_term=%7Butm_term%7D"
    },
    "rss:description": {
      "@": {},
      "#": "<div class=\"alignleft detailm\">    <img src=\"https://www.horizont.net/news/media/26/Manager-Marketingchef-CMO-253003-detailm.jpeg\" alt=\"Manager Marketingchef CMO\" title=\"Manager Marketingchef CMO\"/>    <br/><span class=\"imgsubtitle\"></span>    <div class='imgcreditbg'><span class='imgcredit'>© Fotolia / kasto</span></div>  </div>In seiner jährlichen Studie \"Getting Media Right\" hat  Kantar Millward Brown die zentralen Branchen-Herausforderungen für Marketingspezialisten analysiert. Eines der Ergebnisse: 45 Prozent halten den Mediamix in ihrem Unternehmen für nicht optimal."
    },
    "rss:author": {
      "@": {}
    },
    "rss:category": {
      "@": {},
      "#": "Nachrichten"
    },
    "rss:enclosure": {
      "@": {
        "url": "https://www.horizont.net/news/media/26/Manager-Marketingchef-CMO-253003-detailp.jpeg",
        "length": "",
        "type": "image/jpeg"
      }
    },
    "permalink": "http://www.horizont.net/marketing/nachrichten/cross-channel-data-und-mediamix-das-sind-die-zentralen-herausforderungen-fuer-marketingentscheider-170469?utm_source=rss&utm_medium=referral&utm_campaign=news-marketing&utm_term={utm_term}",
    "rss:guid": {
      "@": {},
      "#": "http://www.horizont.net/marketing/nachrichten/cross-channel-data-und-mediamix-das-sind-die-zentralen-herausforderungen-fuer-marketingentscheider-170469?utm_source=rss&utm_medium=referral&utm_campaign=news-marketing&utm_term={utm_term}"
    },
    "rss:pubdate": {
      "@": {},
      "#": "Thu, 18 Oct 2018 22:06:00 +0200"
    },
    "dc:creator": {
      "@": {}
    },
    "meta": {
        "version": "1.0",
        "encoding": "utf-8"
      },
      "atom:link": {
        "@": {
          "href": "https://www.horizont.net/news/feed/",
          "rel": "self",
          "type": "application/rss+xml"
        }
      }
    }
  },
  "fields": {
    "date": [
      "2018-10-18T20:06:00.000Z"
    ],
    "meta.pubdate": [
      "2018-10-18T20:17:44.000Z"
    ],
    "meta.date": [
      "2018-10-18T20:17:43.000Z"
    ],
    "meta.pubDate": [
      "2018-10-18T20:17:44.000Z"
    ],
    "pubDate": [
      "2018-10-18T20:06:00.000Z"
    ],
    "pubdate": [
      "2018-10-18T20:06:00.000Z"
    ]
  },
  "sort": [
    1539893160000
  ]
}

It's an RSS feed.
I'm not using the RSS input plugin (it's just outdated).

