Graph relation extraction on lastfm data

inancarin · November 7, 2016, 12:08pm

Hi guys,

I have been using Elasticsearch for a while, but I am new to the Graph API. I just indexed the lastfm data into ES, and a sample document in my index looks as follows:

{
  "_index": "lastfm",
  "_type": "song",
  "_id": "AVg-oFWHtNjcLv7Y8nnT",
  "_score": 1,
  "_source": {
    "timestamp": "2009-02-03T16:54:25Z",
    "userid": "user_000001",
    "artist-name": "Ken Ishii",
    "track-name": "Frame Out",
    "musicbrainz-track-id": "8f28cbe6-3e46-4f96-816d-304620f64b41",
    "musicbrainz-artist-id": "6d4c4759-8a16-4b9f-83e2-4c225307fc85",
    "user": {
      "gender": "m",
      "signup": "Aug 13, 2006",
      "country": "Japan"
    }
  }
}

What I want to do is find relations between artists and artists (that is, people who listen to one artist also listen to another), between countries and artists, between users and artists, and so on.

I can visualize charts on the artist name correctly in Kibana.

However, when I try to find relations with the Graph API, I get vertices but no relations between them, and I cannot expand the selected vertices (with the "artist-name.keyword" field selected).

When I select "artist-name" field, it gives me the following error: "Error 400 Bad Request: Fielddata is disabled on text fields by default. Set fielddata=true on [artist-name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."

It is okay, I managed to solve it with the following:

PUT lastfm/_mapping/song
{
  "properties": {
    "artist-name": { 
      "type":     "text",
      "fielddata": true
    }
  }
}

Now I am able to display vertices and their relations. However, "damien" and "rice" end up as separate vertices, when they should be a single vertex, "damien rice".

Any help on this will make me very happy.

Thanks,

Mark_Harwood · November 7, 2016, 2:53pm

So the main thing to be aware of here is that Graph draws on co-occurrence of tokens in the same document. That means for an artist->artist graph you need:

1. Documents that contain more than one artist-name, and
2. Indexed artist-name tokens that haven't split the string damien rice into damien and rice. (So untokenized "keyword" strings.)
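
A toy illustration of those two requirements, in plain Python rather than Elasticsearch. It is only a sketch: the `cooccurrence`, `keyword` and `analyzed` helpers are made up for this example, and Graph's real sampling and significance scoring are not modelled. It just shows why whitespace-split tokens give "damien" and "rice" as separate vertices, and why a document with a single artist contributes no artist->artist edge.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs, tokenize):
    """Count how often two terms appear together in the same document."""
    counts = Counter()
    for artists in docs:
        terms = set()
        for name in artists:
            terms.update(tokenize(name))
        # Every unordered pair of distinct terms in a document is one edge hit.
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts


def keyword(name):
    # Untokenized: the whole string is a single term.
    return [name]


def analyzed(name):
    # Rough stand-in for an analyzed text field: lowercase, split on whitespace.
    return name.lower().split()


# Hypothetical documents, each holding the artists of one user.
docs = [
    ["Damien Rice", "Ken Ishii"],
    ["Damien Rice", "Ken Ishii"],
    ["Damien Rice"],  # one artist only: no artist->artist pair with keyword terms
]

keyword_pairs = cooccurrence(docs, keyword)
analyzed_pairs = cooccurrence(docs, analyzed)

print(keyword_pairs)
print(analyzed_pairs[("damien", "rice")])
```

With keyword terms the only edge is between the two full artist names. With split tokens, "damien" and "rice" become their own vertices and are linked to each other in every document.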

To do this kind of analysis we create one document per user with an array of the band names they like e.g.

 { "name": "Mark", "likedArtists": ["Fugazi", "Polica", "Team sleep", "Mastodon", ...] }

.. and use the appropriate mapping definition.
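
As a rough sketch of that reshaping step (not the linked script itself), the per-play documents could be grouped into one document per user like this. The event field names `userid` and `artist-name` come from the sample document earlier in the thread. The `build_user_docs` helper and the `USER_MAPPING` dict are assumptions for illustration, with both fields mapped as `keyword` so that artist names stay untokenized.

```python
from collections import defaultdict


def build_user_docs(listens):
    """Collapse per-play events into one document per user."""
    artists_by_user = defaultdict(set)
    for event in listens:
        artists_by_user[event["userid"]].add(event["artist-name"])
    # Sort for stable output; each artist appears once per user.
    return [
        {"userid": userid, "likedArtists": sorted(artists)}
        for userid, artists in sorted(artists_by_user.items())
    ]


# Assumed mapping body for the user-centric index: "keyword" fields are not
# analyzed, so "Damien Rice" is indexed as a single term.
USER_MAPPING = {
    "properties": {
        "userid": {"type": "keyword"},
        "likedArtists": {"type": "keyword"},
    }
}

# Hypothetical listening events in the shape of the original song documents.
listens = [
    {"userid": "user_000001", "artist-name": "Ken Ishii"},
    {"userid": "user_000001", "artist-name": "Damien Rice"},
    {"userid": "user_000001", "artist-name": "Ken Ishii"},
    {"userid": "user_000002", "artist-name": "Damien Rice"},
]

user_docs = build_user_docs(listens)
for doc in user_docs:
    print(doc)
```

Each resulting document would then be indexed into an index that uses a mapping along the lines of `USER_MAPPING`, and Graph would be pointed at the `likedArtists` field.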

Here's a script to do exactly all of this with the LastFm data using version 5+ of elasticsearch: https://gist.github.com/markharwood/f67a8532f0acba8dcc3fba07541b0933

Cheers
Mark

inancarin · November 9, 2016, 12:14am

Hi Mark,

First of all, thanks for your answer, and sorry for the late reply. It is working now. (By the way, I realised there are two different lastfm datasets and we were using different ones.)

I have a question: suppose I have streaming data, where users are continuously listening to, liking or rating new songs and artists. Wouldn't the way you store the data be costly? When a user likes a new artist, for example, you first have to find the user, then check whether that artist already exists in the user's artists array, and update the array if it does not. What do you think about this approach for streaming data?

Mark_Harwood · November 9, 2016, 9:09am

It doesn't have to be. I don't think it is vital for the benefit of others' recommendations that my latest song-play is updated immediately. That single action won't swing their recommendations, but it is important to keep applying updates to stay abreast of new trends. This can be done in mini-batches, where perhaps a day's worth of listening habits is consolidated into a single update to a user profile. Some example scripts and a discussion are in this talk on "entity centric indexing": https://www.youtube.com/watch?v=yBf7oeJKH2Y
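
A minimal sketch of that mini-batch idea, in plain Python and independent of the scripts in the talk. The `consolidate_batch` helper, the in-memory `profiles` dict and the sample events are all hypothetical. In practice the profiles would live in the user-centric index and the returned changes would be sent as one update per user.

```python
def consolidate_batch(profiles, batch):
    """Fold a batch of listening events into at most one update per user.

    profiles: dict mapping userid -> list of artists already on the profile.
    batch:    iterable of events with "userid" and "artist-name" keys.
    Returns a dict of userid -> full merged artist list, only for users
    whose profile actually changes.
    """
    new_by_user = {}
    for event in batch:
        new_by_user.setdefault(event["userid"], set()).add(event["artist-name"])

    changed = {}
    for userid, artists in new_by_user.items():
        known = set(profiles.get(userid, []))
        merged = known | artists
        if merged != known:  # skip users who only replayed known artists
            changed[userid] = sorted(merged)
    return changed


# Hypothetical existing profiles and one day's worth of events.
profiles = {"user_000001": ["Ken Ishii"]}
days_events = [
    {"userid": "user_000001", "artist-name": "Ken Ishii"},
    {"userid": "user_000001", "artist-name": "Damien Rice"},
    {"userid": "user_000001", "artist-name": "Damien Rice"},
    {"userid": "user_000002", "artist-name": "Fugazi"},
    {"userid": "user_000003", "artist-name": "Mastodon"},
]

updates = consolidate_batch(profiles, days_events)
print(updates)
```

Five events collapse into three profile updates here, and a user who only replays artists already on their profile produces no update at all.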

Mark_Harwood · November 9, 2016, 9:37am

(BTW, shifting this to the Graph forum)

inancarin · November 9, 2016, 2:44pm

Ohh I see, you are right about immediate updates. Thanks for your comments, they are very helpful.

Inanc

system · December 7, 2016, 2:44pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
