Establishing graph connections

graph

(Morio Ramdenbourg) #1

Hi,

I'm new to Graph and have been learning its API using CSV exports from Crunchbase that I indexed into Elasticsearch. Each company in the index has a name, a list of associated categories, a city, a region, and other data.

I would like to establish and expand on connections between companies with related fields. However, I am only able to establish these connections when selecting more than one field along with the company name, such as below when I select both the name and category fields.

Ideally I would like to select only the company name field and expand from there to related company names, based not only on the categories but on all the fields that belong to the company. This would be similar to the graph capabilities video where a musician's name was searched and the graph then expanded to other related musicians.

Is there a way to produce this graph with the company data, or are there any other tools I could use to produce this? Additionally, is it possible to insert request queries into the graph to produce a similar effect?

Thanks!


(Mark Harwood) #2

We'd need to narrow down exactly what data you are using and on what basis you want to make connections. For example, it looks like there would be a "competitors" graph you could create from the "competitors.csv" file here: https://data.crunchbase.com/docs/daily-csv-export

From there you can create docs like this:

POST competitors/link
{
	"competitors" : ["Microsoft", "Apple"]
}
POST competitors/link
{
	"competitors" : ["Spotify", "Apple"]
}
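
For more than a handful of pairs, docs like these could be sent in a single _bulk request instead of one POST each. A minimal Python sketch, assuming the competitor pairs have already been read out of the CSV (the `pairs` list and the `bulk_payload` helper are made up for illustration; the index and type names follow the example above, and the `_type` entry only applies to Elasticsearch versions that still use mapping types):

```python
import json

def bulk_payload(pairs, index="competitors", doc_type="link"):
    """Build an NDJSON body for the _bulk API: one action line plus one doc line per pair."""
    lines = []
    for first, second in pairs:
        # Action line: index a new doc with an auto-generated id
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the doc itself, same shape as the POST examples
        lines.append(json.dumps({"competitors": [first, second]}))
    # The bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"

# Hypothetical pairs, standing in for rows from competitors.csv
pairs = [("Microsoft", "Apple"), ("Spotify", "Apple")]
payload = bulk_payload(pairs)
print(payload)
```

The resulting string would be the body of a POST to the _bulk endpoint.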

Change the Graph UI settings to work with these low-frequency signals:

Then search...
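
The same settings and search can also be written as a request body for the Graph explore API, which is one way to pass your own query into the graph. A rough Python sketch that only builds the body; the `explore_request` helper is hypothetical, and the assumption is that "low-frequency signals" corresponds to switching off significance scoring and accepting terms seen in a single doc. The endpoint path differs between versions, so it is left out here:

```python
import json

def explore_request(search_term, field="competitors"):
    """Build a Graph explore body tuned for sparse data (assumed mapping of the UI settings)."""
    vertex = {"field": field, "min_doc_count": 1}  # one supporting doc is enough
    return {
        # Any regular query can seed the exploration
        "query": {"match": {field: search_term}},
        # Do not require terms to be statistically significant
        "controls": {"use_significance": False},
        "vertices": [vertex],
        # Second hop: other terms that co-occur with the first-hop vertices
        "connections": {"vertices": [dict(vertex)]},
    }

body = explore_request("Apple")
print(json.dumps(body, indent=2))
```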


(Morio Ramdenbourg) #3

Thanks for the quick response.

I have been using the "organizations.csv" file from crunchbase where I have docs like the following:

{
	"permalink": "/organization/10-20-media",
	"name": "10-20 Media",
	"homepage_url": "http://www.10-20media.com",
	"category_list": [
		"E-Commerce"
	],
	"funding_total_usd": "2050000",
	"status": "operating",
	"country_code": "USA",
	"state_code": "MD",
	"region": "Baltimore",
	"city": "Woodbine",
	"funding_rounds": "4",
	"founded_at": "2001-01-01",
	"first_funding_at": "2009-06-18",
	"last_funding_at": "2011-12-28"
}

However, when I search for a certain company, it seems to graph other companies, but it does not create any connection between them, as shown:

I'd like to make connections between these companies based on categories, where one company is connected to another through similar or related categories. Ultimately, I'd like to expand the graph to companies that a user would most likely be interested in, given the original search.


(Mark Harwood) #4

Not all categories are equally interesting - some are very broad and over-link e.g. "software". I found a more useful approach is to start with the categories and identify the network of relations between those and then overlay the companies. Here is an example demo I put together using crunchbase investments data: https://youtu.be/eky_ml0nOns


(Morio Ramdenbourg) #5

Thank you for the demo, it was very helpful in understanding the details that I was missing.

Sorry, one last question: I understood the demo in the video, but is it also possible to graph strongly related companies with only the name field selected, while still establishing the connections based on categories? I am asking because it may make the graph easier to visualize and analyze.


(Mark Harwood) #6

What you are describing sounds like a "More Like This" type query [1]
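
As an illustration, a More Like This request body that starts from one indexed company could look roughly like the Python sketch below. The index name "companies", the type "organization", the doc id and the `more_like_this_body` helper are all assumptions made for the example, not taken from the data above:

```python
import json

def more_like_this_body(doc_id, index="companies", doc_type="organization",
                        fields=("category_list",)):
    """Build a search body that finds docs similar to an already indexed doc."""
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # Use an existing doc as the example to compare against
                "like": [{"_index": index, "_type": doc_type, "_id": doc_id}],
                # A category appears only once per company, so the default
                # min_term_freq of 2 would discard every term
                "min_term_freq": 1,
            }
        }
    }

# Hypothetical doc id
body = more_like_this_body("10-20-media")
print(json.dumps(body, indent=2))
```

Other fields such as region or city could be added to `fields` to widen the basis for similarity.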

The challenge with going from single-company-description to other companies with any degree of confidence is that you only have a very small number of data points as the glue. Each company (at least in the data I have seen) has only 3 or 4 category terms - some of which are very broad e.g. "software". This is a flimsy basis for drawing out a graph of links between companies. Maybe shared investors would also be a source of similarity but I expect this would be unreliable too.

In the video I chose to draw a graph of the related industry categories because each of those links is more robust and can be validated with confidence. The category "web design" can be linked to "User experience design" with high confidence because there may be 50 or more different companies that reinforce that connection.
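
To make the idea of reinforcement concrete, here is a small Python sketch that counts how many companies share each pair of categories. The sample lists and the threshold are invented for illustration, and plain counting is a simplification: it is not how the graph API scores connections.

```python
from collections import Counter
from itertools import combinations

def category_pair_counts(companies):
    """Count, for every pair of categories, how many companies list both."""
    counts = Counter()
    for categories in companies:
        # Sort so that (a, b) and (b, a) are counted as the same pair
        for pair in combinations(sorted(set(categories)), 2):
            counts[pair] += 1
    return counts

# Made-up category lists, one per company
companies = [
    ["Web Design", "User Experience Design"],
    ["Web Design", "User Experience Design", "Software"],
    ["Software", "E-Commerce"],
]
counts = category_pair_counts(companies)

# Keep only pairs backed by at least two companies (arbitrary threshold)
strong = {pair: n for pair, n in counts.items() if n >= 2}
print(strong)
```

A pair seen in many companies is a link that can be trusted; a pair seen once is no better evidence than a single company's 3 or 4 category terms.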

The tools that the graph API provides are designed to identify statistically significant relationships in the data. The results are only as good as the strength of signal that is available in the data.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-mlt-query.html

