How can you search two fields? / How can you search multiple fields?

graph

(Yeikel) #1

Let's say that we have the following CSV file:

id,first_name,last_name,age
1,Lillian,Ross,20
2,Lillian,Perkins,23
3,Carl,Black,25
4,Raymond,Cruz,55
5,Jonathan,Kim,32
6,Judy,Berry,14
7,James,Matthews,24
8,Elizabeth,Wells,52
9,Stephanie,Perkins,32
10,Aaron,Flores,12
11,Larry,Gutierrez,42

My goal is to build a graph as follows:

first_name
last_name \
age

The problem is that when I search for a person, let's say Perkins, I get multiple results when I am more interested in Lillian Ross.

How can I build relationships based on multiple fields? Or how can I search on multiple fields?

SQL-like answer: select * from people where first_name='Lillian' AND last_name='Perkins'
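
For illustration, the SQL-like condition above can be sketched in Python against the sample CSV. This is a minimal sketch, not a recommended design: `find_people` is a hypothetical helper, and the Elasticsearch query body at the end assumes the CSV columns were indexed as fields with the same names. It uses the standard `bool` query with two `must` clauses, which requires both fields to match.

```python
import csv
import io

# The sample CSV from the post, embedded so the sketch is self-contained.
CSV_TEXT = """id,first_name,last_name,age
1,Lillian,Ross,20
2,Lillian,Perkins,23
3,Carl,Black,25
4,Raymond,Cruz,55
5,Jonathan,Kim,32
6,Judy,Berry,14
7,James,Matthews,24
8,Elizabeth,Wells,52
9,Stephanie,Perkins,32
10,Aaron,Flores,12
11,Larry,Gutierrez,42
"""


def find_people(rows, **criteria):
    """Return the rows whose fields equal every given criterion (logical AND)."""
    return [
        row
        for row in rows
        if all(row[field] == value for field, value in criteria.items())
    ]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Searching on last_name alone is ambiguous; adding first_name narrows it down.
by_last_name = find_people(rows, last_name="Perkins")
matches = find_people(rows, first_name="Lillian", last_name="Perkins")
print(len(by_last_name), len(matches))

# Assumed Elasticsearch request body expressing the same AND condition.
# The field names are an assumption about how the CSV was indexed.
es_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"first_name": "Lillian"}},
                {"match": {"last_name": "Perkins"}},
            ]
        }
    }
}
```

Searching on `last_name` alone returns both Perkins rows, while the combined condition returns only the row with id 2.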


(Mark Harwood) #2

I'm not sure what your objective is or how graph might help?
Your CSV is a list of people so I'm unclear how that might form a graph?

I've had a CSV before where each line was 2 people (reviewerName, revieweeName) and from that it was possible to draw out a graph that resembles an org chart but I don't see what you would hope to do with your one-person-per-row CSV?


(Yeikel) #3

It looks like simplifying my question complicated the case, sorry. Let me try to explain what I am trying to build.

Let's say that we have the following data set:

id,first_name,last_name,age,school,hobby,state,degree,street
1,Lillian,Ross,20,Harvard,Sports,MA ,Computer Science,123 Main Address
2,Lillian,Ross,20,Standford,Music,MA ,Computer Science,1234 Main Address
3,Lillian,Ross,20,Florida University,Sports,MA ,Computer Science,123567 Main Address
4,Lill,Ross,50,Harvard,Sports,MA ,Computer Science,123 Main Address
5,Lilli,Ross,50,Harvard,Sports,MA ,Computer Science,123 Main Address

My main goal is to match them and build a graph based on fuzzy calculations, giving priority to specific fields.

For example, we can agree that the following records are the same person, as they share a similar name (based on a fuzzy score) and the same address.

4,Lill,Ross,50,Harvard,Sports,MA ,Computer Science,123 Main Address
5,Lilli,Ross,50,Harvard,Sports,MA ,Computer Science,123 Main Address
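
The matching rule described above (similar first name, same last name, same address) could be sketched as follows. This is only an illustration under stated assumptions: `same_person` and the 0.8 threshold are hypothetical choices, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever fuzzy score is actually intended.

```python
from difflib import SequenceMatcher

# Hypothetical similarity threshold; the right value depends on the data.
NAME_THRESHOLD = 0.8


def name_similarity(a, b):
    """Return a similarity ratio in [0, 1] for two names, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def same_person(rec_a, rec_b, threshold=NAME_THRESHOLD):
    """Treat two records as one person when the address and last name are
    identical and the first names are similar enough."""
    if rec_a["street"].strip() != rec_b["street"].strip():
        return False
    if rec_a["last_name"].strip().lower() != rec_b["last_name"].strip().lower():
        return False
    return name_similarity(rec_a["first_name"], rec_b["first_name"]) >= threshold


rec1 = {"first_name": "Lillian", "last_name": "Ross", "street": "123 Main Address"}
rec2 = {"first_name": "Lillian", "last_name": "Ross", "street": "1234 Main Address"}
rec4 = {"first_name": "Lill", "last_name": "Ross", "street": "123 Main Address"}
rec5 = {"first_name": "Lilli", "last_name": "Ross", "street": "123 Main Address"}

print(same_person(rec4, rec5))  # similar first name, same address
print(same_person(rec1, rec2))  # identical name, different address
```

Under this rule, records 4 and 5 are merged, while records 1 and 2 are kept apart because their addresses differ, even though their names are identical.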

Also, what if I want to build a single graph for the following data?

1,Lillian,Ross,20,Harvard ,Computer Science,123 Main Address
2,Lillian,Ross,20,Standford,Computer Science,1234 Main Address


(Mark Harwood) #4

Ah, so "entity resolution".
I generally wouldn't recommend using "fuzzy" as part of automated entity resolution on large datasets as mistakes are amplified as part of the recursive nature of fusing data. Each fusion operation has to be founded on a highly trusted link.
The last 5 minutes of my elasticon talk [1] demonstrate an effective approach to entity resolution using Graph and special indexing techniques.

[1] https://www.elastic.co/elasticon/conf/2016/sf/graph-capabilities-in-the-elastic-stack (registration required)


(Yeikel) #5

I've had a CSV before where each line was 2 people (reviewerName, revieweeName) and from that it was possible to draw out a graph that resembles an org chart but I don't see what you would hope to do with your one-person-per-row CSV?

Can you explain how you do the data preparation for this case?

I have a case that is similar to that. For example:
id,parent_id,name

1,null,Peter
2,1,John
3,1,Johanna
4,3,Mark
5,3,Luis
6,2,Matthew
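
One possible way to prepare this kind of data, sketched under assumptions: resolve each `parent_id` to the parent's name so that every row carries two people, like the (reviewerName, revieweeName) rows quoted above. The output field names `parent_name` and `child_name` are hypothetical, and this is not necessarily how the data in the talk was prepared.

```python
import csv
import io

# The sample data from the post, embedded so the sketch is self-contained.
CSV_TEXT = """id,parent_id,name
1,null,Peter
2,1,John
3,1,Johanna
4,3,Mark
5,3,Luis
6,2,Matthew
"""


def build_edges(csv_text):
    """Turn id/parent_id rows into (parent_name, child_name) pairs.

    Rows whose parent_id is "null" are roots and produce no edge.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    name_by_id = {row["id"]: row["name"] for row in rows}
    edges = []
    for row in rows:
        parent_id = row["parent_id"]
        if parent_id == "null":
            continue
        edges.append((name_by_id[parent_id], row["name"]))
    return edges


edges = build_edges(CSV_TEXT)

# Each pair becomes one document holding both names.
documents = [
    {"parent_name": parent, "child_name": child} for parent, child in edges
]
for doc in documents:
    print(doc)
```

Each resulting document connects two names, which is the shape a two-people-per-row data set already has.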

I am looking to build something like this:

Overall, based on my basic understanding, every use case depends not only on the data itself but also on the way we ingest it (the New York example that you gave me in my last post).

I watched your video and I can see how I could use what you explain, but I am still not sure how you ingest the data. As I said, I assume that this is the crucial part of the problem, but please let me know if this is correct.

I highly appreciate your help.

