Using regular expressions to blacklist terms

graph

(reza sadoddin) #1

Is there a way to blacklist a group of terms (e.g. numbers) so that they do not appear in Graph? The idea is to ignore irrelevant concepts and focus on the connections between concepts of interest.

Thanks for sharing your ideas.


(Mark Harwood) #2

Not in the Graph plugin per se, but a custom analyzer could be used at index time to remove numbers. Generally, the terms that are identified (be they numbers or words) should be relevant. For example, the number "9200" is singled out as relevant in Elasticsearch Stack Overflow articles.
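
As a rough illustration of the index-time approach, here is a minimal sketch of index settings, written as a Python dict, that drop purely numeric tokens. It assumes the built-in `pattern_replace` and `length` token filters; the names `no_numbers`, `blank_numeric_tokens` and `drop_empty_tokens` are made up for this example, and the settings have not been tested against a particular Elasticsearch version.

```python
import json


def build_index_settings(analyzer_name="no_numbers"):
    """Return index settings whose custom analyzer drops all-digit tokens.

    Hypothetical names: analyzer_name, blank_numeric_tokens, drop_empty_tokens.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Rewrite tokens made only of digits to an empty string.
                    "blank_numeric_tokens": {
                        "type": "pattern_replace",
                        "pattern": "^[0-9]+$",
                        "replacement": "",
                    },
                    # Remove the empty tokens left by the previous filter.
                    "drop_empty_tokens": {"type": "length", "min": 1},
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": [
                            "lowercase",
                            "blank_numeric_tokens",
                            "drop_empty_tokens",
                        ],
                    }
                },
            }
        }
    }


index_settings = build_index_settings()
# This JSON body would be sent when creating the index.
print(json.dumps(index_settings, indent=2))
```

The field explored in Graph would then need to be mapped to this analyzer, which is why this route requires reindexing existing data.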


(reza sadoddin) #3

Thanks Mark, it makes sense.

I hope this feature can be added later, because in many scenarios users start exploring their data with Graph and might need to exclude some groups of terms.
Such a feature would avoid reindexing the data from scratch and make analysis much easier.


(Mark Harwood) #4

Just out of curiosity - are you running with significant links on or off? Normally, with significance turned on, irrelevant stuff (numbers or words) shouldn't be returned.


(reza sadoddin) #5

No, I have not turned this feature off. My point is about the need to ignore a group of words in the graph, not because they are irrelevant, but in order to focus on other relationships.


(Mark Harwood) #6

Understood and noted.
We tend to get a bit nervous about reaching for regex to filter things on the server side, as it is a notorious performance bottleneck. I wonder if a client-side filtering framework, like the features Gephi provides [1], might be a more flexible solution?

Many thanks for the feedback :)

[1] https://blog.ouseful.info/2010/04/23/getting-started-with-gephi-network-visualisation-app-–-my-facebook-network-part-ii-basic-filters/
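
To make the client-side idea concrete, here is a minimal sketch in Python that applies a regex blacklist to a Graph explore response after it has been fetched. It assumes the response has a `vertices` list and a `connections` list whose `source` and `target` values are positions in `vertices`. The function name `filter_graph` and the sample data are hypothetical.

```python
import re


def filter_graph(response, pattern):
    """Drop vertices whose term fully matches `pattern`, plus their connections."""
    blacklist = re.compile(pattern)
    kept_vertices = []
    index_map = {}  # old vertex position -> new vertex position

    for old_index, vertex in enumerate(response.get("vertices", [])):
        if blacklist.fullmatch(str(vertex["term"])):
            continue  # blacklisted term, skip it
        index_map[old_index] = len(kept_vertices)
        kept_vertices.append(vertex)

    kept_connections = []
    for conn in response.get("connections", []):
        # Keep a connection only if both of its endpoints survived.
        if conn["source"] in index_map and conn["target"] in index_map:
            kept_connections.append(
                {
                    **conn,
                    "source": index_map[conn["source"]],
                    "target": index_map[conn["target"]],
                }
            )

    return {"vertices": kept_vertices, "connections": kept_connections}


# Hypothetical sample data shaped like an explore response.
sample = {
    "vertices": [
        {"field": "tags", "term": "elasticsearch"},
        {"field": "tags", "term": "9200"},
        {"field": "tags", "term": "kibana"},
    ],
    "connections": [
        {"source": 0, "target": 1},
        {"source": 0, "target": 2},
        {"source": 1, "target": 2},
    ],
}

filtered = filter_graph(sample, r"\d+")
print(filtered)
```

Because the filtering happens after the response arrives, the blacklist can be changed freely without reindexing or adding regex work on the server.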

