Using neighborhood approximation to make trust queries more efficient

Abstract

For a curation system, working at all pretty much implies containing clusters of loose consensus (places with many cycles of trust/reciprocal trust) (I think the word is "neighborhoods"), and if those clusters are far enough from the user running a query, we can approximate the web of trust graph by replacing clusters with cluster approximation nodes. Cluster approximation nodes aggregate the posts and ingoing/outgoing trust relations of the set of users covered by them. Cluster node state is maintained globally, to be shared. This pattern will allow us to assemble a simplified representation of the entire web of trust for each user, to be accurate in the places they expect it to be, and to resort to shared approximations where complete accuracy would be infeasible, which drastically reduces the number of nodes that have to be visited in the most common types of query.

Prior Art

I don't know anything about that sort of thing.

Rationale

The community of a healthy web of trust will usually have clusters of loose consensus

A helpful curator is selecting for some quality about which they must have mutual knowledge with their audience. In a web of trust, mutual knowledge of the quality in question manifests as cyclical structures in the trust web: People trusting each other or trusting enough mutual friends that by transitivity they will trust each other. We expect that trust webs based on shared taste will often form a number of broad tangles of cycles of consensus, likely to correspond to demographics or communities who are looking for the same thing. We call those clusters and they seem like they'd be very useful to be able to detect, even if only for anthropological surveys.

Following a member of a cluster is like following the whole cluster

Clusters of loose consensus will kind of give you most of the same recommendations no matter where you enter them, no matter which specific member you trust. Everything they post will propagate through all of the others. If a network that is fairly far away from the user opens into a cluster, it may be adequate to just approximate their connection to it with a connection to a prepared cluster approximation node.

A cluster approximation node can be shared by many users' web models, and this lets them share pre-computed results when querying over the vast outer fringes of the web. The unique computation that needs to occur when a user queries their view of the web is drastically reduced.

Distant users who are not in a cluster might not be worth visiting at all

Where there is no cluster, there is no consensus. Where there is no consensus (which will manifest as very divergent branching with very few cycles), it is somewhat unlikely that any true sense of the web's inclusion criterion is understood there. In the least, if there were a true sense, it seems that it would be difficult for a distant judge to recognize it anyway, no near judges have.

Consequently, it is probably acceptable to drop any distant node that cannot be approximated with clusters nodes

Given that, it's hard to imagine a natural web of trust that couldn't be approximated pretty decently with neighborhood approximation via cluster nodes.


Types of approximation

It's worth naming a few different ways of talking about the accuracy of an approximation AG of a web G.

Where C(G) is the set of users considered to be trusted in the neighborhood of the querier, in the web G

(These produce numbers in the unit range, retain meaning wrt to the absolute sizes of the graphs, and generally feel truthy to me)

It seems that different webs will need to focus on minimizing different kinds of error. Minimizing both is ideal, of course, but wherever we face a tradeoff:

I guess I'd recommend just keeping both kinds of approximations in cache, instead of having approximation parameters be part of a web's definition or whatever.

It may be worth measuring the degree of connectedness, roughly the average number of steps it takes to get from any node in the neighborhood to any other node. This will vary. As long as the path is short enough, we can get a pretty good guarantee that all of the nodes in the clique would be included in any query extending far enough through it.


Subproblems

These things all feel solvable to me


See also Making better online spaces with open sets of webs of trust

mako yass

September 2020