RE: Version 2 for scoring the strength of a follower network

in #steemtalk, 8 months ago (edited)

I like the (new) visualisation. I can imagine it better this way... and test it :-)

A few thoughts on this:
I think by "normalize" you mean mapping the values onto the interval from 0 to 1. Plotting the normalized median on the x-axis and the normalized follower count on the y-axis then gives arcs between the two axes, and the radius calculation gives the distance between the origin and the point.
I have done this with your values determined here:

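| User | Median | Followers | Radius |
|---|---|---|---|
| remlaps | 47.37 | 1966 | 0.888 |
| remlaps-lite | 47.84 | 709 | 0.4590 |
| penny4thoughts | 62.89 | 191 | 0.588 |
| steemitblog | 40.61 | 39652 | 1.028 |
| realrobinhood | 67.2 | 227 | 0.656 |
| moecki | 63.68 | 279 | 0.606 |
| moecki.tests | 65.94 | 2 | 0.629 |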

We can see that the differentiation in the middle areas works as desired (remlaps and remlaps-lite, penny4thoughts and realrobinhood). But we also see that in the edge areas (steemitblog) the results are not sensible in my view. Even a median of 26 would still lead to a radius of 1.0001. From a certain point onwards, only the number of followers (spam accounts?) counted. This shows that a sensible upper bound must be found.
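For anyone who wants to reproduce the numbers, here is a minimal sketch of the normalize-then-radius calculation as I understand it. The bounds (median reputation from 25 to 90, follower count capped at 2400) are assumptions that I back-fitted so the output roughly matches the table; they are not confirmed settings, and the function names are made up for illustration.

```python
import math

# Assumed bounds, back-fitted to the table above; not confirmed settings.
MEDIAN_LOW, MEDIAN_HIGH = 25.0, 90.0
FOLLOWERS_CAP = 2400.0


def normalize(value, low, high):
    """Map value linearly onto [0, 1], clamping anything outside [low, high]."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


def radius(median_rep, followers):
    """Distance from the origin to the point (normalized median, normalized followers)."""
    x = normalize(median_rep, MEDIAN_LOW, MEDIAN_HIGH)
    y = normalize(followers, 0.0, FOLLOWERS_CAP)
    return math.hypot(x, y)


print(round(radius(47.84, 709), 3))    # remlaps-lite -> 0.459
print(round(radius(40.61, 39652), 3))  # steemitblog, follower axis capped at 1 -> 1.028
print(round(radius(26.0, 39652), 4))   # median of 26 with a capped follower count -> 1.0001
```

Once the follower axis is pinned at 1, the median can only move the radius between 1 and about 1.41, which is why very large accounts end up ranked almost purely by follower count.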

Edit: To expand the list, I have checked my values (moecki and moecki.tests). moecki.tests with 2 followers has a higher value than moecki.


From a certain point onwards, only the number of followers (spam accounts?) counted. This shows that a sensible upper bound must be found.

Yeah, this is by design, and as time goes on, I imagine that threshold will need to be tuned repeatedly. Basically, I chose it based on the actual accounts that I'm seeing the autovoter picking out, so if the blockchain grows or usage patterns change, that value will almost certainly need to change, too.

As for spam accounts, they should ideally have a reputation of 25, which means that they'll give a 0.01 score, so hopefully that won't be too big of a problem for someone who's followed by sockpuppets. It's easy to add lots of accounts, but it's not so easy to give positive reputations to lots of accounts. I actually think that the steemitblog score is sensible, since they have such a big network. Sure, there are many low-rep accounts, but there are lots of high-rep accounts, too. Something they post gets lots of visibility.

I'm more worried about the low-median value from the opposite direction, i.e. someone could attack someone else's score by following them with lots of low-rep accounts (and I don't have a good solution to that problem yet). One thing I'm thinking of is switching from the total follower count to the number of accounts with a reputation above 25. I'm looping through the follower list anyway to get the median, so it wouldn't add much computation to count just the accounts above 25.
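A rough sketch of what that single pass could look like, assuming the follower reputations are already available as a plain list of numbers. The function name `follower_stats` and the `min_rep` parameter are hypothetical, and fetching the follower list from the chain is left out entirely.

```python
from statistics import median


def follower_stats(follower_reps, min_rep=25):
    """Return (median reputation, number of followers with reputation above min_rep)."""
    counted = 0
    for rep in follower_reps:  # the same loop that already collects reputations
        if rep > min_rep:
            counted += 1
    med = median(follower_reps) if follower_reps else 0.0
    return med, counted


# Three genuine followers plus ten hypothetical rep-25 sockpuppets.
reps = [48, 61, 70] + [25] * 10
print(follower_stats(reps))  # -> (25, 3)
```

Note that in this example the count ignores the sockpuppets, but the median is still dragged down to 25, so this change alone would not stop the attack described above.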

And yeah, I'm still not happy with the treatment of low follower counts. It's far too easy for an account with low numbers of followers to get a fairly high median value (i.e. the case of moecki vs. moecki.tests). I'm going to keep exploring to see how I can adjust for that.

I already made some adjustments on Thursday after posting that helped a little bit, but I need to keep exploring. Current parameters are:

median follower rep lower threshold - 30
median follower rep upper threshold - 92
follower count upper threshold - 1600

Maybe I just need to add a lower threshold for follower counts, i.e. if you have fewer than 10 or 20 followers, then you get 0.01. Not sure about that, though. On one hand, it makes sense and adds symmetry, but on the other I would like to avoid adding another "special case".
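To make the idea concrete, here is a sketch of the special case using the parameters listed above (30, 92, 1600). Several things are assumptions: the cutoff of 10 (the post only floats "10 or 20"), the reading that the whole score drops to 0.01 rather than just the follower axis, and the 0.01 floor on each axis.

```python
import math

# Parameters listed above in this thread.
MEDIAN_LOW, MEDIAN_HIGH = 30.0, 92.0
FOLLOWERS_CAP = 1600.0

# Assumptions for illustration only.
MIN_FOLLOWERS = 10  # hypothetical cutoff; "10 or 20" was suggested
FLOOR = 0.01        # assumed minimum value on each axis and for the special case


def scale(value, low, high):
    """Map value linearly onto [FLOOR, 1], clamping outside [low, high]."""
    return min(1.0, max(FLOOR, (value - low) / (high - low)))


def score(median_rep, followers):
    """Radius of (normalized median, normalized followers), with a follower-count floor."""
    if followers < MIN_FOLLOWERS:
        return FLOOR  # the proposed special case for very small networks
    x = scale(median_rep, MEDIAN_LOW, MEDIAN_HIGH)
    y = scale(followers, 0.0, FOLLOWERS_CAP)
    return math.hypot(x, y)


print(score(65.94, 2))               # moecki.tests -> 0.01
print(round(score(63.68, 279), 2))   # moecki -> 0.57
```

With this rule moecki.tests no longer outranks moecki, at the cost of a hard jump in the score at the cutoff.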

2nd reply. Something like this might be interesting:

Adjusting the reputation upper threshold based on the number of followers. Also, fixed a bug so that X and Y each max out at 1, while the combined radius maxes out at sqrt(2). That limits the reach of the individual parameters.
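A sketch of how those two pieces could fit together. The capping of X and Y at 1 follows the description above; the adjustment rule itself is not spelled out in the thread, so the linear rule here (the upper threshold rises by up to `EXTRA_UPPER` points for accounts with few followers) is purely a hypothetical stand-in, as are all the names.

```python
import math

MEDIAN_LOW = 30.0       # from the parameter list earlier in the thread
BASE_UPPER = 92.0       # from the parameter list earlier in the thread
FOLLOWERS_CAP = 1600.0  # from the parameter list earlier in the thread
EXTRA_UPPER = 20.0      # hypothetical: extra headroom demanded of tiny accounts


def clamp01(value):
    """Clamp value to the interval [0, 1]."""
    return min(1.0, max(0.0, value))


def adjusted_upper(followers):
    """Hypothetical rule: fewer followers means a higher reputation upper threshold."""
    y = clamp01(followers / FOLLOWERS_CAP)
    return BASE_UPPER + EXTRA_UPPER * (1.0 - y)


def radius(median_rep, followers):
    """X and Y are each capped at 1, so the radius can never exceed sqrt(2)."""
    upper = adjusted_upper(followers)
    x = clamp01((median_rep - MEDIAN_LOW) / (upper - MEDIAN_LOW))
    y = clamp01(followers / FOLLOWERS_CAP)
    return math.hypot(x, y)


print(adjusted_upper(1600))                              # -> 92.0
print(math.isclose(radius(200, 10**6), math.sqrt(2)))    # -> True
```

Under this stand-in rule, the same median is worth less on the X axis when the follower count is small, which is one way to dampen the moecki.tests effect without a separate special case.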

Believe it or not, the arcs are still in there, even though they're difficult to see...

Out of time now, so I'll see how it goes and post more about it at some future time.