Version 2 for scoring the strength of a follower network

remlaps-lite (77) in #steemtalk • 9 months ago

Back in Programming Diary #19, I showed a visualization of the follower network score that I had implemented in my autovoter. I also mentioned that I still wanted to improve it.

Well, I've been chatting with ChatGPT about how to improve it, and I think we came up with a better way.

Here's the new visualization as a "heat map" (constructed using numpy and pyplot):

Basically, with ChatGPT's help, I realized that it's just a geometry problem, so I was able to make the visual look more reasonable by thinking of concentric circles (or ovals) and following some simple steps:

Place the origin at a median follower reputation of 25 with 0 followers.
Normalize the number of followers to a fraction of the range from 0 to 2400 (2400 is an arbitrary number and can change as needed).
Normalize the median follower reputation to a fraction of the range from 25 to 90 (90 is also arbitrary and modifiable).
Calculate the distance from the origin using the two normalized values, X and Y (the radius of a circle is the square root of (X² + Y²)).

Thus,

If the median follower reputation is 25 or below, the follower network score is 0.01 (rather than 0, to avoid dividing by zero later).
If the median follower reputation is above 25, then the follower network score lies along an arc that's determined by the median follower reputation and the number of followers.
Low scores (near 0) are in the lower-left of the quadrant, close to the origin.
High scores (near 1) are reached by moving upwards and/or rightwards.
The thresholds can always be adjusted if common reputation values or follower counts grow to new levels.
Adding large numbers of accounts as followers doesn't help the score unless those accounts have managed to raise their reputation above 25, the default value. (In fact, it probably hurts the score by lowering the median.)
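
To make the steps above concrete, here is a minimal Python sketch of the calculation. It is not the autovoter's actual code: the function and constant names are made up for illustration, and it assumes no clamping of the normalized values, since the post doesn't say whether they are capped.

```python
import math

REP_FLOOR = 25.0         # default reputation; origin of the reputation axis
REP_CEILING = 90.0       # arbitrary upper reference for median reputation
FOLLOWER_CEILING = 2400  # arbitrary upper reference for follower count
MIN_SCORE = 0.01         # used instead of 0 to avoid dividing by zero later


def follower_network_score(median_rep: float, followers: int) -> float:
    """Distance from the origin (rep 25, 0 followers) in normalized units."""
    if median_rep <= REP_FLOOR:
        return MIN_SCORE
    # Fraction of the way from 0 to the follower ceiling.
    follower_part = followers / FOLLOWER_CEILING
    # Fraction of the way from the reputation floor to the ceiling.
    rep_part = (median_rep - REP_FLOOR) / (REP_CEILING - REP_FLOOR)
    # Radius of the circle through this point: sqrt(X^2 + Y^2).
    return math.hypot(follower_part, rep_part)
```

With these assumptions, an account with a median follower reputation of 47.37 and 1966 followers lands at a radius of about 0.89, and any account whose median is 25 or lower gets 0.01 no matter how many followers it has.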

Sorry if this was rushed, but that needs to be it for tonight. I'm low on time now and out of pocket this weekend, so I'm not sure when I'll be posting again. However, I wanted to get this up while it was fresh in my mind.

Despite the fact that I haven't made time for programming diary posts, I'm still plugging away at getting the Steemometer ready to release as Open Source, and I'm still thinking about how to measure the strength of someone's follower network. Hopefully, this is a solid step forward on the second item.

Finally, just for reference, here was the previous visualization:

The two would be easier to compare if I had drawn the earlier one as a heat map too, but it didn't cross my mind at the time. I definitely think the new values are more reasonable, though.
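
For anyone who wants to draw this kind of heat map, here is a rough sketch using numpy and pyplot. It is not the code behind the image in this post: the function names, grid size, color map, and the choice to put follower count on the horizontal axis are all assumptions, and the score is left unclamped.

```python
import numpy as np


def score_grid(max_followers=2400, rep_floor=25.0, rep_ceiling=90.0, steps=200):
    """Return follower values, reputation values, and a 2-D grid of scores."""
    followers = np.linspace(0, max_followers, steps)
    reps = np.linspace(rep_floor, rep_ceiling, steps)
    # Rows follow reputation, columns follow follower count.
    x, y = np.meshgrid(followers / max_followers,
                       (reps - rep_floor) / (rep_ceiling - rep_floor))
    scores = np.hypot(x, y)
    # Median reputation at or below the floor gets the minimum score.
    scores[y <= 0] = 0.01
    return followers, reps, scores


def plot_heat_map():
    """Draw the grid as a heat map (imports pyplot only when called)."""
    import matplotlib.pyplot as plt

    followers, reps, scores = score_grid()
    fig, ax = plt.subplots()
    mesh = ax.pcolormesh(followers, reps, scores, shading="auto", cmap="hot")
    fig.colorbar(mesh, ax=ax, label="follower network score")
    ax.set_xlabel("number of followers")
    ax.set_ylabel("median follower reputation")
    plt.show()
```

Calling plot_heat_map() opens the chart. The concentric arcs show up as bands of equal color around the lower-left corner.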

Thank you for your time and attention.

As a general rule, I up-vote comments that demonstrate "proof of reading".

Steve Palmer is an IT professional with three decades of professional experience in data communications and information systems. He holds a bachelor's degree in mathematics, a master's degree in computer science, and a master's degree in information systems and technology management. He has been awarded 3 US patents.


Reminder

Visit the /promoted page and #burnsteem25 to support the inflation-fighters who are helping to enable decentralized regulation of Steem token supply growth.

#programming #steemexclusive #burnsteem25 #metrics


moecki (72) 9 months ago (edited)

I like the (new) visualisation. I can imagine it better this way... and test it :-)

A few thoughts on this:
By "normalize", I think you mean mapping the values onto the interval from 0 to 1. This then creates an arc between the x-axis and the y-axis, and the radius calculation gives us the distance between the origin and the point.
I have done this with your values determined here:

User	Median rep	Followers	Radius
remlaps	47.37	1966	0.888
remlaps-lite	47.84	709	0.459
penny4thoughts	62.89	191	0.588
steemitblog	40.61	39652	1.028
realrobinhood	67.2	227	0.656
moecki	63.68	279	0.606
moecki.tests	65.94	2	0.629

We can see that the differentiation in the middle areas works as desired (remlaps and remlaps-lite, penny4thoughts and realrobinhood). But we also see that in the edge areas (steemitblog) the results are not sensible in my view. Even a median of 26 would still lead to a radius of 1.0001. From a certain point onwards, only the number of followers (spam accounts?) counted. This shows that a sensible upper bound must be found.

Edit: To expand the list, I have checked my values (moecki and moecki.tests). moecki.tests with 2 followers has a higher value than moecki.
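
The radii in the table can be reproduced with a short Python sketch. It uses the thresholds from the post (25, 90 and 2400) and assumes the normalized follower count is capped at 1, which is inferred from the steemitblog row and not stated anywhere. The function and variable names are illustrative only.

```python
import math

# (user, median follower reputation, follower count, radius listed in the table)
ROWS = [
    ("remlaps", 47.37, 1966, 0.888),
    ("remlaps-lite", 47.84, 709, 0.459),
    ("penny4thoughts", 62.89, 191, 0.588),
    ("steemitblog", 40.61, 39652, 1.028),
    ("realrobinhood", 67.2, 227, 0.656),
    ("moecki", 63.68, 279, 0.606),
    ("moecki.tests", 65.94, 2, 0.629),
]


def radius(median_rep, followers, rep_lo=25.0, rep_hi=90.0, max_followers=2400):
    """Distance from the origin using the normalized values."""
    # Assumed cap at 1; without it steemitblog would land far above 1.028.
    x = min(followers / max_followers, 1.0)
    y = (median_rep - rep_lo) / (rep_hi - rep_lo)
    return math.hypot(x, y)


for user, rep, followers, listed in ROWS:
    print(f"{user:15s} computed {radius(rep, followers):.3f}  table {listed}")
```

Each computed value agrees with the table to within 0.001, and the same function shows both edge cases: a median of 26 with 39652 followers still gives a radius just above 1, and moecki.tests (2 followers) comes out higher than moecki (279 followers).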


remlaps-lite (77) 9 months ago (edited)

"From a certain point onwards, only the number of followers (spam accounts?) counted. This shows that a sensible upper bound must be found."

Yeah, this is by design, and as time goes on, I imagine that threshold will need to be tuned repeatedly. Basically, I chose it based on the actual accounts that I'm seeing the autovoter picking out, so if the blockchain grows or usage patterns change, that value will almost certainly need to change, too.

As for spam accounts, they should ideally have a reputation of 25. If most of an account's followers look like that, the median drops to 25 and the score is 0.01, so hopefully that won't be too big of a problem for someone who's followed by sockpuppets. It's easy to add lots of accounts, but it's not so easy to give positive reputations to lots of accounts. I actually think that the steemitblog score is sensible, since they have such a big network. Sure, there are many low-rep accounts, but there are lots of high-rep accounts, too. Something they post gets lots of visibility.

I'm more worried about the low-median value from the opposite direction, i.e. someone could attack someone else's score by following them with lots of low-rep accounts (and I don't have a good solution to that problem yet). One thing I'm thinking of is switching from the total follower count to the number of accounts with a reputation above 25. I'm looping through the follower list anyway to get the median, so it wouldn't add much computation to count just the accounts above 25.
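
As a rough illustration of that idea, here is a Python sketch that collects the median and the above-25 count in a single pass over the follower reputations. The function name and the plain list of reputations are assumptions for the example. The real follower data would come from the blockchain, which isn't shown here.

```python
from statistics import median

DEFAULT_REP = 25.0  # reputation of an account that has never earned any


def summarize_followers(follower_reps):
    """Return (median reputation, number of followers with reputation > 25)."""
    reps = []
    above_default = 0
    for rep in follower_reps:
        reps.append(rep)
        if rep > DEFAULT_REP:
            above_default += 1
    # An account with no followers is treated as sitting at the default.
    median_rep = median(reps) if reps else DEFAULT_REP
    return median_rep, above_default
```

For example, summarize_followers([25, 25, 60, 70, 25]) returns a median of 25 and a count of 2, so the three default-reputation followers no longer inflate the follower count, though they still pull the median down.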

And yeah, I'm still not happy with the treatment of low follower counts. It's far too easy for an account with a low number of followers to get a fairly high median value (i.e. the case of moecki vs. moecki.tests). I'm going to keep exploring to see how I can adjust for that.

I already made some adjustments on Thursday after posting, which helped a little bit, but I need to keep exploring. Current parameters are:

median follower rep lower threshold - 30
median follower rep upper threshold - 92
follower count upper threshold - 1600

Maybe I just need to add a lower threshold for follower counts, i.e. if you have fewer than 10 or 20 followers then you get 0.01. Not sure about that, though. On one hand, it makes sense and adds symmetry, but on the other I would like to avoid adding another "special case".
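
Here is a sketch of what that extra special case could look like in Python, using the current parameters listed above. The function name is made up, the default of 10 is just one of the two values floated here, and the normalized values are left unclamped.

```python
import math

REP_LOWER = 30.0       # median follower rep lower threshold
REP_UPPER = 92.0       # median follower rep upper threshold
FOLLOWER_UPPER = 1600  # follower count upper threshold
MIN_SCORE = 0.01


def score_with_follower_floor(median_rep, followers, min_followers=10):
    """Follower network score with a minimum follower count (hypothetical)."""
    # Two special cases: low median reputation, or too few followers.
    if median_rep <= REP_LOWER or followers < min_followers:
        return MIN_SCORE
    follower_part = followers / FOLLOWER_UPPER
    rep_part = (median_rep - REP_LOWER) / (REP_UPPER - REP_LOWER)
    return math.hypot(follower_part, rep_part)
```

With this floor, an account with 2 followers and a median of 65.94 drops to 0.01, while one with 279 followers and a median of 63.68 keeps its computed score.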


remlaps-lite (77) 9 months ago

2nd reply. Something like this might be interesting:

Adjusting the reputation upper threshold based on the number of followers. Also, fixed a bug so that X and Y each max out at 1, while the combined radius maxes out at sqrt(2). That limits the reach of the individual parameters.
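
The clamping part can be sketched in a few lines of Python. This only illustrates the X and Y limits. The follower-dependent reputation threshold isn't spelled out here, so it is left out, and the function name is an assumption.

```python
import math


def clamped_radius(follower_part, rep_part):
    """Radius after limiting each normalized value to the range 0..1."""
    x = min(max(follower_part, 0.0), 1.0)
    y = min(max(rep_part, 0.0), 1.0)
    # With both inputs capped at 1, the result can never exceed sqrt(2).
    return math.hypot(x, y)
```

However large either input gets, the result tops out at sqrt(2), roughly 1.414, so neither the follower count nor the median reputation can dominate the score on its own.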

Believe it or not, the arcs are still in there, even though they're difficult to see...

Out of time now, so I'll see how it goes and post more about it at some future time.
