RE: [Steem Rep] Update - September 2024 | AI-Comments | Tags | Trendings Scores

in Steem POD Team, 2 months ago (edited)

By the way, I have now started with the first attempts.
You can also compare steemit.com and https://steemit.steemnet.org/trending. The full bot list from the-gorilla is currently filtered out there.

I have experimented locally with a variant that weights the rshares according to time (under 5 minutes = 0, between 5 and 10 minutes linearly increasing, from 10 minutes = 1). This shows even more significant differences.
I will compare the variants mentioned above.
You can also see the score in the PostSummary header on my test page. Overall, the values differ only very minimally.
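
For anyone who wants to play with the time weighting described above, here is a minimal sketch. It assumes the age of a vote is measured in minutes after the post was created, and the names `time_weight` and `weighted_rshares` are made up for illustration; this is not the code running on the test page.

```python
def time_weight(minutes_after_post: float) -> float:
    """Weight in [0, 1] for a vote cast `minutes_after_post` after the post."""
    if minutes_after_post < 5:
        return 0.0  # under 5 minutes: the vote does not count
    if minutes_after_post >= 10:
        return 1.0  # from 10 minutes on: full weight
    # between 5 and 10 minutes: linear ramp from 0 to 1
    return (minutes_after_post - 5) / 5


def weighted_rshares(votes):
    """Sum the rshares of all votes, each scaled by its time weight.

    `votes` is an iterable of (rshares, minutes_after_post) pairs
    (a hypothetical input shape chosen for this sketch).
    """
    return sum(rshares * time_weight(minutes) for rshares, minutes in votes)
```

With this, a vote cast 2 minutes after posting contributes nothing, one cast at 7.5 minutes contributes half its rshares, and one cast after 30 minutes contributes all of them.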

Edit: new URL to use caching.

I see the scores in your test page, but I'm still seeing posts that were voted by accounts that I would've expected to be filtered as bots. Maybe I'm misunderstanding the filtering.

You probably assumed that posts with bot votes would be filtered out completely.
That was not my intention. I ‘only’ wanted to influence the calculation of the score by either filtering out or weakening the bot votes (i.e. their rshares) as far as possible. That way, the posts with bot votes should not end up so high in the ranking.
But as I suspected, there are still too many other (trail) votes that distort the result.

I have now calculated with various other methods:

  • Filtering out the bot votes doesn't change much.
  • Using the median rshares produces the biggest changes.

I would implement the median calculation in the test environment as a test.

I had a long conversation with ChatGPT today in which the use of percentiles emerged as quite promising to identify the biggest outlier votes. A weakening could then be applied to these.
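
A rough sketch of how the percentile idea could look. The 90th percentile and the damping factor of 0.5 are assumptions picked purely for illustration (no concrete values have been decided here), and `dampen_outliers` is a hypothetical helper. It also assumes non-negative rshares, i.e. it ignores downvotes.

```python
import statistics


def dampen_outliers(rshares, percentile=90, factor=0.5):
    """Weaken the part of each vote that lies above the given percentile.

    Votes at or below the cut-off are kept unchanged; for larger votes only
    the excess above the cut-off is multiplied by `factor`.
    """
    if len(rshares) < 2:
        # Too few votes to estimate a percentile; leave them untouched.
        return list(rshares)
    # With n=100, quantiles() returns the 99 cut points P1..P99.
    cut = statistics.quantiles(rshares, n=100, method="inclusive")[percentile - 1]
    return [r if r <= cut else cut + (r - cut) * factor for r in rshares]
```

The nice property is that ordinary votes are not touched at all; only the few largest ones lose part of their influence.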

Ah, you're right. I was assuming it would be total filtering. But your approach makes more sense. I suppose they should get credit for legitimate organic votes, even if they're using bots.

Using the median rshares results in the biggest changes,

I would implement the median calculation in the test environment as a test.

That was my gut feeling. I look forward to seeing that. My only reservation is for posts with a small number of votes. I guess the median would need to be combined with some sort of minimum number of votes and/or rshares.

I had a long conversation with ChatGPT today in which the use of percentiles emerged as quite promising to identify the biggest outlier votes.

That makes a lot of sense. It hadn't crossed my mind, but I like it.

 2 months ago (edited)

My only reservation is for posts with a small number of votes. I guess the median would need to be combined with some sort of minimum number of votes and/or rshares.

That's true. I have already observed that posts with one vote had a higher trending score than posts with many (good) votes. A sensible combination is absolutely essential.
Unfortunately, the ‘old’ scores are higher, so the trending page is still dominated by them...

It looks very different, though. I think there's a little bit of improvement, already.

I have already observed that posts with one vote had a higher trending score than posts with many (good) votes. A sensible combination is absolutely essential.

Forgot to mention: This was why I came up with median * log(number of votes), too. Someone could try to manipulate the score with fake accounts, but it gets expensive to fund them to high enough levels that they won't drag down the median.

To prevent the curve from rising too quickly with smaller vote numbers, I have now used log10. We'll see how the values develop.
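
As a minimal sketch of the idea (not the actual Hivemind function): the median rshares multiplied by log10 of the number of votes. `vote_strength` is a hypothetical name, and the real trending score contains more than this one factor.

```python
import math
import statistics


def vote_strength(rshares):
    """Median rshares scaled by log10 of the number of votes."""
    if not rshares:
        return 0.0
    # log10(1) == 0, so a post with a single vote gets no strength at all.
    return statistics.median(rshares) * math.log10(len(rshares))
```

In this form a single large vote scores lower than ten moderate ones, and a batch of tiny votes from fake accounts pulls the median down rather than pushing the score up.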

When debugging the new function, I noticed that there is a bug in the caching wrapper that makes the caching unusable (https://github.com/steemit/hivemind/issues/338).
I'll fix that now :-)

To prevent the curve from rising too quickly with smaller vote numbers, I have now used log10. We'll see how the values develop.

This is fascinating. First, I was able to knock a post with low numbers of votes off of your trending page just by voting for it with a small value, so that's an interesting dynamic. It actually gives an advantage to people who want to reduce a post's visibility (with low numbers of votes), and this has ramifications that I hadn't considered for alt-accounts. Not sure if that's a problem - or how big of a problem. Definitely something to be aware of, though.

Second, and more importantly, does this mean that if 10 web sites all run condenser and hivemind, they can use 10 different customized trending algorithms in order to distinguish themselves? That's a competitive dynamic that I was not aware of.

First, I was able to knock a post with low numbers of votes off of your trending page just by voting for it with a small value

Yes, I've already done that to see how the score changes after a second (small) vote :-)
However, that effect is only apparent. It arises because I recently changed the trending score calculation, and a post's score is only recalculated when the post or its votes change.
So the goal of the last change, weakening the median for posts with few votes, has been achieved. Unfortunately, it will take a few days before we can see whether it is really effective in comparison with other posts.
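
To illustrate why one small vote can make a post drop, here is a toy model. Both score functions are hypothetical stand-ins (the old one is simply a sum of rshares, the new one is median * log10 of the vote count), and the `Post` class is invented for this sketch; it only mimics the "recalculate on change" behaviour.

```python
import math
import statistics


def old_score(rshares):
    # Hypothetical stand-in for the previous calculation: plain sum of rshares.
    return float(sum(rshares))


def new_score(rshares):
    # Hypothetical stand-in for the new calculation: median * log10(votes).
    if not rshares:
        return 0.0
    return statistics.median(rshares) * math.log10(len(rshares))


class Post:
    def __init__(self, rshares, score_fn):
        self.rshares = list(rshares)
        # The score is stored and stays as it is until the post changes.
        self.score = score_fn(self.rshares)

    def add_vote(self, rshares, score_fn):
        self.rshares.append(rshares)
        # Only a change (here: a new vote) triggers a recalculation.
        self.score = score_fn(self.rshares)


post = Post([5000], old_score)   # scored before the algorithm change
stale = post.score               # still the 'old' score
post.add_vote(10, new_score)     # small vote forces a recalculation
```

After the small vote, `post.score` is computed with the new function and ends up below `stale`, while untouched posts keep their old, higher scores until something changes on them.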

does this mean that if 10 web sites all run condenser and hivemind, they can use 10 different customized trending algorithms in order to distinguish themselves?

Basically yes.
Hivemind provides the trending score for the Steemit condenser. If other frontends also use the Hivemind data, they receive the same score.
There is also a calculation directly in the blockchain (BC) code, in tags_api.
Frontend/Backend developers could also use other algorithms. For example, Steemchiller has its own trendings: