Analysis: Posts Created per Day and Organic Ratio

in #utopian-io · 6 years ago

Repository

https://github.com/steemit/steem

Introduction


Answering @eastmael's call in the Utopian Discord channel, I prepared the necessary scripts and extracted data from the Steem blockchain.

This analysis looks for answers to the following questions:

  • How many posts are published on Steemit on an average day?
  • How many of these posts are organic (not upvoted by any bid-bot)?

Scope of Analysis

  • Analysis date: 27 May 2018
  • Analysis timeframe: 2018-05-21T13:42:39 --> 2018-05-22T13:30:15
  • Post Quantity: 40 000 posts
  • Bots list: All the upvote bots registered to SteemBotTracker by @yabapmatt (99 bots)
    bot list
  • All times are in UTC.

Tools

  • Finding the posts

A JavaScript script was written using the SteemJs API to find posts that were created 5 days ago.
5 days was chosen because:

  1. Posts that have already been paid out are not returned by the script, so the posts must still be in pending-payout status.
  2. If the posts are too recent, the author can still buy votes from bid-bots, which would give a false result.
  3. Almost none of the major bid-bots accept posts older than 3.5 days, so 5 days is safe.
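// Collected posts, shared with calculate_rest() and calculate_bots()
var postarray = [];
// Number of 100-post batches fetched so far (assumed initial value: 1, counting the batch below)
var i = 1;

function search() {
    // Starting point: any post that is about 5 days old
    var query = {
        start_author: "adsactly",
        start_permlink: "adsactly-tech-news-does-increased-access-to-technology-increase-societal-inequality",
        limit: 100
    };
    steem.api.getDiscussionsByCreated(query, function (err, result) {
        if (err) {
            console.log(err);
            return;
        }
        postarray = postarray.concat(result);
        calculate_rest();
    });
}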

With the above script, starting from any post that is 5 days old, the following 100 posts are retrieved, sorted according to their posting time.

It was observed that 100 posts are created in approximately 3.5 minutes.
So, to cover a range of 24 hours, it was calculated that approximately 40 000 posts must be analyzed.
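The estimate can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, and the 3.5-minute figure is the observation quoted above.

```javascript
// Estimate how many posts are created per day, given how long it takes
// for one batch of posts to appear.
function estimateDailyPosts(minutesPerBatch, batchSize) {
    var minutesPerDay = 24 * 60; // 1440
    var batchesPerDay = minutesPerDay / minutesPerBatch;
    return Math.round(batchesPerDay * batchSize);
}

// Number of API requests needed when each request returns batchSize posts.
function requestsNeeded(totalPosts, batchSize) {
    return Math.ceil(totalPosts / batchSize);
}

console.log(estimateDailyPosts(3.5, 100)); // a little over 41 000, rounded down to 40 000 in this analysis
console.log(requestsNeeded(40000, 100));   // 400
```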

The script below extends the search to 40 000 posts (posts can only be fetched in packs of 100, so 400 API requests were sent).

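function calculate_rest() {
    // Index of the last post fetched so far; it is the starting point of the next batch.
    // i and postarray are globals shared with search().
    var count = (i * 100) - 1;
    if (i < 400) {
        var query = {
            tag: "",
            start_author: postarray[count].author,
            start_permlink: postarray[count].permlink,
            limit: 100
        };
        // A callback is passed, so the callback (non-Async) form of the method is used
        steem.api.getDiscussionsByCreated(query, function (err, result) {
            if (!err) {
                postarray = postarray.concat(result);
                i += 1;
            }
            // On error i is unchanged, so the same batch is requested again
            calculate_rest();
        });
    } else {
        // All 400 batches received: continue with the bid-bot filter
        calculate_bots();
    }
    // Progress: number of posts collected so far
    console.log(postarray.length);
}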
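One detail worth checking in the collected data: when paging with start_author and start_permlink, the API appears to return the starting post again as the first item of the next batch. Treat this as an assumption and verify it against the data. If it holds, a small helper like the hypothetical dedupePosts below would remove the repeated entries before counting.

```javascript
// Remove repeated posts, keeping the first occurrence of each
// author/permlink pair (together they identify a post).
function dedupePosts(posts) {
    var seen = new Set();
    return posts.filter(function (post) {
        var key = post.author + "/" + post.permlink;
        if (seen.has(key)) {
            return false;
        }
        seen.add(key);
        return true;
    });
}

// Usage with made-up sample data: the second entry repeats the first.
var sample = [
    { author: "alice", permlink: "first-post" },
    { author: "alice", permlink: "first-post" },
    { author: "bob", permlink: "first-post" }
];
console.log(dedupePosts(sample).length);
```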

Processing this in JavaScript took some time.
Once the 40 000 posts were received, the estimate turned out to be valid: 40 000 posts covered almost exactly a 24-hour time frame.
The first post is at 2018-05-21T13:42:39 and the last post is at 2018-05-22T13:30:15.
Only about 12 minutes are missing from the 24 hours.
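The covered time span can be checked directly from the two timestamps. This is a minimal sketch with illustrative function names. Steem timestamps carry no timezone suffix but are UTC, so a "Z" is appended before parsing.

```javascript
// Parse a Steem timestamp (UTC, without timezone suffix) into a Date.
function parseSteemTime(created) {
    return new Date(created + "Z");
}

// Seconds covered between two posts, and seconds missing from a full day.
function coverage(firstCreated, lastCreated) {
    var coveredSeconds = (parseSteemTime(lastCreated) - parseSteemTime(firstCreated)) / 1000;
    var missingSeconds = 24 * 3600 - coveredSeconds;
    return { coveredSeconds: coveredSeconds, missingSeconds: missingSeconds };
}

var span = coverage("2018-05-21T13:42:39", "2018-05-22T13:30:15");
console.log(span.coveredSeconds / 3600); // hours covered
console.log(span.missingSeconds / 60);   // minutes missing (12.4)
```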

The result is put in the Google spreadsheet

  • Eliminating the posts that used bid-bots

Using the bot list, every post is checked: if any of its voters is in the bot list, the post is filtered out with the code below.

// bots: array of bid-bot account names (the bot list above), assumed to be defined elsewhere
// postarraybot: receives the posts that no bid-bot voted on
var postarraybot = [];

function calculate_bots() {
    var author = [];
    var permlink = [];
    var time = [];
    for (let i = 0; i < postarray.length; i++) {
        var post = postarray[i];
        author.push(post.author);
        permlink.push(post.permlink);
        time.push(post.created);
        // true if at least one voter of this post is in the bot list
        var hasBotVote = post.active_votes.some(function (vote) {
            return bots.includes(vote.voter);
        });
        if (!hasBotVote) {
            postarraybot.push(post);
        }
    }
    console.log(author);
    console.log(permlink);
    console.log(time);
    console.log(postarraybot);
}

Now we have the list of organic posts that haven't been upvoted by any of the bid-bots.

Results

Analyzing the 40 000 posts, the following results were found:

  • 40 000 posts were posted in 24 hours (minus 12 minutes)

  • 40 000 posts were posted by 20 362 users (an average of 2 posts per author)

  • The rush hours and slow hours for posts are as follows:

    Rush hours are from 15:30 to 19:30 UTC with 2000+ posts/hour
    Slow hours are from 22:30 to 02:30 UTC with fewer than 1400 posts/hour

  • With an average of 2 posts/day, the distribution of posting frequency of users is as follows.

There is even one user with 256 posts/day.... just crazy!

  • Among the 40 000 posts, 2959 got upvotes from bid-bots (in line with previous findings).
    This is about 7% of all the posts.
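The figures in this list can be derived from the two post arrays in one pass. The sketch below is an illustration, not the code used for the analysis: summarize and its result fields are hypothetical names, and it buckets posts by whole UTC hour, whereas the rush and slow hours above are quoted on half-hour boundaries.

```javascript
// posts: all fetched posts; organicPosts: posts without any bid-bot vote.
// Only the author and created fields of each post are used.
function summarize(posts, organicPosts) {
    var perAuthor = new Map();
    var perHour = new Array(24).fill(0);
    posts.forEach(function (post) {
        perAuthor.set(post.author, (perAuthor.get(post.author) || 0) + 1);
        // Steem timestamps are UTC without a suffix, so "Z" is appended
        perHour[new Date(post.created + "Z").getUTCHours()] += 1;
    });
    var counts = Array.from(perAuthor.values());
    return {
        uniqueAuthors: counts.length,
        averagePostsPerAuthor: posts.length / counts.length,
        maxPostsByOneAuthor: Math.max.apply(null, counts),
        botVotedShare: (posts.length - organicPosts.length) / posts.length,
        organicRatio: organicPosts.length / posts.length,
        postsPerUtcHour: perHour
    };
}

// Made-up sample: three posts by two authors, one of them bot-voted.
var samplePosts = [
    { author: "alice", created: "2018-05-21T15:40:00" },
    { author: "alice", created: "2018-05-21T16:10:00" },
    { author: "bob", created: "2018-05-21T23:05:00" }
];
var summary = summarize(samplePosts, samplePosts.slice(0, 2));
console.log(summary.uniqueAuthors, summary.maxPostsByOneAuthor);

// The same ratios with the figures reported above:
console.log((2959 / 40000 * 100).toFixed(1) + "%"); // 7.4% bot-voted
console.log((40000 / 20362).toFixed(2));            // 1.96 posts per author
```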

Contact

Links

SteemJs GitHub
SpreadSheet for Data


Hi @firedream, great work! However, I think the results have to be taken with a grain of salt. Around 24h is a rather short period, given that most users (at least those with original content and a life outside of Steem :) ) probably post less frequently than once per day. I know this is rather cumbersome and time-consuming with direct blockchain access like SteemJS, but maybe SteemSQL could help to get a broader picture? Additionally, you've discovered quite a number of bots spamming the blockchain by posting every couple of minutes. Those typically aren't here for the community; they vote for themselves, optionally with a network of voting accounts. Is this the kind of post that should appear as "organic"? Distinguishing those from real/human/original/organic/... non-bid-bot-voted posts is difficult, and I honestly don't know if any cut could filter those out reliably. If you have any idea, let me know!

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

[utopian-moderator]

@crokkon, thank you for your comments.
I completely agree with you on using SteemSQL for data extraction.
I realized what torture it is for me and for my computer to extract data from the blockchain with SteemJs :)
The next analysis will be with SteemSQL (let's support arcange :) ).
As for organic, the issue is really getting more and more complicated.
Until now, I was thinking of organic as not taking hormones (bot votes) for growth.
But now it seems organic should be defined from the seed, the source of the post.

What I see is that these bot-seeded posts do not grow that much (yet).
We have to observe them and find a common characteristic.
I checked the posting authorities to look for similarities, because if the authors are not posting themselves, a bot must be posting for them.
I found no similarities, so they must have given the posting key directly to a bot.
I will dive deeper and keep you informed.
FD.

Hey @crokkon
Here's a tip for your valuable feedback! @Utopian-io loves and incentivises informative comments.

Contributing on Utopian
Learn how to contribute on our website.

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

20 000 unique users each day, and probably 25 000 unique users over the course of a week. A lot of bots are not posting and some bots post content every 15 minutes, something like that. Steem needs to get things moving ... launched two years ago ... YouTube is getting worse each day ... there need to be some major new use cases to bring daily use up to at least a hundred thousand. (Whoever does that can fairly claim to have more than quadrupled the user retention of one of the major social media platforms.) Only when there are sufficiently many users can spam and hugger-mugger become an insignificant part of the platform; the same systems behave differently at different scales.

Imagine two games where, when two individuals meet, they do something. In the first game they give each other cake. In the second game they punch each other. Otherwise they're all randomly walking.

As you get more users, the difference in outcome between the two games becomes greater. One has more persons eating more cake: more happiness. The other has more persons getting hurt: more unhappiness. But keep adding persons to each game and they become more and more the same, despite the major difference in game dynamics.

Everybody is piling up, and the probability of having to feed or punch somebody who first has to feed or punch somebody else (and get fed or punched) goes up faster than the number of players grows. Paradoxically, in very large games this results in almost nobody being fed and almost nobody being punched; just millions standing around waiting. The same.

The change in outcome contributed by change of scale, or buffering of outcomes, or ..., may be greater than the change in outcome contributed by change of all other dynamical variables or even transformation and change of the system dynamics.

I think Steem would rapidly improve if there were 10 million accounts and at least 100 thousand unique users every day.

Will make a post on this subject now that @firedream has gathered data and provided tools. Maybe do some modeling. Let's see if we can predict what current scale contributes to different parts of the typical user experience.

@Tibra;
There is also another point.
Suppose that all the 20 000 users are non-spammers and original content creators.
Nobody is using bid-bots.
The Steem created per day for the reward pool is approximately 50 000 Steem/day.
This means that 40 000 posts will receive 1.25 Steem (3.75 SBD with a 3.00 feed price) on average.

We also know that bid-bots use 12 000 Steem per day to give to approximately 3000 posts.
This leaves us with 38 000 Steem in the reward pool for 37 000 posts, further decreasing the average payout to about 1 Steem per post.
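The arithmetic behind these averages, as a short sketch. The pool size, bid-bot share and feed price are the approximate figures quoted in this comment, and averagePayout is just an illustrative helper.

```javascript
// Average payout per post if the pool were shared equally.
function averagePayout(poolSteem, postCount) {
    return poolSteem / postCount;
}

var allPosts = averagePayout(50000, 40000);            // 1.25 Steem per post
var inSbd = allPosts * 3.00;                           // 3.75 SBD at a 3.00 feed price
var organicOnly = averagePayout(50000 - 12000, 40000 - 3000); // 38 000 / 37 000

console.log(allPosts, inSbd, organicOnly.toFixed(2));  // last value is roughly 1.03
```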

Can you see the noise? These 37 000 posts each day must be heard by whales or dolphins to have a meaningful reward.

How power can be given to the community, so that every good post has an equal chance of being rewarded, is an issue that needs further thought.

FD.

Hey @tibra
Here's a tip for your valuable feedback! @Utopian-io loves and incentivises informative comments.


Ty

Will check out the Discord.

256 posts a day would equal about 10 per hour, or roughly one post every 6 minutes.
Assuming that a person sleeps 8 hours, he'd post once every 4 minutes.

Are comments included?
Or are we talking about just blog posts?
The code seems so ambiguous (to me) that a comment would "technically" still be its own post.

@serylt;
The comments are not included, this is only the blog posts.
With comments, the quantity is much higher.

This also was unbelievable to me and I verified the result from steemdb.

Result is correct...approximately 260 posts per day.

FD.

I was intrigued by how one account could post so much. It is a new agency, that is using steempress to post on Steemit.

@emergehealthier, thank you for the clarification. I was sure it was some kind of bot, and indeed it is...
FD.

I totally confess to skipping your working methodology and heading straight for the results. :) Wow. 40,000 1 every 12 minutes. Gonna head back and check out the rush hours and slow hours a little more. :) Nice job @firedream

Hey @firedream
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!
