Steemit Content Analysis
A lot of interesting things have been happening on Steemit lately. Developers are increasingly reporting new developments, authors are announcing new initiatives, the Steemit team is changing the rules in search of a better organization of interaction, etc. It seems that the platform is developing.
I wanted to make sure of this, so I started looking for tools that I could use to get some statistics, such as the number of active authors. Unfortunately, I didn't find anything. I then asked Gemini how I could learn to get such data directly from the blockchain. Its answer let me know that it is not impossible, but at the moment I will not be able to find enough time to learn how to do it. 😉
But, as always, I did not give up so easily. I continued my search and it brought me to Google Scholar. You have no idea how interesting Steemit is as a research platform for scientists. I discovered dozens of research papers about Steemit. And they are not necessarily outdated.
I became interested in reading what scientists learned about our platform in 2024, so I filtered the articles and got at least 30 dated to the current year. I didn't have access to the full versions of most of them, but some were readable. Scientists investigated various issues. I did not find the statistics I was interested in, but I came across one interesting article: Data collection, preprocessing and descriptive analytics of social media data. I'm sure some of the information from this article will be interesting to you.
This work is devoted to the analysis of data from the Steem blockchain. In particular, the authors compared the number of comments with the amount of SP (Steem Power) a user holds. According to their results, the more SP a user has, the more they comment.
I can agree with this, because the largest SP owners, who are also the least active, mostly delegate their SP to voting services. Therefore, the users with the largest amount of active SP are, for the most part, genuinely active commenters.
Another article presented the results of research on health content: The Effect of Monetary Incentives on Health Care Social Media Content: Study Based on Topic Modeling and Sentiment Analysis. I would like to point out some interesting facts from this article.
Results regarding the emotional coloring of posts in Steemit and Reddit:
Emotion aspects | Steemit posts (N=1000), n | Reddit posts (N=1000), n |
---|---|---|
Joy | 435 | 200 |
Sadness | 276 | 422 |
Fear | 105 | 125 |
Anger | 3 | 22 |
As the results show, joyful posts are more than twice as common on Steemit as on Reddit (435 vs. 200), while sadness, fear and anger are all more common on Reddit. In fact, this is noticeable even without an analytical approach.
And now the main thing I wanted to talk about. The researchers also analyzed the quality of content on Steemit and Reddit. For this, they conducted an online survey.
We designed the study so that participants first read the post via a link that brought them to see the post on a third-party website without Steemit or Reddit logos, preventing possible biases in answering questions, and then answered 5 questions (mix of multiple choice and text entry types).
Here's how respondents ranked posts by quality level:
Content quality question | Steemit posts (N=230), n | Reddit posts (N=230), n |
---|---|---|
Poor | 17 | 67 |
Average | 81 | 106 |
Good | 132 | 57 |
To what extent do posts encourage commenting:
Likelihood to comment on or like posts question | Steemit posts (N=230), n | Reddit posts (N=230), n |
---|---|---|
Not likely at all | 55 | 95 |
Neutral | 56 | 65 |
Extremely likely | 119 | 70 |
Subscription probability:
Subscription probability | Steemit (N=230), n (%) | Reddit (N=230), n (%) |
---|---|---|
No | 119 (51.7) | 183 (79.6) |
Yes | 111 (48.3) | 47 (20.4) |
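To put the quality and commenting tables on the same scale as the subscription table, the counts can be converted into percentages. This is only a minimal sketch: the counts are copied from the tables above, while the function and variable names are my own and purely illustrative.

```python
# Counts copied from the survey tables above (N=230 per platform).
quality = {
    "Steemit": {"Poor": 17, "Average": 81, "Good": 132},
    "Reddit": {"Poor": 67, "Average": 106, "Good": 57},
}
commenting = {
    "Steemit": {"Not likely at all": 55, "Neutral": 56, "Extremely likely": 119},
    "Reddit": {"Not likely at all": 95, "Neutral": 65, "Extremely likely": 70},
}


def to_percent(counts):
    """Convert a {answer: count} mapping into {answer: percent of total}."""
    total = sum(counts.values())
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


for title, table in (("Quality", quality), ("Commenting", commenting)):
    for platform, counts in table.items():
        print(title, platform, to_percent(counts))
```

With these numbers, 57.4% of respondents rated the Steemit posts as good, compared with 24.8% for the Reddit posts.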
Some conclusions
Talking to some of the users here on Steemit, I have heard more than once that the ability to receive rewards for content has a negative effect on the quality of the authors' content. And indeed, if Steemit is a person's only source of income, then he is forced to write posts every day, or even several posts a day. The quantity definitely reduces the quality of the posts.
However, as the mentioned studies have shown, if we apply a more analytical approach to this issue, we see that in general, the possibility of receiving rewards induces authors to produce higher quality content. Indeed, when an author posts something on Reddit or Facebook, their goal is to provoke interaction, not to convey a certain point. That is why "normal" social networks are dominated by memes and all kinds of mockery. In contrast, on Steemit, people avoid content that may cause negative emotions for fear of losing their earnings.
Questions to developers
I was just wondering if there is a way to get information like that in this post. It would be interesting to see similar information for the last 2-3 years.
Sources (https://creativecommons.org/licenses/by/4.0/):
That's some very interesting information you came up with... and fascinating that 3rd parties are actively looking at and comparing Steemit.
And nice to see that the community has what might be described as a "positive slant." There's enough griping and anger elsewhere in the world!
I have a feeling the person who might best be able to answer your question about blockchain data and queries is @steemchiller, since steemworld.org pulls massive amounts of information to make the site highly informative and usable. Not suggesting he's going to do it for you, but possibly he could point you in the right direction?
A little off-topic, but related to your ongoing advertising and marketing efforts... what about just a public relations team? As in, someone(s) who is constantly sending news releases to every conceivable crypto news service, acting as liaison to exchanges, updating the "project news" sections on CoinGecko, Coinmarketcap and so forth.
Often you get more reach from people reading about something than from seeing actual advertising.
Does the Steemit team have such a person(s)?
I already received a reply from @moecki and he described to me what the difficulties are and what are the options for obtaining such information.
When it comes to the Steemit team, it's hard for me to comment because of the veil of secrecy that surrounds it. However, since I signed up to Steemit, I have never seen any press releases or just news about Steem. Therefore, I can assume that there is no public relations team.
Interesting coincidence, but I thought the same thing as you and even did a little research. If you follow the link, you will see the prices for publishing articles and press releases in some mass media. I don't even know if it will pay off.
Hi @o1eh
What an interesting article. I was struck by the way Steemit publications are appearing in specialized search engines and that they are being used as reference sources in various investigations.
I had a recent experience: a journalism student at a Venezuelan university contacted me on WhatsApp.
What caught my attention was that she found, through Google, a post of mine about beliefs according to Neurolinguistic Programming, which I wrote in 2018.
It was a big surprise. She was able to contact me because my cell phone number appears in a banner at the end of my posts.
We talked for a while and I helped her "as an expert" in relation to the subject of her degree thesis that she is starting.
I share some screenshots of the WhatsApp and a translation:
Another screenshot, without translation.
This is very interesting. Recently, I was also approached by a guy as an "expert" on Steemit 😄. He wanted to promote his business on Steemit, but he was only interested in the Ukrainian audience, which is not very large here.
I think Steemit would gain a lot if links to posts appeared in Google search results. But apparently Google doesn't take kindly to content posted on Steemit. You can mostly find older posts in the search.
Interesting! Especially the comparison with Reddit. I haven't looked at this in detail yet, but the results at least seem to be positive for Steem. I'm also pleased that research is currently looking into Steem. That sounds like a market presence, doesn't it?
That shouldn't be difficult. The data collection would perhaps take some time, because the relevant blocks would probably have to be run through sequentially.
A separate database would probably be necessary, especially if you wanted to update the results on an ongoing basis.
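A rough sketch of that idea: walk through the blocks one after another and store each post and comment in a small SQLite table. The block layout below (transactions holding [name, payload] operation pairs, with an empty parent_author marking a top-level post) is my assumption about what a node returns, and the sample blocks are made up for illustration. Fetching real blocks from a node is left out.

```python
import sqlite3

# Made-up sample blocks; the layout is an assumption about the node's JSON.
SAMPLE_BLOCKS = [
    {"timestamp": "2024-05-01T10:00:00", "transactions": [
        {"operations": [["comment", {"author": "alice", "parent_author": "", "permlink": "p1"}]]},
        {"operations": [["vote", {"voter": "bob", "author": "alice", "permlink": "p1"}]]},
    ]},
    {"timestamp": "2024-05-01T10:00:03", "transactions": [
        {"operations": [["comment", {"author": "bob", "parent_author": "alice", "permlink": "re-p1"}]]},
    ]},
]


def ingest(blocks, conn):
    """Scan blocks in order and store every 'comment' operation."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments "
        "(created TEXT, author TEXT, permlink TEXT, is_post INTEGER)"
    )
    for block in blocks:
        for tx in block["transactions"]:
            for name, payload in tx["operations"]:
                if name != "comment":
                    continue  # votes, transfers etc. are skipped in this sketch
                conn.execute(
                    "INSERT INTO comments VALUES (?, ?, ?, ?)",
                    (block["timestamp"], payload["author"], payload["permlink"],
                     int(payload["parent_author"] == "")),
                )
    conn.commit()


conn = sqlite3.connect(":memory:")
ingest(SAMPLE_BLOCKS, conn)

# Posts and total entries (posts + comments) per day.
rows = conn.execute(
    "SELECT date(created), SUM(is_post), COUNT(*) FROM comments GROUP BY date(created)"
).fetchall()
print(rows)
```

Once the table exists, ongoing updates would only need to ingest the blocks produced since the last run. As far as I know, edits to a post arrive as another comment operation, so a real version would probably have to deduplicate by author and permlink.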
I think the advantage of Steem is that it is the first blockchain of a social type. Therefore, here you can consider the data for a longer period of time.
Certainly. I was pleasantly surprised by this fact.
I was hoping that such a database existed, since one had existed before, probably before Hive.
Not necessarily only before Hive. As I already mentioned to remlaps, there is Hivemind. Please take a look at my suggestion there.
Before the schism, we had steemSQL, which would have made this trivial. Someone was taking most/all of the blockchain data and dumping it into SQL tables that were exposed to the Internet. Post-2020, I have often wished for a service like that to be recreated. Even before 2020, I was priced out of it by subscription fees, though. Presumably, the data and bandwidth costs are not negligible.
Good keyword. We actually have an SQL database with a lot of information from the blockchain: Hivemind.
SteemSQL or now HiveSQL is probably not open source. In any case, I haven't found a repo for it. Do you remember how intensively the service was used?
It was widely known, but I don't know how much usage it got. My impression is that it got a decent amount, but I have no justification for that. I think you're right that it is not open source, although I suspect maybe it evolved from this Update and/or this, "Ingest, prepare, store, and index blocks in one of two storage backends (S3, SQL Database, and/or Elasticsearch)."
Personally, I probably used it a few times per week when it was free, but I stopped when he started charging for a subscription. The subscription wasn't really all that high, maybe 10 or 20 STEEM per month, but I couldn't justify it to myself.
Yep. Near the end, there was also an internet-reachable hivemind node (with access limited to a small number of people), but I never figured out how to make use of it before the split happened and it went away.
Thanks for the links.
SteemDB also creates a database, but I don't think you can use it for this either.
Steemchiller also continuously builds a database from the blocks, which can be queried with SDS. However, I am not aware of any query that could simply be used for this.
I was wondering if I could make a query on my Hivemind test node.
@o1eh: Which data would help you? It would certainly have to be grouped by day or month so that the data can be displayed in a meaningful way.
post_count from all authors grouped by day
post_count from unique authors grouped by day
or unique_author_count which wrote a post grouped by day
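To make these candidates concrete, here is a small sketch of how they could be computed once the posts are available as (day, author) pairs. The sample records and the function name are invented for illustration and are not real chain data.

```python
from collections import defaultdict

# Invented sample: (day, author) for each top-level post.
posts = [
    ("2024-05-01", "alice"),
    ("2024-05-01", "alice"),
    ("2024-05-01", "bob"),
    ("2024-05-02", "carol"),
]


def daily_stats(records):
    """Return {day: (post_count, unique_author_count)}."""
    counts = defaultdict(int)
    authors = defaultdict(set)
    for day, author in records:
        counts[day] += 1          # every post counts
        authors[day].add(author)  # each author counts once per day
    return {day: (counts[day], len(authors[day])) for day in sorted(counts)}


stats = daily_stats(posts)
for day, (post_count, unique_authors) in stats.items():
    print(day, post_count, unique_authors)
```

In the sample, the first day has three posts but only two unique authors, which is exactly the difference between the first and the last variant.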
I would be interested to know how activity in Steem changes over time. How to measure it? I think you are absolutely right about this:
I don't know what the possibilities of getting the data might be, how long it will take. Maybe it's worth grouping the data by months instead of days to reduce the number of queries?
It would be nice to know the number of new accounts registered per day or per month.
It would be interesting to receive the following information:
But, of course, it all depends on the possibilities. Even data on one of the mentioned queries would be extremely interesting.
Okay, I got the accounts per day for the last 6 years. This was easy. I will send you a CSV file once I have all the data.
But how are the others defined?
An active voter is a user who voted at least once on a given day/month?
An active commenter is a user who created at least one comment on a given day/month?
An active author is a user who created at least one post (and comment?) on a given day/month?
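Written down as code, those definitions could look like the sketch below: a user counts as active in a period if there is at least one matching event from them in that period. The event tuples, the kind labels and the helper name are hypothetical, and the sample data is invented.

```python
from collections import defaultdict

# Invented events: (ISO timestamp, account, kind), kind in {"vote", "comment", "post"}.
events = [
    ("2024-05-01T08:00:00", "alice", "post"),
    ("2024-05-01T09:30:00", "bob", "vote"),
    ("2024-05-01T09:45:00", "bob", "vote"),
    ("2024-05-02T12:00:00", "bob", "comment"),
    ("2024-05-20T10:00:00", "carol", "vote"),
    ("2024-06-03T07:15:00", "alice", "vote"),
]


def active_users(events, kind, by="day"):
    """Count accounts with at least one event of `kind` per day or month."""
    width = 10 if by == "day" else 7  # "YYYY-MM-DD" or "YYYY-MM"
    seen = defaultdict(set)
    for timestamp, account, event_kind in events:
        if event_kind == kind:
            seen[timestamp[:width]].add(account)
    return {period: len(accounts) for period, accounts in sorted(seen.items())}


voters_by_day = active_users(events, "vote", by="day")
voters_by_month = active_users(events, "vote", by="month")
print(voters_by_day)
print(voters_by_month)
```

Whether an "active author" should also include comments stays an open choice here: passing kind="post" counts posts only.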
I think the number of posts created on a day/month is also interesting (I got it too :-) ).
You are a wizard! I am very glad, I think it will be interesting information. 🥳
Fantasy! You are a wizard of at least 14th level!
I think you indicated everything correctly. If a person voted at least once today, he can be considered an active voter. The same with comments and posts.
Sorry for the double-entry. Editing error. Disregard this one.
This is fascinating. It's a pleasant surprise to still see articles or research papers written on Steem. I thought it stopped happening after the split.
I will dig into the referenced article in more detail when I have some free time.
The comparison is awesome though. I'm grinning at the results as if those are my personal scores in a test. 😅
Google Scholar really has many articles dedicated to Steem. Some are somewhat difficult to understand, others are not accessible, but there is a lot of information, and therefore it takes a lot of time 🙂.
I will check it out.
Thank you for sharing this.
This is fascinating. Especially the sections on emotional coloring and post quality. I guess the emotional coloring section wasn't too surprising, but the quality section did surprise me. I don't use reddit, though, so my comparison would have been something like Medium or Substack.
I tried to use Reddit for a while, and that's where I first found out about Steemit. Unfortunately, Reddit banned my account for no apparent reason. I created another one, but history repeated itself.
I can't adequately rate the quality of the content on Reddit, as I have the impression that it is an information cesspool. But if I had better understood this social network, I probably would have been able to find some useful and informative posts.
UPD. You will be interested in: https://dl.acm.org/doi/abs/10.1145/3689431
Thanks! I started reading, but got annoyed by the bias in the background section. I do want to see the results, though, so I'll try again later when I'm not on my cell phone.
My impression of reddit is similar to yours. I'm always surprised that people are still using the platform.
Very interesting post. You have now confirmed what I felt intuitively: audience loyalty and much higher quality content than in other social networks.
I hope you will find the tools and be able to please your readers with high-quality analyses of the information you receive.
Judging by the programmers' answers, it will not be easy. But with time, maybe something will work out.
Our search engine. 👎👎
Regarding the question/request you make to the Steemit developers, I would add a major improvement to the platform's internal search system. Locating a specific article on Steemit is an almost impossible challenge.
Oh yes, this question has already been raised. Thank you for reminding me.
Very interesting data. And I agree with the quoted line here from your post. Thing is, yes, the possibility of receiving the reward DOES induce the author to produce better content, but if the category that the overall group is getting awarded from is low-level content to begin with, they'll just produce more of it. The idea should be to expand the content as you progress as a platform.