RE: Steemit Content Analysis

in Popular STEM, 4 months ago

A separate database would probably be necessary, especially if you wanted to update the results on an ongoing basis.

Before the schism, we had steemSQL, which would have made this trivial. Someone was taking most/all of the blockchain data and dumping it into SQL tables that were exposed to the Internet. Post-2020, I have often wished for a service like that to be recreated. Even before 2020, I was priced out of it by subscription fees, though. Presumably, the data and bandwidth costs are not negligible.

Good keyword. We actually have an SQL database with a lot of information from the blockchain: Hivemind.

Presumably, the data and bandwidth costs are not negligible.

SteemSQL, or HiveSQL as it is now, is probably not open source. In any case, I haven't found a repo for it. Do you remember how intensively the service was used?

 4 months ago 

Do you remember how intensively the service was used?

It was widely known, but I don't know how much usage it got. My impression is that it got a decent amount, but I have no justification for that. I think you're right that it is not open source, although I suspect maybe it evolved from this Update and/or this, "Ingest, prepare, store, and index blocks in one of two storage backends (S3, SQL Database, and/or Elasticsearch)."

Personally, I probably used it a few times per week when it was free, but I stopped when he started charging for a subscription. The subscription wasn't really all that high, maybe 10 or 20 STEEM per month, but I couldn't justify it to myself.

We actually have an SQL database with a lot of information from the blockchain: Hivemind.

Yep. Near the end, there was also an internet-reachable Hivemind node (with access limited to a small number of people), but I never figured out how to make use of it before the split happened and it went away.

Thanks for the links.
SteemDB also creates a database, but I don't think you can use it for this either.
Steemchiller also continuously builds a database from the blocks, which can be queried with SDS. However, I am not aware of any query that could simply be used for this.

I was wondering if I could make a query on my Hivemind test node.
@o1eh: Which data would help you? It would certainly have to be grouped by day or month so that the data can be displayed in a meaningful way.

post_count from all authors grouped by day
post_count from unique authors grouped by day
or unique_author_count which wrote a post grouped by day
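A minimal sketch of how these two counts could be queried, assuming a hypothetical table `posts(author, created, depth)` in which `depth = 0` marks a top-level post. This is not the real Hivemind schema; the table, its columns, and the sample rows are invented for illustration, and an in-memory SQLite database is used only so that the example runs on its own.

```python
import sqlite3

# Hypothetical schema, NOT the real Hivemind layout: one row per post or comment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, created TEXT, depth INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        ("alice", "2023-01-01 08:00:00", 0),
        ("alice", "2023-01-01 19:30:00", 0),
        ("bob", "2023-01-01 12:00:00", 0),
        ("bob", "2023-01-01 12:05:00", 1),  # a comment, excluded by depth = 0
        ("carol", "2023-01-02 09:15:00", 0),
    ],
)

# One pass over the table yields both the total and the unique-author count.
DAILY_QUERY = """
    SELECT date(created) AS day,
           COUNT(*) AS post_count,
           COUNT(DISTINCT author) AS unique_author_count
    FROM posts
    WHERE depth = 0
    GROUP BY day
    ORDER BY day
"""

daily_rows = conn.execute(DAILY_QUERY).fetchall()
for day, post_count, unique_author_count in daily_rows:
    print(day, post_count, unique_author_count)
```

Grouping by month instead would only require a different grouping expression, for example `strftime('%Y-%m', created)` in SQLite.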

 4 months ago 

I would be interested to know how activity on Steem changes over time. How can it be measured? I think you are absolutely right about this:

post_count from all authors grouped by day
post_count from unique authors grouped by day
or unique_author_count which wrote a post grouped by day

I don't know how feasible it is to get the data or how long it would take. Maybe it's worth grouping the data by months instead of days to reduce the number of queries?
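On months versus days: if daily counts are already available, monthly totals can be derived afterwards without extra queries. Here is a minimal sketch, assuming a hypothetical two-column daily export (`day`, `count`); the column names and numbers are invented. It only applies to additive counts such as posts or new accounts. Unique authors per month cannot be obtained by summing daily unique-author counts, because an author would be counted once for every day on which they posted.

```python
import csv
import io
from collections import defaultdict

# Hypothetical daily export; column names and values are assumptions.
DAILY_CSV = """day,count
2023-01-30,4
2023-01-31,6
2023-02-01,5
"""


def monthly_totals(csv_text):
    """Sum additive daily counts into totals keyed by 'YYYY-MM'."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["day"][:7]  # '2023-01-30' -> '2023-01'
        totals[month] += int(row["count"])
    return dict(totals)


print(monthly_totals(DAILY_CSV))
```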

It would be nice to know the number of new accounts registered per day or per month.

It would be interesting to receive the following information:

image.png

But, of course, it all depends on the possibilities. Even data on one of the mentioned queries would be extremely interesting.

Okay, I got the accounts per day for the last 6 years. This was easy. I will send you a CSV file once I have all the data.

But how are the others defined?
Is an active voter a user who voted at least once on a given day/month?
Is an active commenter a user who wrote at least one comment on a given day/month?
Is an active author a user who wrote at least one post (and comment?) on a given day/month?

I think the number of posts created on a day/month is also interesting (I got it too :-) ).

 4 months ago 

Okay, I got the accounts per day for the last 6 years.

You are a wizard! I am very glad, I think it will be interesting information. 🥳

I got it too :-)

Fantastic! You are a wizard of at least 14th level!

But how are the others defined?

I think you indicated everything correctly. If a person voted at least once today, he can be considered an active voter. The same with comments and posts.
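The "at least once per day/month" definition can be sketched as follows. The input shape, a list of `(user, 'YYYY-MM-DD')` pairs, and the function name are assumptions made for this example; the same function would work for votes, comments, or posts.

```python
from collections import defaultdict


def active_users_per_period(events, period="day"):
    """Count distinct users with at least one event per day or per month.

    `events` is an iterable of (user, 'YYYY-MM-DD') pairs (assumed shape).
    """
    if period not in ("day", "month"):
        raise ValueError("period must be 'day' or 'month'")
    width = 10 if period == "day" else 7  # 'YYYY-MM-DD' vs 'YYYY-MM'
    users = defaultdict(set)
    for user, day in events:
        users[day[:width]].add(user)  # a set ignores repeated activity
    return {key: len(names) for key, names in sorted(users.items())}


# Invented sample data: alice votes twice on one day and again the next day.
votes = [
    ("alice", "2023-01-01"),
    ("alice", "2023-01-01"),
    ("bob", "2023-01-01"),
    ("alice", "2023-01-02"),
]
print(active_users_per_period(votes, "day"))
print(active_users_per_period(votes, "month"))
```

With this sample, alice counts once for January even though she voted on two different days.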

I will send you the CSV files for each request after posting this comment.

So that you can categorise the data, I would like to explain the conditions that were used in the SQL query:

Accounts per day: Number of accounts created per day (file: daily_account_creation.csv)
Posts per day: Number of posts (excluding comments) created per day, including deleted/muted posts (file: daily_post_creation.csv)
Commenters per day: Number of unique authors per day who wrote at least one comment, including deleted/muted comments (file: daily_commentator_count.csv)
Authors per day: Number of unique authors per day who wrote at least one post, including deleted/muted posts (file: daily_author_count.csv)

Unfortunately, there is no way (without considerable effort) to determine the number of voters per day. The votes are not stored in a separate table, but only together with other data.

I hope you can prepare the data accordingly and give us an analysis :-)
