MySBDS - Steem Blockchain Data Service in MySQL

in #utopian-io7 years ago (edited)

MySBDS - Steem Blockchain Data Service in MySQL

What is the project about?

The Steem Blockchain Data Service is an open source project managed by Steemit that allows you to read the Steem blockchain and store it in a local MySQL database. The project opens with an obvious Notice:

This is prerelease software, not yet suitable for production use. Proceed at your own risk.

I'm working under this warning to get SBDS operational in the most stable way possible today, while watching and supporting the project as it moves forward. At this point, I've spent 100 hours (that I tracked, so more really) digging into SBDS and working to understand it. I'm not gonna give myself the Complete Understanding Badge yet, but I'm on my way.

MySBDS.com

Today, MySBDS.com and blervin/mysbds exist to provide three key things:

  • Proven deployment scripts to get up and running with SBDS quickly
  • Documentation of the challenges identified thus far (both on Steemit.com) and on MySBDS.com
  • A public endpoint offering an always up-to-date full mysqldump of the entire Steem blockchain

I've dumped a pretty significant amount of time and energy into SBDS and I'm barely scratching the surface so I want to share everything I've found (today and ongoing) to get a dialog going with everyone else who's looking to get a local MySQL database running. And also to provide this public documentation of implementation results for the team working on updating the code for steemit/sbds.

My near-term goal is to get a public instance up and running for those who aren't quite ready to deploy and manage an entire environment themselves, but still wanna dig into the blockchain data.

Technology Stack

Everything outlined here is deployed in docker on CentOS 7.4 using common bash commands.

I will specifically outline a DigitalOcean deployment along with some useful functionality available there.

From what I see around here, it looks like Ubuntu is pretty popular so I'm planning to put together another version for that as well.

Roadmap

In order of complexity (and likely implementation):

  • Prepare MySBDS.com to highlight current scripts, documentation, and the most relevant Steemit posts about SBDS
  • Develop an Ubuntu deployment script
  • Document and outline challenges with deployments
  • Document my monitoring infrastructure
  • Deploy a public MySQL server
  • Explore interest in fully managed deployments or even replication
  • Continue performance monitoring to identify refinements and share those insights

How to contribute?

Comment here and any other posts about SBDS, catch me on steemit.chat, or just let me know how you can help make the blockchain easily accessible in MySQL!

Deploy MySBDS

Overview

I was excited to play with the SteemSQL service that was originally public, so when it recently became a paid service a little beyond my budget I began to look for alternatives and I realized I could easily maintain my own copy of the blockchain.

At the time of this writing, the size of the MySQL database of the entire blockchain is 403GB, and growing rapidly. Because of this growth, I wanted to look at ways to deploy this on flexible infrastructure and I found a solution with Block Storage at DigitalOcean. This allows us to deploy storage at any size for $0.10/GB per month and increase that size at any time later, however, storage can never be decreased.

We'll walk through the steps to create a script that will deploy a cloud server at DigitalOcean running the Steem Blockchain Data Service and MySQL in docker.

Requirements

You will need an account at DigitalOcean to deploy a cloud server and related storage. Also, it is presumed that you are comfortable in the shell and understand how to connect to the server via SSH and perform basic systems admin work on a Linux platform.

Initial Setup

We need to ensure we deploy our droplet in a region that supports block storage. The supported regions for volumes are:

  • NYC1
  • NYC3
  • SFO2
  • FRA1
  • SGP1
  • TOR1
  • BLR1
  • LON1
  • AMS3

You'll need to consider the size of the droplet you want to deploy, or the real question, whether you want to download the full mysqldump of the blockchain. I have written the low_mem.sh and high_mem.sh for both scenarios.

If you want to just get an instance up and slowly download everything, you can deploy a droplet with 4GB of memory and use the low_mem.sh script.

To download and restore the entire blockchain takes a fair amount of resources so the high_mem.sh script is written for 16GB of memory.

I'd give an estimate for times, but in my testing it varies a great deal depending on a lot of factors, including the node we're connecting to grab blocks. In general, it is always many hours but with the low_mem.sh deployment it may easily become a few days before it completes.

Prepare Script

Start by opening your favorite text editor such as Sublime Text or just Notepad and grab either the high_mem.sh or low_mem.sh script.

The first modification is to change the mysql_password in your copy of the script. Make that change and save your copy of the script. We'll modify this a bit more shortly.

mysql_password="mystrongcomplexpassword"

Create Droplet and Volume

We start by clicking the Create button and selecting Droplets

create.png

Then, we select CentOS and 7.4 x64 from the drop-down:

centos.png

As we look to Choose a size you'll notice there are Standard Droplets and Optimized Droplets available. We can later expand our instance, however that expansion is only within this droplet type. You cannot deploy a Standard and later convert it to an Optimized instance, or vice versa.

choose_a_size.png

Next, click Add Volume to add block storage.

add_storage.png

Given the current size of the Steem blockchain, I recommend using at least 500GB, especially if you plan to download the blockchain as you'll need the extra space.

block_storage.png

Next, we'll pick a region.

region.png

As you'll see, only the supported regions are selectable with all others grayed out.

Everything here around the volume name presumes a lot and those presumptions may be wrong. By default, DigitalOcean names a Volume as volume-{region}-{volume_index} where the index is essentially the count of Volumes you have in that region. Our approach here works in most scenarios, but if you have issues mounting your volume just look to the actual name displayed in DigitalOcean once it is created.

That said, we look to the following line in our script setting the volume_name variable:

volume_name="volume-tor1-01"

There are two parts to this name, the region (tor1 here) and the index of the volume (01 in this case) for that region.

If you do not have any volumes in the region you are using, this should deploy with the 01 index. If you have existing volumes in that region, it will increment to the next number. Update the volume_name accordingly.

Next, we check the User data box and paste in our modified script:

additional_options.png

Now, we can scroll down and click the Create button.

finalize.png

The page will refresh and bring us back to the list of droplets and you'll see this one building. Once it's complete, the IP address will appear and you can click the Copy link to grab that and paste into your SSH client such as Putty.

Known Issues

One thing I discovered quickly is that some of the sizes set for database fields are too small and we will see Data too long for column 'raw' errors.

Fortunately, the community stepped in and there is a pull request at github that fixes everything. My scripts here manually checkout that pull request rather than the master branch.

I know, I saw the signs that advised against hiking off-trail, but c'mon, this is an adventure so let's explore and see what we find! Dry humor aside, this branch has proven reliable in my testing so I'm comfortable with it but please advise of any issues you encounter.

The SBDS project still references steemd.steemit.com as the RPC node it connects to, but this was discontinued on January 6, 2018 so I've got a little sed in here to update that to api.steemit.com

Confirm Script Execution

The deploy will take a few minutes and you should check for the install email from DigitalOcean to get the password. Once you are logged in, check the script execution status by tailing the log we created.

tail -fn5 /var/log/mysbds_install.log

Somewhere between a few hours and a few days, you'll finally see:

END install

A simple way to check progress

mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
mysql -h $mysql_ip -p -e "SELECT max(block_num) AS lastblock FROM sbds_core_blocks" steem

And you'll see something like this, with a much smaller block number.

+-----------+
| lastblock |
+-----------+
|  20065928 |
+-----------+

Then, just sit back patiently until everything fully syncs up.

You can watch the head_block_number shown at https://api.steemjs.com/getDynamicGlobalProperties to see how far along you are.

Conclusion

I plan to continue working with SBDS and updating this project with new scripts, tricks, and tools that I discover along the way.

I'm not seeing a whole lot of posts about SBDS so I'm hoping to open the discussion so we can all work together to get this running well. If you're already using SBDS or I just opened this door for you please comment and share your experiences! I'm gonna document everything I personally discover as well as any insights shared by the community.

My goal is to make it as simple as possible to push the blockchain to a MySQL database and I feel like I'm off to a good start.



Posted on Utopian.io - Rewarding Open Source Contributors

Sort:  

Thanks for putting this information together. I've been wanting to get the steem blockchain in mysql for quite a while, and I've got a droplet on digitalocean running your script as I type. I'll be sure to let you know how it goes, but I might not be able to experiment with things again until tomorrow night. I just wanted to quickly say thanks.

Awesome, please follow up and let me know how it goes!

I'm currently on the step "build/run sdbs" and I see the mount is using 437G of space so things seem to be working. I'm having trouble finding out how to access the data in mysql, and I've tried setting up phpmyadmin, but I can only get phpmyadmin working on localhost and I can't seem to get it to working using the network ip. What program are you using the access the mysql data?

Here's a reliable tool you can use to connect: https://www.heidisql.com/download.php

I'm trying to find a way to connect to it using the same droplet on digitalocean. Maybe the problem is that I installed mysql on the droplet to be able to run phpmyadmin. For example, when I type the following command "mysql -u root -p" I can login with nothing as a password and then at the mysql prompt I can type "show databases;" and the only databases shown are information_schema, mysql, and performance_schema, and there is no reference to the steem mysql database. When I vi to /etc/my.cnf I see that the current data directory is "/var/lib/mysql" but I think the data your script generated is found in "/mnt/volume-tor1-01/volumes". I'll try to play around with it more later tonight. My goal is to be able to connect to the steem mysql data with php because that is the language I am most familiar with.

Yeah, installing mysql locally is fine, but might make it a little confusing since you now have two separate mysql instances running, but this should clarify.

To connect from the droplet itself, we need to look to the docker networking to find the IP the mysql container is running at by running this command:

docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql

For example, I get 172.17.0.2 as the IP. Then, you can connect using the IP returned from that command:

mysql -h 172.17.0.2 -u root -p

If you are connecting from any remote system, you can just use the public IP of this droplet with the normal port 3306.

Thanks for all your help. Since the commands you provided didn't work with my dual-mysql-trying-to-get-phpmyadmin-to-work setup, I destroyed the droplet I had been working on started over (I had to start over more than once because I forgot to increment the volume_name but thanks for your note about incrementing the volume name), and now it seems to be working for me like it is for you. I'm pretty excited because it looks like the hurdles have been removed so I can play around with the steem blockchain in mysql. Note: I tried using steemsql.com a while back but I wasn't able to get that to work with php in linux. Also, with the help of @eonwarped simplifying some Github instructions and pasting some commands for me I was able to get a condenser working which was neat to see, but my main motivation with setting up the condenser was because thought I would enable me to have the steem blockchain in mysql, but I couldn't get mysql working with the condenser, and when I saw the size of the condenser installation I doubted if it had all the blockchain data anyway. In short, I really appreciate there work you did here.

Sorry you had to start over but I'm glad you're up and running again. The learning curve here is a little steep but I've ran into most of the problems you'll see so don't hesitate to follow up if you have any other issues.

You got a 3.45% upvote from @ipromote courtesy of @blervin!
If you believe this post is spam or abuse, please report it to our Discord #abuse channel.

If you want to support our Curation Digest or our Spam & Abuse prevention efforts, please vote @themarkymark as witness.

You got a 15.38% upvote from @mercurybot courtesy of @blervin!

You got a 4.62% upvote from @upmewhale courtesy of @blervin!

You got a 6.06% upvote from @upmyvote courtesy of @blervin!
If you believe this post is spam or abuse, please report it to our Discord #abuse channel.

If you want to support our Curation Digest or our Spam & Abuse prevention efforts, please vote @themarkymark as witness.

You got a 2.01% upvote from @buildawhale courtesy of @blervin!
If you believe this post is spam or abuse, please report it to our Discord #abuse channel.

If you want to support our Curation Digest or our Spam & Abuse prevention efforts, please vote @themarkymark as witness.

This post has received a 5.08 % upvote from @boomerang thanks to: @blervin

Your contribution cannot be approved because it does not follow Utopian Rules

Hello, thanks for your post, however, we believe the SBDS is open source and your contribution is two scripts (configuration).

Trivial code snippets, example code or simple templates will not be accepted.

You can contact us on Discord.
[utopian-moderator]

Hi, is the project being continued? People hosting sbds have problem with maintaining it, it brakes all the time.

The project is continuing, but I ran into exactly the problems you're describing.

There has been a lot of work on SBDS over the last few months and there is an open issue to deploy an update, which I hope happens soon.

Basically, I'm dead in the water like everyone else until this gets fixed, but once it does I'll be back online!