Parsing the Steem Blockchain

in #introduceyourself, 7 years ago (edited)

Hello there!

They call me CryptoBills and this is my first post on SteemIt. I'm a software developer based in the Bay Area and have just recently gotten interested in the blockchain and in building apps on it. That's when I came across Steem and saw that it was a perfect entry point into learning more about building crypto apps. Not only that, it looks like there is an awesome community here and I am looking forward to being a part of it.

Before we get too deep

It's gonna get technical past this point, so read on if you dare. The tech involved is Docker, Python, Ubuntu, and MongoDB.

What we are building

Part of what I am thinking of doing requires me to have ready access to the data in Steem for posts and comments, so I figured why not build a Steem block parser as an experiment and a first step into learning more about Steem development. I got a parser set up and parsing without too much hassle, and now I'd like to share my findings with you guys.

Project goals

Basically, we want to accomplish a few simple objectives:

  1. Connect to the blockchain
  2. Grab the data from the chain
  3. Take said data and dump it into a database
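The three goals above can be sketched as a small batch loop. This is only an illustration, not the final parser: `batch_ranges`, `run_pipeline`, and `fake_fetch` are hypothetical names, and the fetch and store steps are passed in as plain functions so the sketch runs without a Steem node or a database.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Block = Dict[str, object]


def batch_ranges(start: int, end: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (lo, hi) block-number ranges covering [start, end)."""
    for lo in range(start, end, step):
        yield lo, min(lo + step, end)


def run_pipeline(
    fetch_range: Callable[[int, int], List[Block]],
    store: Callable[[List[Block]], None],
    start: int,
    end: int,
    step: int = 100,
) -> int:
    """Fetch blocks batch by batch and hand each batch to the store.

    Returns the number of blocks passed to the store.
    """
    stored = 0
    for lo, hi in batch_ranges(start, end, step):
        blocks = fetch_range(lo, hi)  # goals 1 and 2: connect and grab the data
        if blocks:
            store(blocks)  # goal 3: dump the data into a database
            stored += len(blocks)
    return stored


# Stand-ins (assumptions) so the sketch runs on its own.
def fake_fetch(lo: int, hi: int) -> List[Block]:
    return [{"block_num": n} for n in range(lo, hi)]


saved: List[Block] = []
count = run_pipeline(fake_fetch, saved.extend, start=1, end=251, step=100)
print(count, len(saved))
```

In the real thing, the fetch function would talk to the Steem node and the store function would write to MongoDB.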

I'll spare you the details, but I went through a bunch of iterations before I found a system that I thought worked well enough.

I chose to use Docker as the server for this experiment. It gives us full control over our environment, which matters because I found that the steem-python lib has some specific requirements, mainly that it has to be used with Python 3.6. If you aren't familiar with Docker, there is another post on SteemIt that explains it pretty well.
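Since the Python version matters here, a quick check at the top of a script can save some confusion. This is a minimal sketch: `REQUIRED` and `is_required_python` are hypothetical names, and the 3.6 value simply mirrors the requirement mentioned above, so adjust it if your setup differs.

```python
import sys

# Version named in this post; treat it as an assumption for your own setup.
REQUIRED = (3, 6)


def is_required_python(version_info=sys.version_info, required=REQUIRED) -> bool:
    """Return True when the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


if not is_required_python():
    print(
        "Warning: running Python %d.%d, but this walkthrough assumes %d.%d"
        % (sys.version_info[0], sys.version_info[1], REQUIRED[0], REQUIRED[1])
    )
```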

In a nutshell, you want to go to the Docker site, download and install Docker for your OS, run the Docker app, and then install Kitematic.

Screen Shot 2017-12-13 at 8.24.15 PM.png

Once you install Docker, you want to get the latest Ubuntu. The easiest way I found to do this is to run the following in the terminal:

docker run -it ubuntu:rolling

This will create the Docker instance and automatically connect you to it.

Note: Should you disconnect and need to reconnect, you can do it in Kitematic by finding your Docker instance in the list. Take note of its name, because if you accidentally create another instance, it will have a new name.

[Screenshot: the list of Docker instances in Kitematic]

As you can see I have two instances, but the one I want is at the bottom.

OK, now in your Docker terminal (launch it from Kitematic if you are not connected already), you will want to run the following commands:

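# update the package index and upgrade installed packages
apt-get update && apt-get -y upgrade
# Python 3, pip, and the build dependencies for steem-python
apt-get install -y python3 python3-pip build-essential libssl-dev libffi-dev python3-dev pandoc
apt-get install -y python3-venv
apt-get install -y tmux vim
cd /root
# create and activate a virtual environment named "steem"
python3 -m venv steem
source steem/bin/activate
pip install wheel
pip install pytest
pip install steem
# the parser script further down also imports pymongo
pip install pymongo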

This installs all the dependencies we need to get started. Note that vim is what I use to edit code within a Linux environment, and tmux is what I use to see multiple windows inside of the one terminal.

(Also note I am assuming you know how to use Linux because you are still here :) )

Luckily, there are already libraries available that let us connect to the Steem chain without too much hassle. I chose to use Python and the steem-python library. (Be sure to thumb through the docs, as I won't go into too much detail on the code.)

import math

from pymongo import MongoClient
from pymongo.errors import PyMongoError
from steem import Steem

# set up steem access
s = Steem()

# set up the db access (host and port are specific to my Docker setup)
client = MongoClient('docker.for.mac.localhost', 32779)
db = client['steem']

# parse from block `start` up to the last irreversible block, `step` blocks at a time
last = 0
percent_done = 0
start = 0
step = 100
end = s.last_irreversible_block_num
total = end - start
print('parsing %i blocks' % total)

for i in range(start, end, step):
    # never ask for blocks past the last irreversible one
    blocks = s.get_blocks_range(i, min(i + step, end))

    # update display
    new_percent_done = math.floor((i - start) / total * 100)
    if new_percent_done != percent_done:
        # keep track of completion rate
        print('%i percent done, %i to go' % (new_percent_done, end - i))
        percent_done = new_percent_done

    # use each block's block_id as the MongoDB _id
    for b in blocks:
        if 'block_id' in b:
            b['_id'] = b['block_id']

    try:
        # if the insert fails (e.g. a block is already stored), just skip
        db.blocks.insert_many(blocks, bypass_document_validation=False, ordered=False)
    except PyMongoError:
        pass

    # remember the start of the last batch we processed
    last = i
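Because the script stores each block under its `block_id`, those ids can also tell you where to pick up after an interrupted run. As far as I know, a Steem block id carries the block number in its first 8 hex characters, so it can be decoded without another lookup. The sketch below relies on that assumption; `block_num_from_id` and `next_start` are hypothetical helpers, and the sample ids are made up.

```python
from typing import Iterable


def block_num_from_id(block_id: str) -> int:
    """Decode the block number from the first 8 hex characters of a block id."""
    return int(block_id[:8], 16)


def next_start(stored_ids: Iterable[str]) -> int:
    """Return the block number after the highest stored block, or 0 if none.

    Note: failed inserts are skipped by the parser, so there may still be
    gaps below this number.
    """
    nums = [block_num_from_id(block_id) for block_id in stored_ids]
    return max(nums) + 1 if nums else 0


# Made-up ids: 8 hex characters of block number plus 32 characters of padding.
sample_ids = ["000003e8" + "0" * 32, "000003e7" + "a" * 32]
print(block_num_from_id(sample_ids[0]))
print(next_start(sample_ids))
```

Feeding the stored `_id` values into `next_start` would give a value to use for `start` on the next run.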

And that's that! I was surprised at how few lines of code it took to get the whole thing working on the Python side. The steem-python lib is really useful for parsing the blockchain. Hopefully someone found this useful, as it took me some time to figure out the specifics, especially the Docker setup for Python. Looking forward to getting deep into some Steem development in the near future!
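As a possible next step toward the posts and comments mentioned at the start, here is a rough sketch of pulling them out of a stored block. It assumes the block layout I have seen from the node: a `transactions` list, each with an `operations` list of `[type, data]` pairs, where a `comment` operation with an empty `parent_author` is a top-level post. The function names and the sample block are made up for illustration.

```python
from typing import Dict, Iterator, List, Tuple


def iter_comment_ops(block: Dict) -> Iterator[Dict]:
    """Yield the data of every 'comment' operation in a block."""
    for tx in block.get("transactions", []):
        for op_type, op_data in tx.get("operations", []):
            if op_type == "comment":
                yield op_data


def split_posts_and_comments(block: Dict) -> Tuple[List[Dict], List[Dict]]:
    """Separate top-level posts (no parent author) from replies."""
    posts: List[Dict] = []
    comments: List[Dict] = []
    for op in iter_comment_ops(block):
        if op.get("parent_author"):
            comments.append(op)
        else:
            posts.append(op)
    return posts, comments


# A made-up block with one vote, one post, and one reply.
sample_block = {
    "transactions": [
        {"operations": [["vote", {"voter": "alice", "author": "bob"}]]},
        {"operations": [["comment", {"parent_author": "", "author": "bob",
                                     "permlink": "hello", "title": "Hello"}]]},
        {"operations": [["comment", {"parent_author": "bob", "author": "alice",
                                     "permlink": "re-hello", "title": ""}]]},
    ]
}

posts, comments = split_posts_and_comments(sample_block)
print(len(posts), len(comments))
```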

Totally accidentally posted this before it was done the first time, oops


Welcome

welcome to the #steemit community. i'm looking forward to following your journey through the blockchain hopin' you make some #crypto gain :).

Ha thanks! Glad to be here, looking forward to adding more content as I continue to learn