@aggroed interviews Steemit's hightech wizard @gtg aka @gandalf about the future and stability of the Steemit Blockchain.

in #steemit7 years ago (edited)

It's a pleasure to share a special guest with you all today. I have the great honor of having @gtg (also known as @gandalf) speak a few words with us about the Steem blockchain. @gtg can often be found as the number 1 Witness on Steem and is well known for his technical skills as an IT wizard, system administrator, and developer. I've been following a story for a little bit about how the Steem Blockchain application is working, and I want to address some concerns about it's ultimate fate.

While the website is called Steemit the blockchain program is called steemd. When you blog on steemit.com, the website is actually pulling data from and pushing data to the blockchain through steemd. Steemit.com is basically a fairly convenient looking glass for what's written on the actual blocks, which is Steem. The program that powers steemit.com to look at the material on the blockchain is called condenser. All the witnesses are running servers that store the information written into the Blockchain and it's our witness servers that build what you write and do as new blocks on the blockchain.

A few weeks back several of the larger RPC nodes, which are used by developers to run programs off of Steemd (not just for reading/writing blogs) crashed on the same day, and the witnesses and community were alarmed, annoyed at the loss of various bots, programs, and tools, and lastly concerned about the future of the Blockchain. While the programs didn't cease to exist, many were temporarily disabled as they may not have been configured to go to backup nodes or re-establish connections after a loss, or the nodes they chose to re-establish connections to were also down. Anyway, a lot of community members voiced concerns on the block.

In part, the severity of the concern at the time was caused by the account l0k1 who was writing about the immanent death of Steemd due to increasing RAM storage requirements that will eventually cause the program to shutdown. (l0ki also goes by the handles elfspice and calibrae, they were joined by the additional account faddat). It's worth noting they were simultaneously promoting their own blockchain application at the time. @gtg responded then and has been working hard on the backend to minimize some requirements and also calm fears. This article is designed to offer more clarity on the strength of the Steem blockchain, ecosystem, and the witnesses and developers behind it.


@gtg, can you state what happened in your opinion on the day that many of the full RPC nodes shut down?

It's not what happened in my opinion. It's just what happened :-)

One of my nodes was affected too. I remember that moment exactly. I was at the Warsaw Steem Meetup. Mitchell was giving his great speech, while I was looking at incoming alerts from my node.

I've collected useful data from that crash and sent them to Steemit dev team. Just after 69 minutes I received the first feedback and 25 minutes after that there was a complete, detailed description of the cause along with solution proposal.

The outage was caused by a bug in the tags plugin. Full RPC nodes run a number of non-consensus plugins to support various frontend features for condenser (Steemit.com is powered by condenser) and other apps.

These plugins are not as robustly tested as the consensus logic. The blockchain itself and other types of nodes like seed nodes, witness nodes, exchange nodes, etc. were unaffected.

@gtg, do you think that those nodes are likely to keep running into the same problem?

No. It is fixed now.

Of course like with every software new issues might appear in the future but with such an awesome community and filled with skilled developers I'm sure they would be quickly addressed and fixed. That's how it works now (I've experienced it myself), but in a long term it would be even better.

If you are paying attention on what changes are being made to the platform infrastructure you might notice that there are upcomming new features that will allow migration of the plugins functionality to services external to steemd. For steem powered service providers it would be a great improvement. It would not only let you use specialized data backends to deal with whatever functionality you need (graphs? sql? full text searches?) but also, in future if there were a similar bug, then only a specified service and a subset of frontend functionality would be affected until it's fixed and there would be no need to touch steemd. This is a great upcoming improvement overall.

@gtg, for many witnesses a full node used to take under 100g of RAM on a server, and now without setting up unique configurations take over 100G of RAM. You've recently stated that you can run a full RPC node under 64G of RAM. What's the financial impact of being able to run on a smaller system, and how are you able to do it?

For full node it is crucial to provide very fast access to the shared memory file. Low latency is the key. There's no need to keep whole file in RAM, but if portions of it are not there, they need to be quickly accessed. Using tmpfs is a very effective way to achieve that goal.

When I get enough time I will write a post about my node setup.
I used to (until recently) run my public full node on a machine with 32GB RAM, and I moved out not because of a slow performance or lack of RAM but because of something more trivial: running out of disk space (no option to add it for reasonable price to that config).

Please note, that hardware costs to run a node is just a very small factor of a total cost of running a public service. This is not a Steem specific, it's how things work in a real system administration and programming world.

@gtg, How do you see technology evolving here? It seems you're using a combination of storage devices in conjunction with RAM, do you think this is going to be a long term stable approach to scaling Steemit to millions of users in addition to the hundreds of thousands of accounts we have now?

With current approach we can scale 100x easily and there are ongoing optimizations that would help going beyond that. As for hardware resources, for example i3.2xlarge on Amazon EC2 is not only a good way to go as for now but has a lot potential to handle future growth.

Of course there are many challenges that need to be addressed, and fortunately they can be addressed ahead of time.

@gtg, last question, @followbtcnews (and @crimsonclad) just started the steemd.minnowsupportproject.org RPC node. Any tricks you can share with us to reduce it's requirements? How can the other witnesses learn and repeat the success that you're having?

All depends on your needs. My goal is to provide a service for general use, with all reasonable features. As I already mentioned using tmpfs for shared memory file is an effective method for keeping low latency access without need to have huge amounts of RAM.

For that to work you need a fast storage for swap space and other assets like the block_log.

Replaying the Steem blockchain is a really demanding process when it comes to I/O, but once the reindex is done, load on storage would be significantly lower.

For production grade infrastructure environment, you might consider preparing snapshot of a consistent blockchain state (shared memory file and block_log with index) on a separate instance and deliver it to your public instance for faster turnaround.



"Steem Wizard" image by @inber

The first image is also by @inber entitled "Gandalf making Steem Wizardry"


Liquid rewards from this post will be split between @followbtcnews for the steemd.minnowsupport.org RPC node cost and @gtg for his RPC node.

Sort:  

For what it is worth I continue to run a full node (for private use supporting some services I run) on 16 GB of RAM. The performance is okay except for replay/resync, which is annoying but not a show stopper (still hours not days). The system has moderately fast local (not NAS) SSD. As @gtg indicated, the storage does increase at a steady rate and the hardware will need to be upgraded soon (as he did I will probably replace it rather than upgrade, though I could still upgrade if I wanted to).

I'm not saying I recommend this especially for a public node with potentially higher usage level, but it does work.

Yeah, so for all the haters you're gonna have to splurge on the 16GB of ram... Clearly we're about to die.

To be fair it is inconvenient and getting worse if you do have to resync/replay. But no, not a good doomsday story.

Thanks to our special guests @gtg @furion @smooth @followbtcnews for their presence and blessings :)

Excellent post and somewhat a good timing to experience this down time. In result, gtg was able to share us informative solution going forward.

I'm also studying the technical requirements to provide Full RPC Node.
Care to share a good technical howto guide to setup RPC node.

More power to you gtg. I voted you as my Witness. Sharing your knowledge to a wanna be Witness is priceless.

Awesome, very informative thanks a ton!

Yes, Hardware should be upgrade and get replaced if required after an interval.

All plugins activated with 16gb RAM?

May be missing a few of the smaller ones but most are enabled including definitely: a) full node not LOW_MEMORY and b) full account_history with no filter.

Exactly.... Thought of that too... Nice one

@smooth, would you consider allowing me to vote with some of your SP? I noticed that you have not been actively voting lately, and I could use some extra firepower. I have a few projects I would like to undertake on the platform, and would put it to good use. I'm not asking for a handout, just some delegated SP. Thanks for considering.

Most of my SP is delegated out through minnowbooster or a few private deals. I suggest the former as a general resource for obtaining SP to use.

Good of you to come out and comment on this one, I should have looked in here before my recent memo.

Personally I feel the rates at MB and BT are pretty excessive - especially if you have a open plan of action.

Simply.

100,000 SP - you get all my curation rewards - I think I can beat what you would get with MB for that (looks to be $140 a week?).

2 week trial?

@aggroed and @gtg you were the real star of the show. keep the momentum on :)

This post has been resteemed by @minnowsupport courtesy of @aggroed from the Minnow Support Project ( @minnowsupport ). Join us in Discord.

Upvoting this comment will help support @minnowsupport.

Thanks so much for this fully educative and informative piece. With more upgrades i believe the steem is going greater hieghts.

scaling x100 with this approach.. I think we're all good for the next ..... year or 2 ? great article to allay the fears of imminent steemdeath "

I hope that we get 100x sooner. And by that time we would be ready for more :-)

This is really awesome news. Thank you for opening up on those more technical aspect of Steem. I look forward to learn more about RPC nodes.

Yes I too would like to say something that sounds intelligent ! Well l0ki at least created awareness of otherwise seemingly boring details that most people don't worry about, it's like he got inner city kids to care about algebra! heh but yeah whatever so what if lok1/elfspice caused a lil panic lol yeah they were just trying to promote their own calibrae thing and if steem is doomed to fail calibrae definitly wont work! and yeah whats not good is that they make steem look bad to new users loose confidence in it etc, but steem is like bitcoin, whatever doesnt kill it just makes it stronger!

I think x100 will come pretty quick ! let's hope that steemit continues to be an agile platform. It's the only way to survive in such a rapidly changing climate

Thank you for sharing this interview and supporting the minnow community!

Nice work.. Love it

In part, the severity of the concern at the time was caused by the account l0k1 who was writing about the immanent death of Steemd due to increasing RAM storage requirements that will eventually cause the program to shutdown.

for many witnesses a full node used to take under 100g of RAM on a server, and now without setting up unique configurations take over 100G of RAM.

You guys are talking about 100GB of RAM? That's a lot...

If you are paying attention on what changes are being made to the platform infrastructure you might notice that there are upcomming new features that will allow migration of the plugins functionality to services external to steemd. For steem powered service providers it would be a great improvement. It would not only let you use specialized data backends to deal with whatever functionality you need (graphs? sql? full text searches?) but also, in future if there were a similar bug, then only a specified service and a subset of frontend functionality would be affected until it's fixed and there would be no need to touch steemd. This is a great upcoming improvement overall.

Plugins like what? Even side-chains?

Thank you for that interview. I remember how elfspice was shouting around how fragile Steem is. I am glad to read another perspective on that.


How difficult is it a for a beginner to run a witness server? @jerrybanfield does it and he is not a developer at all, is he?

Witnesses are here to serve the platform. If one have not idea what is he / she doing, then how they can serve the platform? It's like being a Taxi Driver. If one can afford to buy/rent a car, can start the engine, turn left/right then is ready to go, right?

"Dogma" (1999)

Well, I am not suggesting that everyone throws a couple of servers in the round just to make a few quick Steem in the process. I am talking on educating oneself about the matter in the most adequate way possible. The question is how long will it take for a person learning all day everyday? A week? A month? A year? A lifetime?

Depends on skills I suppose.
I still have A LOT to learn.

This post has been resteemed by @mrsquiggle courtesy of @scooter77 from the Minnow Support Project ( @minnowsupport ). Join us in Discord.

Upvoting this comment will help support @minnowsupport.