BlockTrades RocksDB proposal to Steemit

in #blocktrades, 6 years ago

Steem nodes are costly to operate and the cost is increasing

The Steem blockchain currently requires significant resources to run a Steem node, whether for acting as a Steem witness or for operating a cryptocurrency exchange. For example, if you want to run a witness on an average cloud server, you can expect to need 47GB of memory currently, and this is likely to increase beyond 64GB before too long. This latter value is important because server costs increase dramatically as a server exceeds certain memory thresholds (64GB, 128GB, 256GB, 512GB). These memory costs also limit the ability of the Steem network to grow in terms of users, posts, comments, etc., so it’s important to Steem’s future to reduce the costs of running a Steem node.
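To make the threshold effect concrete, here is a minimal sketch that maps a memory requirement to the smallest server tier that fits it. The tier sizes are the ones listed above; the function name `smallest_tier` is only illustrative, and no pricing is modeled.

```python
import bisect

# Memory thresholds (in GB) named in the post.
TIERS_GB = [64, 128, 256, 512]


def smallest_tier(required_gb):
    """Return the smallest tier that can hold required_gb, or None if none can."""
    i = bisect.bisect_left(TIERS_GB, required_gb)
    return TIERS_GB[i] if i < len(TIERS_GB) else None


# A 47GB node still fits the 64GB tier; at 65GB it is pushed to the 128GB tier.
current_tier = smallest_tier(47)
next_tier = smallest_tier(65)
```

The point is that growth from 47GB to 65GB is modest in absolute terms, yet it forces a jump to a server class with twice the memory.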

Replacing chainbase storage with RocksDB storage to slash node operating costs

BlockTrades believes the most suitable solution to this problem is moving much of the blockchain state data that is currently stored in a database called “chainbase” into a more efficiently organized form using the RocksDB database. Chainbase was an early attempt to reduce the memory requirements of running a Steem node, but it ultimately failed because the storage mechanism was poorly tuned for storing data on a hard disk, which led most nodes to store their chainbase in a form of “RAM disk” in order to operate at acceptable performance levels. In other words, node operators found that the data still needed to be stored in memory for the node to operate acceptably.

Unlike chainbase, RocksDB is specifically designed for efficient storage and retrieval of data using standard hard disk technology, so it will enable this state data to be moved out of memory once and for all, resulting in reduced costs to run Steem nodes.

Why BlockTrades is the natural candidate to implement RocksDB in Steem

The BlockTrades development team has long been concerned about the increasing costs of running Steem nodes, and we approached Steemit over a year ago with a design architecture for using RocksDB to solve the problem, based on our prior experience with high performance database coding. Steemit was generally receptive to the idea and we implemented an initial proof-of-concept under contract with Steemit that stored data from the “account history plugin”, which carried one of the highest memory costs for the API nodes operated by Steemit. This technology was successfully deployed in Steemit’s production servers and has allowed Steemit API servers to continue operating on high end cloud servers.

Note, however, that this was just a proof-of-concept to show that RocksDB met the performance requirements to allow state data to be moved from memory to disk storage, by writing special case code to move the account history data. It did not reduce the memory requirements of the “consensus” state data that requires witnesses to run nodes with a large amount of memory. But we proposed a second stage project whereby we would develop a generic way to move Steem state data (stored in C++ data structures called multi-indexes) onto disk storage using RocksDB, then use this technology to move most if not all of the state data onto disk.
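As a rough illustration of what moving a multi-index onto an ordered key-value store can look like, here is a minimal sketch. It is not the proposed design and does not use the RocksDB API: `OrderedKV` is a hypothetical in-memory stand-in for a disk-backed ordered store, and `AccountIndex`, the key layout, and the record fields are invented for the example.

```python
import bisect
import json


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order
        self._data = {}  # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


class AccountIndex:
    """One record type with two indexes: by id (primary) and by name (secondary).

    Insert-only sketch: renaming or deleting an account is not handled.
    """

    def __init__(self, kv):
        self.kv = kv

    def insert(self, account_id, name, balance):
        record = {"id": account_id, "name": name, "balance": balance}
        # Primary index: zero-padded id so string order matches numeric order.
        self.kv.put(f"id/{account_id:010d}", json.dumps(record))
        # Secondary index: name -> primary key.
        self.kv.put(f"name/{name}", f"{account_id:010d}")

    def by_id(self, account_id):
        raw = self.kv.get(f"id/{account_id:010d}")
        return json.loads(raw) if raw is not None else None

    def by_name(self, name):
        pk = self.kv.get(f"name/{name}")
        return self.by_id(int(pk)) if pk is not None else None

    def names_with_prefix(self, prefix):
        """Ordered range scan over the secondary index."""
        start = f"name/{prefix}"
        return [key[len("name/"):] for key, _ in self.kv.scan_prefix(start)]
```

Each index becomes its own key range in the store, so ordered lookups that a multi-index serves from memory turn into prefix or range scans over sorted keys. A generic solution would derive these key layouts from the index definitions rather than hand-writing them per object type.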

An important part of this latter proposal was to also develop an extensive test framework to verify that the functional operation and performance of the Steem node were not degraded as we progressively moved more and more of the state data onto disk storage. At the time, Steemit rejected that proposal, as they decided that enough memory optimization had been done for their immediate needs and Steemit planned to move all the non-consensus data out of the nodes into a separate database (variously SBDS and Hivemind).

If Hivemind is going to reduce memory costs, why is RocksDB needed?

Hivemind will move “non-consensus” state data out of the API nodes operated by Steemit that are used to serve data to steemit.com and other web sites and user interfaces like busy.org, steempeak, partico, appics, etc. But it won’t reduce the amount of memory required to operate the most important nodes required to keep Steem a decentralized network: witness nodes. RocksDB, however, will allow this data to be moved to disk, allowing witnesses to operate nodes at reasonable cost now and in the future as the blockchain’s activity level increases.

Why re-propose this now?

Steemit has recently expressed a renewed interest in reducing the costs of operating Steem nodes. At the same time, they’ve reduced the number of employees able to complete various Steem-related projects. We at BlockTrades believe that we can provide the expertise to complete the RocksDB work at an efficient cost and free up Steemit’s blockchain programmers to focus on other high priority tasks like implementation of SMTs.

What are we proposing?

Our proposal is to analyze the key data structures currently consuming memory in consensus nodes (i.e. witness nodes) and then move some of this data into RocksDB storage using a generic solution we will develop. We will then thoroughly test the resulting version of the node against the operation of existing chainbase nodes to ensure that the replacement will be seamless. The generic solution we develop can then be used to move more state data out to disk as required in the future, and the regression test system we develop will be re-usable as well. We believe we can perform this work in 2-3 months with a six-person team for a fixed cost of $250K.

If this is an offer to Steemit, isn’t this better done directly to Steemit?

We have approached Steemit with this offer and they are considering it, but Ned asked that I also share this information with the whole Steem community and see how the idea is received. I believe this could be a great first step to expanding development of Steem’s blockchain code beyond Steem’s core team and will enable Steem to progress faster than ever before, so please let us know in the comments how you view this project proposal.

In the era of automated upvotes, my 100% vote for this idea might not be a clear message ;-) but I'm all for this.

Some time ago I was hoping to move to std::allocator, which, combined with deterministic snapshots to save/load state, would help a lot in the maintenance of steemd nodes. That was before RocksDB, which is now not only a well proven solution, but also opens our way to some fancier features, and all that without re-inventing the wheel (fast persistent storage).

It will help not only to reduce costs of API nodes (they will eventually move into steemd + bunch of microservices) but more importantly - consensus nodes.

We really don't want the exchanges to abandon us just because our state (file) is fat and ugly.

2-3 months, however, sounds very risky (I'd guess twice as much), especially since it needs to be tested thoroughly.

You would all be very wise to listen to this inhabitant of your planet.

If you need some assistance finding new ways to motivate these people to embrace this proposal, I am willing to lend a hand.

Specifically this one...


I think if they work together with Steemit's team, they can speed things up.

Do you know why Steemit isn't accepting outside help?

"Ned asked that I also share this information with the whole Steem community and see how the idea is received." Everyone seems to think this is a positive offer and yet he decided to decline it... Now he's been silent for 16 days after not doing a promised livestream and has been in full power down for quite some time. What's happening?

Nobody knows what's going on in his head 🤷‍♂️

Shit, sounds bad. If only we'd had Vitalik with us

Evergreen content reduces spam thus reduces ram. The exchanges would abandon Steem because of low trading volume, Ned paid a pretty penny to get listed on binance, did you know that? He paid money. Over 100k, and that pays binance's nodes. Also ned delegated them. It seems the only people trading are the whales trying to dump on each other. Why would I ever invest into a coin the owner of majority (ned) is powering down? Many investors think the way I do.

Sounds good to me. Let's go for it.

I can't for the life of me see a downside to this... It makes absolute perfect sense to me. Not only does @blocktrades already have a dev team that is very familiar with the STEEM ecosystem, but more importantly this frees up Steemit Inc to work on SMTs.

And let's face it, many projects on here were basically betting on that horse to move their initiatives forward. This is the case for appics, actifit, 1up and the list gets quite long.

I sincerely hope this is agreed upon, and that Steemit Inc commits to keeping their initially announced timeline or get as close as possible.

There have been many comments regarding the importance of SMTs to restore confidence, without which, let's face it, it's seriously hard to build anything of substance here.

So to me this is not a yes... this is a HELL YES! - let's make this happen, let's get these costs under control, let's move away from depending on Steemit Inc to do absolutely everything.

When the costs become reasonable enough to allow us to run a node ourselves (in helpie or with a partnership) - I fully intend to throw our hat in the ring. To me it's never been about the idea of "Oh, but the node doesn't make me money" - I don't care about that, never have... However, I don't want to become homeless while attempting to afford one.

I will believe SMT exists once code is open source and we are testing on the testnets. If anyone banked their life or company on words uttered by steemit inc they made a critical flaw. Ned said back in 2017 SMT would be here in jan 2018. Wait. it's not January?

Fact is: 250$k is a lot of money. But another fact is: time is money.

2-3 months delay on SMTs is what most stakeholders of Steem are worried about.

By outsourcing the work to blocktrades, Steemit Inc. could get back on track with SMTs and depending on the need, could even release basic SMTs ahead of the scheduled time of May 2019.

Now, keeping in mind my limited knowledge of how much Steemit Inc. can afford at this point in time, I'm still very much in favour of this proposal. However, in the end, Steemit Inc. has to make the decision if they believe it's a good investment.

Fact is: 250$k is a lot of money. But another fact is: time is money.

the simplest yet most realistic way of looking at this... Priority #1 is to restore confidence and that can only be done by trying to meet the already established deadlines.

I agree with everything except the last part of your comment @therealwolf since I think @ned and @steemit will factor in the support or non-support of the Steem community for this agreement.

this frees up Steemit Inc to work on SMTs.

Haven't they learned that SMTs should NOT be the priority for the blockchain?

Very valid point my friend, however the carrot has been dangled long enough and there is something that I can only describe as "hope fatigue" - Investors, both those in here already and those from the outside looking in have lost a lot of confidence. Delivering on promises is all about restoring that, even if it brings an avalanche of shitcoins, which it very well might.

SMTs will add more pressure and complexity to the system. If the system isn't scalable and optimal, we'll end up with more problems. They should have focused on that before making bold promises. Simply put, we're not ready for SMTs.

Part of the issue is that community features are tied up with SMTs, meaning that both the economic AND social feature set of Steem are held back by not launching SMTs. It would be great if somehow we could at least have communities get a chance to take root, even if SMTs take a lot longer. Communities should be a significantly simpler objective to realise.

I think it is a huge assumption that Steemit, Inc. will go back to working on the SMTs. Just keep in mind that is coming from your sense of priorities and not Ned's.

I love the idea and the proposal. I also know what blocktrades is capable of and have full confidence in their ability to perform.

Question: could part of the cost be offset by the community using its stake to upvote brief daily progress reports?

Realistically, I'd probably need to write those reports personally, and given my own billing rate (much higher than our other programming staff), daily posts would probably end in a net loss, unfortunately. It's also difficult to describe the daily work done by a programming team in a form that is meaningful to non-programmers. I recall this very well, as I had a similar request made when we were working on the BitShares p2p networking layer many years ago (the same one used now by the Steem code base).

True.

In any case I have seen comments from several in the community that would be willing to contribute to this much needed solution. With it we all win. Without it............

If necessary I'd prefer to use my stake in order to contribute using whatever method would be the most practical.

I am all for this.

This is really what is needed: other people outside Steemit stepping up to do what needs to be done. I like the idea of your team handling the database conversion to best handle the nodes while letting the Steemit team continue to focus on SMTs.

That is a win all around if you truly believe you can pull it off, which I presume you do or you never would have proposed it.

Great idea.

I have more confidence in your team’s development abilities than Ned’s. I would support this proposal.

How are you planning to cover the costs of development? Will this be paid by Blocktrades company funds or are you looking for payment from Steemit, Inc. or other blockchain-related options?

[EDIT] - Just read some of the other comments and found an answer. Will you still go ahead with development if Steemit, Inc. does not accept your proposal? You mentioned that this is a benefit for Steem and node operators, so could development still be possible if they reject the offer?

I'm willing to fund the effort in the short term as proposed, if Steemit will agree to some kind of reasonable performance contract, but I don't want to end up in a situation where our guys create code, their guys create competing code, then they just decide to go with their code cause "it's ours, it must be better" (and in any event, it results in a lot of lost programming effort).

If we deliver a product that provides real value to Steem and Steemit, I think it's only fair for Steemit to make a reasonable payment, as they benefit far more than we do. We're actually quoting at below our "normal" rates for fixed-cost blockchain work, as we do have some stake in the results and we hope this can lead to more opportunities in the future.

Some people have also suggested crowdfunding all or a portion of the work costs, but I think the expense and time overhead make that option less attractive in this particular case. That said, I do think it would make sense in the long run for Steem to have some kind of worker proposal system like BitShares has. It's been very successful, IMO.

Nice to finally see some action with my own eyes.. Yes yes yes make this happen!

Also, I don't know how not to get behind the idea of a worker proposal system like Bitshares. It has worked for them, it has worked for Dash, it will work for Steem, Power to the community.

"I do think it would make sense in the long run for Steem to have some kind of worker proposal system like BitShares has. It's been very successful, IMO."

This!

good solution by ned scott

I fully support the idea.
Steemit Inc can't do everything.
I would love to see the experts try to estimate the time and effort, to analyze whether the 250k is a fair price for that work.
From my point of view, if it is paid after the work is done, it is worth accepting.

Nice proposal @blocktrades
Really appreciated.

I have read all the comments and formed my own picture.

BlockTrades has my voice to do it

This is a win win for Steemians and it creates room for investors