RE: An open-ended question to @ned and @dan

in #steemit, 7 years ago (edited)

As a former witness operator, I have had intimate dealings with the software that runs the Steem blockchain. Since I first started working with it back in November, the blockchain replay (necessary to get an RPC server running, which in turn is necessary to run Steem-based applications) has been getting steadily slower, and the slowdown is now alarming.

Note that Facebook uses massive clusters of a specialised database that can split the data across machines and run queries entirely in memory, in giant data centers. Steem's success will necessarily cause this database to grow beyond the capabilities of Graphene, and eventually to the size of a database like Reddit's or Facebook's (at least, as they were 2-3 years ago).

I was curious to understand why, and it turns out that the database system, Graphene, a graph database, is very poorly suited to storing the data of a forum like this. The memory utilisation becomes enormous, and at a certain point the process of building search indexes out of the transaction log (the blockchain) basically goes on for eternity.

It is already very slow. Six weeks ago I was trying to get an RPC up and running so I could do some development of Steem-based applications. Two weeks later, I still had not managed to get one running. Yes, I stopped the process a few times along the way, but I should not still be waiting for an RPC replay to finish 3 days after I start it. You can imagine how inconvenient it is to have a computer doing basically nothing productive for that long, with no certainty it will ever finish. I certainly was not going to be able to do any kind of programming that required access to the database.

If something is not done soon, everything will go offline. Steemit.com, all the exchanges, and all the steem apps.

People think I am just being a Chicken Little about this, but just look at the situation. Poloniex trading in Steem has now been down for over a month. Changelly ditched Steem some time ago. Only Blocktrades, one of the founders, and Bittrex are still functioning. The frequency of outages at steemd.com and steemdb.com is becoming alarming.

If those remaining two exchanges go offline at the same time, you can't sell your Steem.

I wish my upvote was worth more, because I am very interested in your response. I'm not that technical, so while that meant something to me, I'm still a bit unclear.

How about this. Let me ask you: what would you do to overcome this obstacle, if you had the ability?

Just guessing, I would say replace Graphene with something less resource-demanding. But is that like asking a mountain to move?

It isn't that difficult, really. I mean, there are implementations of Ethereum full nodes in something like 10 different languages. It just needs to be reimplemented.

The reimplementation, at minimum, needs to split apart the user registry, the token ledger, the forum/voting system, and the underlying witness/cluster membership management system, so that each runs in a separate database instance, using a more suitable database.

Note that this will only allow Steem to maybe cope with up to about 1 million active users (currently around 15,000). Coping with a rapid influx of new users beyond this level requires massive database engineering feats of a kind that, to date, only Google and Facebook have pulled off.

As @andybets was telling me, he has most of the chain's posts and comments extracted into a flat file database, which is still better than it being embedded within a graph database.

There are the resources and knowledge in the community to make the necessary changes, and I expect these changes will be forthcoming over the next few months.

I am just hoping, though, that someone will do something that I think should be done: erase the premined SP and release the vests to the rewards pool. The remaining accounts hold invested or earned stake. The profile of trending posts will change entirely; it certainly won't have Dan Larimer's opining at the top of trending because he can vote so much on himself that even half the other whales, working together, can't flag it down...

Thank you for your honest opinion, it means a lot to us.

Very interesting.. this must be why @pfunk hates my botnet voting so much.. as he runs a witness node. Thanks for all the information guys!

Doesn't hurt me at all. My node doesn't need near as much memory as a service that needs to track account balances like, you know... an exchange.

If you want to see Steem stick around and even potentially make money with it, what you're doing is so short sighted it's like a dog eating its own leg because it's hungry.

Errrrr....Bittrex unavailable for 48 hours now...Automated Maintenance...

The period of time required for maintenance keeps getting longer and longer. This is the problem I have been talking about since November last year, and I am saying it is now becoming terminal. This is why Poloniex has quit working with it, and now all you see is 'waiting on the steem team to fix the problem'. @ned, just to remind you, fearless leader, this network is losing exchanges because of the impossible-to-operate server software.

I have been here for more than a year, posting, commenting and curating, encouraging newbies, organising Steemit meetups... but I have finally decided to power down and call it a day. I only hope there will be something to trade in the next 13 weeks... I have put in a lot of work for that SP. I would like to get something out.

The pressure has to be raised against Steemit. They are either going to fix it, or they are going to lose their userbase and position in the market cap ladder, very quickly. By userbase I also include developers and node operators, who are being shafted the hardest.

They're not hurting, so no big incentive.

I doubt that is true. If it's not gonna be the only game in town soon, doubly so.

It looks similar, but being hitched to the Ethereum horse is a bad idea imo. Ethereum is going to hit scalability problems in the future, and this issue is central in guiding my design.

I am also completely against virtual machines. Interpreted languages, well, for now, thanks to browsers, we are stuck with them, but I hope fully native app frameworks will emerge in the future. This will be a later element.

Also, to be clear, steem type rewards for development contributions are the central defining feature of DIV. Nobody has had this idea. The forum is a necessary part because a dev platform needs issue trackers, and from there you go to wiki, and media hosting/monetisation...

This is very valuable information @loki/@elfspice. I've been following you from way back and have always valued your honest opinions, even though they don't always praise Steem (neither do mine). We have to be critical and not just bury our heads in the sand. There are only a few people with as long a history with Steem as you, and even fewer who really know the technology and can critique it. I know I can't do that.

So a big thank you for making this comment! I've also noticed the increased maintenance rate in Steemd and have wondered what the hell the issue is with exchanges and Steem. I'd love to hear more about your opinions and findings on this.

You might not feel welcome here at times, but nevertheless, you're a valuable member who has the best interest of the whole rather than the few in mind. Sadly that isn't rewarded here. That's at least my perception of you, hopefully I'm not wrong!

It's pretty simple to understand really. Blockchains are designed first and foremost to be a tamper-proof, permanent record. Databases need a different structure than a log to be quick to search. To confirm a transaction, there also has to be a search of the history, to ensure provenance and ownership are as claimed in a new transaction. It is not complex to implement a relatively fast search by having an index tied to the block log, but it's not an optimal search strategy.

Then, add a forum and votes to this structure and you can see pretty easily how horrendously expensive in memory and processing it becomes once it gets past a certain size.
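
To make the "index tied to the block log" idea concrete, here is a minimal Python sketch. The `BlockLog` class, its fields and the transaction layout are all invented for the example; it is not how steemd or Graphene actually store anything, just the general shape of an append-only log with a lookup table kept alongside it.

```python
from collections import defaultdict


class BlockLog:
    """Append-only list of transfers plus a per-account index (hypothetical)."""

    def __init__(self):
        self.log = []                         # the permanent record, in order
        self.by_account = defaultdict(list)   # account -> positions in the log

    def append(self, sender, receiver, amount):
        position = len(self.log)
        self.log.append((sender, receiver, amount))
        # Keep the index in step with the log as each entry is written.
        self.by_account[sender].append(position)
        if receiver != sender:
            self.by_account[receiver].append(position)
        return position

    def history_by_scan(self, account):
        # Without an index: touch every entry ever written.
        return [tx for tx in self.log if account in (tx[0], tx[1])]

    def history_by_index(self, account):
        # With an index: touch only the entries that mention the account.
        return [self.log[i] for i in self.by_account.get(account, [])]

    def balance(self, account):
        total = 0
        for sender, receiver, amount in self.history_by_index(account):
            if receiver == account:
                total += amount
            if sender == account:
                total -= amount
        return total


chain = BlockLog()
chain.append("genesis", "alice", 100)
chain.append("alice", "bob", 30)
chain.append("bob", "carol", 10)
```

Both lookups return the same history, but the scan gets slower with every block ever written, while the indexed lookup only grows with the activity of the one account. The cost is that the index has to be rebuilt from the whole log on a replay.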

Many forum applications actually use the underlying filesystem to store the files. Most filesystems have 1, 2 or 4 KB filesystem blocks, which is a good size for blog posts. Usually 4 KB these days. It makes a lot more sense to store posts as individual files, or, at least, using a sparse 'shared memory' type file, create 4 KB blocks that store one post each. Most posts will fit in one block. The actual underlying block size varies from 512 bytes to 8 KB, so, depending on this, one can make it very efficient for retrieval. A usual strategy would be to partition the storage area into several different block sizes. You could have 512 byte, 1 KB, 2 KB and 4 KB regions in your shared memory file. These then require an allocation table with the hashes and the addresses of each block/page, and retrieving them is very, very fast.

But a Key/Value or Graph database format does not understand this. It should be coded so the graph only keeps the merkle roots of posts, with each post encoded as a string of indexes referring to the initial and subsequent edits, along with the hash; then you do two index searches and you have the data. But I'm pretty sure that dumb people just see a text format like JSON and don't even consider hardware limitations.

More than likely, this is exactly what the problem is. If there were merkle trees for each post instead of the whole text in the records, it would stay fast until it gets enormous. But if all those posts are strung together willy-nilly alongside other, fixed-size data blocks, it's gonna have a pretty serious problem with indexing and exploding addressing requirements. The indexes are probably designed for graph data blocks, which probably are 128/256/512 byte page tables. Putting 1-4 KB blobs in there is gonna clog up the allocation table really fast, you see what I mean?
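
A rough Python sketch of that partitioning strategy, with an in-memory `bytearray` per region standing in for the shared memory file. The `PostStore` class and its method names are made up for illustration, and SHA-256 is just an assumed choice of hash.

```python
import hashlib

SIZE_CLASSES = (512, 1024, 2048, 4096)  # the block sizes suggested above


class PostStore:
    """Toy in-memory stand-in for a partitioned 'shared memory' file."""

    def __init__(self):
        # One growable region per block size.
        self.regions = {size: bytearray() for size in SIZE_CLASSES}
        # Allocation table: post hash -> (block size, slot number, length).
        self.table = {}

    def put(self, text):
        data = text.encode("utf-8")
        # Pick the smallest block the post fits in.
        size = next((s for s in SIZE_CLASSES if len(data) <= s), None)
        if size is None:
            raise ValueError("post larger than the biggest block size")
        key = hashlib.sha256(data).hexdigest()
        if key not in self.table:
            region = self.regions[size]
            slot = len(region) // size
            region.extend(data.ljust(size, b"\0"))  # pad to a full block
            self.table[key] = (size, slot, len(data))
        return key

    def get(self, key):
        size, slot, length = self.table[key]
        start = slot * size  # the address is computed, not searched for
        return bytes(self.regions[size][start:start + length]).decode("utf-8")


store = PostStore()
short = store.put("gm")
longer = store.put("x" * 3000)
```

Here the two-character post lands in the 512 byte region and the 3000 byte post in the 4 KB region, and reading either one back is a single table lookup plus one slice.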
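
As a sketch of what "the graph only keeps hashes" could look like, here is a small Python example. It uses a plain SHA-256 hash per revision rather than a full merkle tree, and the `graph` and `blobs` dictionaries, the function names and the post id are all hypothetical stand-ins, not anything from the actual Steem code.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


blobs = {}   # hash -> full text; stands in for the bulk store
graph = {}   # post id -> small record of hashes; stands in for the graph


def save_revision(post_id, text):
    data = text.encode("utf-8")
    h = digest(data)
    blobs[h] = data
    record = graph.setdefault(post_id, {"revisions": []})
    record["revisions"].append(h)  # the graph only grows by one 64-char hash
    return h


def load_latest(post_id):
    latest = graph[post_id]["revisions"][-1]  # first lookup: the graph record
    return blobs[latest].decode("utf-8")      # second lookup: the blob store


save_revision("alice/first-post", "Replay takes 3 days.")
save_revision("alice/first-post", "Replay takes 3 days and counting.")
```

However long the post gets, the graph record stays a short list of fixed-size hashes, and reading the current text is the two index searches described above.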

I'm not a native English speaker, nor a professional in computer science in any way, so I don't understand all the details and terms you're using. This is something I'll try to fix with further education, and I can always come back to this answer to fully understand it at some point.

I did however understand (or did I?) that it's not wise to store every post as plain text data, and that it should rather be coded into a string that can be searched, processed and stored efficiently.

Thank you for the detailed answer!

If you have been around for a while, you may remember that back in the day you could only put 160 characters (140 bytes) in an SMS. Then someone devised a method to string them together (it's a bit like a blockchain, funnily enough). Basically, this is probably how Graphene is storing the posts in the database: as strings of 256 byte records, which is normally large enough for any graph data.

This is not a sustainable method of storing large blocks of data, not least of all because it's going to walk all over the block size boundaries.
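
To put a number on it, here is a small Python sketch of chaining fixed-size records the way concatenated SMS parts work. The 256 byte record size is the guess made above, not a confirmed Graphene detail, and the function names are invented for the example.

```python
CHUNK_SIZE = 256  # the record size guessed at above (an assumption)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into (payload, next_index) records, like chained SMS parts."""
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    chain = []
    for n, piece in enumerate(pieces):
        next_index = n + 1 if n + 1 < len(pieces) else None
        chain.append((piece, next_index))
    return chain


def reassemble(chain):
    """Follow the next_index links from the first record to the last."""
    out = bytearray()
    index = 0 if chain else None
    while index is not None:
        payload, index = chain[index]
        out.extend(payload)
    return bytes(out)


post = b"x" * 4096                  # a 4 KB post
records = split_into_chunks(post)
print(len(records))                 # 16 records to follow for a single read
```

A post that would fit in one 4 KB filesystem block becomes 16 linked records, each of which needs its own entry in whatever table tracks them.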

You may have bumped into a problem of a similar nature with the FAT32 filesystem and a large video file over 4 GB in size. The filesystem cannot store it because the field that records a file's size is only 32 bits wide, so there is no way to describe a file that big.

Interesting. So the majority of people aren't informed enough to see this problem at all, and those who know aren't voicing it.

Yeah. The 'blockchain' business is so much hype and so little substance.

It's starting to feel like Steem was just hastily put together using the tools available, without much thought for the long term (5+ years and beyond).

Doesn't really make me want to put money into EOS at all.

How familiar are you with Ethereum? I'd like to hear your view on it.

Hehe... Pickin the ol' noggin...

Ok, Ethereum uses merkle trees extensively. What this means is a bit like the graph database issue: it's only intended to store small blocks of information, and then it has secondary stores for other things. This is the essential underlying principle of Ethereum's data storage strategy.
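
For anyone who has not met merkle trees before, here is a minimal Python sketch of the basic idea. It is a simplified binary tree using SHA-256, chosen for brevity; Ethereum itself uses Merkle Patricia tries with Keccak-256, so treat this as an illustration of the principle and not of Ethereum's actual structure.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Fold a list of byte strings into a single 32-byte root hash."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last hash with itself
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


small = merkle_root([b"a", b"b", b"c"])
large = merkle_root([b"x" * 4096] * 1000)
```

Three tiny leaves and a thousand 4 KB blobs both reduce to a 32-byte root, which is why the main structure can stay small while the bulky data lives in a secondary store.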

In these larger blob stores there can be virtual machine binary code for executing smart contracts, or, any kind of data at all, basically.

The issue that I see happening in the long term with Ethereum is that the blockchain is getting stupidly long and there is no pruning method for reducing this problem. I think part of how it will extend-and-pretend for a while is by shifting sidechains into that storage area that is managed by the smart contracts. But this is eventually going to have issues with the cost of processing all this data via a virtual machine, and the inherent limitations compared to native code. I'm sure programmers are already devising idiot schemes that will clog Ethereum's transient data stores with so much crap, and put so much load on processing, that the Gas currency becomes so expensive that nobody can use it anymore.

Well, I think that's already a looming problem, and I believe you will find somewhere that even Dan Larimer has pointed this out. I'm pretty sure he's doing the hammer-and-nails thing with his zero transaction fees idea, but probably missing some important point that is gonna bite EOS in the ass... once he's run off to his next thing. Or the Bahamas.

I remember reading about pruning and Ethereum in the same sentence before, so they must be working on it.

Only time will tell, but there's really no going back for Dan if he leaves EOS shortly after. At that point he can buy an island or two for himself, though, so it'll be fine.

And yes, Dan always mentions how he can spit these different projects out fast, and then he leaves it to others to fix them or keep them together. Bitshares seems to be holding up fine though.

So do you think there's any blockchain that's doing "it" right now, or have we yet to see such a project? Will any of these projects be running in 10 years?

Nope, I don't think any is doing it yet. The Algorand guy is pushing towards it though, with his new project. SporeDB is along similar lines to what Algorand is doing, but not nearly so well funded. Well, that may change in the future, if my 'monetisation of code' concept is as killer as it sounds.

What we want is a protocol like TCP/IP or HTTP, for distributed systems. This requires the field to be fairly well codified, and that will then require a reference implementation.

What you said in the last paragraph is where I think Dan is going with his "Blockchain OS" concept. You raise some great points and are obviously no spring chicken to coding and systems architecture.

Having a solid foundation upon which to build higher layers is an important architectural principle. As things change in the upper layers they impose new requirements on the lower layers that test how flexible and comprehensive the foundation layers were designed. Sometimes a redesign of the foundational layer is more cost effective or quicker to implement than trying to retrofit changes (refactoring) to meet new requirements.

I'm reminded of the way the systems of the Voyager space probe were designed, both hardware and software, and because that design was so good the probe had a much longer lifespan than was ever anticipated. This depth of engineering and innovation is what we need for systems whose aim is to make mainstream institutions obsolete. We need long term thinking motivated by the value to humanity, not limited by greed or personal gain. An attitude of altruism but marketed in a way that provides support.

I want to read more of your ideas to motivate coders via 'monetisation of code', but on the surface it doesn't address this altruistic / innovation aspect for the long haul. It does address personal rewards for the coders, but it's divorced from the goal and purpose of the coding effort. It's not easy to anticipate future needs and build flexible systems that are adaptable to those needs in a cost effective way. How that type of innovation is incentivized and rewarded is the distinction I'm getting at.

This is the kind of depth of thinking that we want!

The monetisation of code is only one side of the equation. I think I made it clear that the reward calculation for code has to be STRONGLY modulated by reputation, which is peer review. The more votes on your code, and the more weight of reputation as well as tied-up stake behind those votes, the more you get and the more you rise. This may sound at first like a recipe for a circle jerk, but it doesn't take that many opposed and competing peers to smack down such stake-powered nonsense. To earn on the code, you have to raise your rep; to have influence on the code, you have to raise your rep. It is primary.

Some idiot comes along with zero coding rep, and they have only 0.000001% reputation. Even if they have a million tokens, they can still only up you by one, and because they have zero rep, they don't raise your rep that much either. The two formulas have to be balanced, and it will take some simulations to play out how it will factor and grow.

I don't think anyone has really taken the idea of building an effective reputation system that seriously. There may be attempts out there, but their domain is probably too generalised to be useful. Having a constraint of 'generally recognised skill at software development' narrows it down a lot and gives a lot more possibility for polycentric peer regulation to be effective. I mean, we have already seen with Steem that, to a large degree, the reputation system does help stop spammy, scammy trololololing. The interface does not allow it to be separated from rewards calculations. I am not sure if this is good or bad. Prima facie, it seems like a good idea to bind them together, but then you have questions about ratios.
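
One way to picture reputation modulating stake is the toy Python sketch below. The rule used here (vote weight = stake × reputation fraction) is purely an assumption for illustration; the actual formulas are not specified above and, as noted, would need simulation to balance. The function names and vote data are made up.

```python
def vote_weight(stake: float, reputation: float) -> float:
    """Hypothetical rule: stake counts only in proportion to reputation (0..1)."""
    if not 0.0 <= reputation <= 1.0:
        raise ValueError("reputation must be a fraction between 0 and 1")
    return stake * reputation


def payout_shares(votes):
    """votes: {item: [(stake, reputation), ...]} -> {item: share of the pool}."""
    totals = {item: sum(vote_weight(s, r) for s, r in cast)
              for item, cast in votes.items()}
    pool = sum(totals.values())
    return {item: (t / pool if pool else 0.0) for item, t in totals.items()}


votes = {
    "patch-a": [(1_000_000, 0.00000001)],   # a million tokens, 0.000001% rep
    "patch-b": [(500, 0.6), (200, 0.9)],    # small stakes, respected reviewers
}
shares = payout_shares(votes)
```

Under this assumed rule the million-token voter with negligible reputation moves the payout far less than two modest stakeholders with strong reputations, which is the behaviour being argued for.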

Probably curves matter too - as in, similar to the old scheme here, but perhaps differently summed, related to the volume under the curve, as well as the total volume, over time, and other things.

I am of course thinking through it all as I go, but I have tasks to execute in sequence. Right now I'm building a simple fork and proposing a specific set of edits, with the chain basically being renewed, so there is an opportunity to fix the database system that is now threatening to destroy the platform. But I think it is just as important that the premined stake, and the people holding it, not only violate common sense and justice; it is also technically illegal in at least US federal jurisdiction, and the maturity and personalities involved further compound the issue.