RE: An open-ended question to @ned and @dan

in #steemit, 7 years ago

It's pretty simple to understand, really. Blockchains are designed first and foremost to be a tamper-proof, permanent record. A database needs a different structure from a log to be quick to search. To confirm a transaction, there also has to be a search of the history, to ensure that provenance and ownership are as claimed in the new transaction. It is not complex to implement a relatively fast search by having an index tied to the block log, but it's not an optimal search strategy.
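To make the log-plus-index idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `BlockLog` class, the transaction fields `from`, `to` and `amount`, and both balance methods are made up for this example and are not how Steem or any real chain stores its data. It shows that checking an account against the bare log means replaying the whole history, while a side index only visits the entries that mention the account.

```python
import hashlib
import json


class BlockLog:
    """Hypothetical append-only, hash-chained log with a side index per account."""

    def __init__(self):
        self.entries = []  # the log itself: only ever appended to
        self.index = {}    # account name -> positions in the log

    def append(self, tx):
        # Chain each entry to the previous one so history cannot be edited quietly.
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = json.dumps(tx, sort_keys=True)
        digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        self.entries.append({"tx": tx, "prev": prev_hash, "hash": digest})
        pos = len(self.entries) - 1
        # A set avoids indexing the same position twice for a self-transfer.
        for account in {tx["from"], tx["to"]}:
            self.index.setdefault(account, []).append(pos)
        return pos

    @staticmethod
    def _apply(balance, tx, account):
        if tx["to"] == account:
            balance += tx["amount"]
        if tx["from"] == account:
            balance -= tx["amount"]
        return balance

    def balance_by_scan(self, account):
        # Replays every entry in the log: cost grows with total history.
        balance = 0
        for entry in self.entries:
            balance = self._apply(balance, entry["tx"], account)
        return balance

    def balance_by_index(self, account):
        # Visits only the entries that mention this account.
        balance = 0
        for pos in self.index.get(account, []):
            balance = self._apply(balance, self.entries[pos]["tx"], account)
        return balance
```

Both methods give the same answer; the difference is how much of the log each one has to touch, and the index is extra state that has to be kept alongside the log.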

Then add a forum and votes to this structure, and you can see pretty easily how horrendously expensive it becomes, in both storage and processing, once it gets past a certain size.

Many forum applications actually use the underlying filesystem to store the posts as files. Most filesystems have 1, 2 or 4 KB blocks, usually 4 KB these days, which is a good size for blog posts. It makes a lot more sense to store posts as individual files or, at least, to use a sparse 'shared memory' type file and create 4 KB blocks that store one post each. Most posts will fit in one block. The block size of the actual underlying storage varies from 512 bytes to 8 KB, so, depending on this, one can make it very efficient for retrieval. A usual strategy would be to partition the storage area into regions of several different block sizes. You could have 512-byte, 1 KB, 2 KB and 4 KB regions in your shared memory file. These then require an allocation table with the hashes and the addresses of each block/page, and retrieving them is very, very fast.
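A rough Python sketch of that partitioning strategy, assuming an in-memory `bytearray` per region in place of a real shared memory file. The `SlotStore` class, its method names and the `slots_per_region` parameter are hypothetical, and SHA-256 is just one possible choice of hash for the allocation table.

```python
import hashlib

SIZE_CLASSES = (512, 1024, 2048, 4096)  # bytes per slot in each region


class SlotStore:
    """Hypothetical store: one region per slot size, plus an allocation table."""

    def __init__(self, slots_per_region=64):
        self.slots_per_region = slots_per_region
        self.regions = {s: bytearray(s * slots_per_region) for s in SIZE_CLASSES}
        self.next_free = {s: 0 for s in SIZE_CLASSES}
        self.table = {}  # content hash -> (slot size, slot number, data length)

    def put(self, text):
        data = text.encode("utf-8")
        # Pick the smallest slot size that fits the whole post.
        size = next((s for s in SIZE_CLASSES if len(data) <= s), None)
        if size is None:
            raise ValueError("post is larger than the biggest slot")
        key = hashlib.sha256(data).hexdigest()
        if key in self.table:
            return key  # identical content is stored only once
        slot = self.next_free[size]
        if slot >= self.slots_per_region:
            raise MemoryError("region for %d-byte slots is full" % size)
        start = slot * size
        self.regions[size][start:start + len(data)] = data
        self.next_free[size] = slot + 1
        self.table[key] = (size, slot, len(data))
        return key

    def get(self, key):
        # One table lookup, then one contiguous read at a computed address.
        size, slot, length = self.table[key]
        start = slot * size
        return bytes(self.regions[size][start:start + length]).decode("utf-8")
```

Because every slot in a region has the same size, the address is just `slot * size`, so a read never has to follow a chain of fragments. This sketch does not handle deletion or posts over 4 KB; those would need a free list and some form of chaining.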

But a key/value or graph database format does not understand this. It should be coded so that the graph keeps only the Merkle roots of posts, with each post encoded as a chain of index entries that refer, by hash, to the initial version and its subsequent edits. Then you do two index searches and you have the data. But I'm pretty sure that dumb people just see a text format like JSON and don't even consider hardware limitations.
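Here is a small Python sketch of what "only keep the Merkle root in the graph" could look like. Everything named here (`merkle_root`, `record_for`, the `blob_store` dict) is an assumption for illustration, not Graphene's actual schema. Duplicating the last hash on odd-sized levels is one common convention (Bitcoin does this), chosen here for simplicity.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Merkle root over a list of leaf hashes.

    Odd-sized levels duplicate their last node before pairing.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical secondary store: revision hash -> revision text.
blob_store = {}


def record_for(revisions):
    """Build the small, fixed-shape record the graph would keep for one post."""
    hashes = []
    for text in revisions:
        data = text.encode("utf-8")
        digest = sha256(data)
        blob_store[digest] = data  # the bulky text lives outside the graph
        hashes.append(digest)
    return {
        "root": merkle_root(hashes).hex(),
        "revisions": [h.hex() for h in hashes],
    }
```

Reading a post is then the two lookups described above: find the record in the graph, then fetch the wanted revision from the blob store by its hash.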

More than likely, this is exactly what the problem is. If there were Merkle trees for each post instead of the whole text in the records, it would stay fast until it got enormous. But if all those posts are strung together willy-nilly alongside other, fixed-size data blocks, it's going to have a pretty serious problem with indexing and exploding addressing requirements. The indexes are probably designed for graph data blocks, which are probably 128/256/512-byte page tables. Putting 1-4 KB blobs in there is going to clog up the allocation table really fast, you see what I mean?

I'm not a native English speaker, nor in any way a professional in computer science, so I don't understand all the details and terms you're using. This is something I'll try to fix with further education, and I can always come back to this answer to fully understand it at some point.

I did, however, understand (or did I?) that it's not wise to store every post as plain text data, and that it should rather be coded into a string that can be searched, processed and stored efficiently.

Thank you for the detailed answer!

If you have been around for a while, you may remember that back in the day you could only put 160 characters (140 bytes) in an SMS. Then someone devised a method to string them together (it's a bit like a blockchain, funnily enough). Basically, this is probably how Graphene is storing the posts in the database: as strings of 256-byte records, which is normally large enough for any graph data.

This is not a sustainable method of storing large blocks of data, not least of all because it's going to walk all over the block size boundaries.
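To put a number on the chaining idea, here is a Python sketch in the style of concatenated SMS, where each part carries its part number and the total. The 256-byte record size is the guess made above, and the function names are made up for this example; it says nothing about how Graphene really lays out its records.

```python
RECORD_SIZE = 256  # assumed fixed record size, taken from the guess above


def split_into_records(data: bytes, record_size: int = RECORD_SIZE):
    """Split data into fixed-size payloads tagged (part number, total, payload)."""
    total = max(1, -(-len(data) // record_size))  # ceiling division
    return [
        (i + 1, total, data[i * record_size:(i + 1) * record_size])
        for i in range(total)
    ]


def reassemble(records):
    """Put the parts back in order and join them."""
    ordered = sorted(records, key=lambda r: r[0])
    return b"".join(payload for _, _, payload in ordered)
```

A 4000-byte post becomes 16 records under these assumptions, so 16 entries to allocate, index and fetch where a single 4 KB block would have needed one.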

You may have bumped into a problem of a similar nature with the FAT32 filesystem and a video file of 4 GB or more. The filesystem cannot store it, because each directory entry records the file size in a 32-bit field, so there is simply no room to describe a file that large.

Interesting. So the majority of people aren't informed enough to see this problem at all, and those who know aren't voicing it.

Yeah. The 'blockchain' business is so much hype and so little substance.

It's starting to feel like Steem was just hastily put together using the tools available, without much thinking for the long term (5+ years and beyond).

Doesn't really make me want to put money into EOS at all.

How familiar are you with Ethereum? I'd like to hear your view on it.

Hehe... Pickin the ol' noggin...

OK, Ethereum uses Merkle trees extensively. What this means is a bit like the graph database issue: the chain itself is only intended to store small blocks of information, and it has secondary stores for everything else. This is the essential underlying principle of Ethereum's data storage strategy.

In these larger blob stores there can be virtual machine binary code for executing smart contracts, or, any kind of data at all, basically.

The issue that I see happening in the long term with Ethereum is that the blockchain gets stupidly long and there is no pruning method to reduce the problem. I think part of how it will extend-and-pretend for a while is by shifting sidechains into the storage area that is managed by the smart contracts. But this is eventually going to have issues with the cost of processing all this data via a virtual machine, and with the inherent limitations compared to native code. I'm sure programmers are already devising idiot schemes that will clog Ethereum's transient data stores with so much crap, and put so much load on processing, that gas becomes so expensive that nobody can use it anymore.

Well, I think that's already a looming problem, and I believe you will find that Dan Larimer has pointed this out somewhere. I'm pretty sure he's doing the hammer-and-nails thing with his zero transaction fees idea, but probably missing some important point that is gonna bite EOS in the ass... once he's run off to his next thing. Or the Bahamas.

I remember reading about pruning and Ethereum in the same sentence before, so they must be working on it.

Only time will tell, but there's really no going back for Dan if he leaves EOS shortly after launch. At that point he can buy an island or two for himself, though, so he'll be fine.

And yes, Dan always mentions how he can spit out these different projects fast, and then he leaves it to others to fix them or hold them together. Bitshares seems to be holding up fine, though.

So do you think there's any blockchain that's doing "it" right now, or have we yet to see such a project? Will any of these projects be running in 10 years?

Nope, I don't think any is doing it yet. The Algorand guy is pushing towards it, though, with his new project. SporeDB is along similar lines to what Algorand is doing, but not nearly so well funded. Well, that may change in the future, if my 'monetisation of code' concept is as killer as it sounds.

What we want is a protocol like TCP/IP or HTTP, for distributed systems. This requires the field to be fairly well codified, and that will then require a reference implementation.

What you said in the last paragraph is where I think Dan is going with his "Blockchain OS" concept. You raise some great points and are obviously no spring chicken to coding and systems architecture.

Having a solid foundation upon which to build higher layers is an important architectural principle. As things change in the upper layers they impose new requirements on the lower layers that test how flexible and comprehensive the foundation layers were designed. Sometimes a redesign of the foundational layer is more cost effective or quicker to implement than trying to retrofit changes (refactoring) to meet new requirements.

I'm reminded of the way the systems of the Voyager space probe were designed, both hardware and software. Because that design was so good, the probe had a much longer lifespan than was ever anticipated. This depth of engineering and innovation is what we need for systems whose aim is to make mainstream institutions obsolete. We need long-term thinking motivated by the value to humanity, not limited by greed or personal gain. An attitude of altruism, but marketed in a way that provides support.

I want to read more of your ideas to motivate coders via 'monetisation of code', but on the surface it doesn't address this altruistic / innovation aspect for the long haul. It does address personal rewards for the coders, but it's divorced from the goal and purpose of the coding effort. It's not easy to anticipate future needs and build flexible systems that are adaptable to those needs in a cost-effective way. How that type of innovation is incentivized and rewarded is the distinction I'm getting at.

This is the kind of depth of thinking that we want!

The monetisation of code is only one side of the equation. I think I made it clear that the reward calculation for code has to be STRONGLY modulated by reputation, which is peer review. The more votes your code gets, and the more reputation and tied-up stake behind those votes, the more you get and the more you rise. This may sound at first like a recipe for a circle jerk, but it doesn't take that many opposed and competing peers to smack down such stake-powered nonsense. To earn on the code, you have to raise your rep; to have influence on the code, you have to raise your rep. It is primary.

Say some idiot comes along with next to zero coding rep, only 0.000001% reputation. Even if they have a million tokens, they can still only up you by one, and because they have almost no rep, they don't raise your rep much either. The two formulas have to be balanced, and it will take some simulations to play out how it will factor and grow.
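As a starting point for those simulations, here is a deliberately naive Python sketch. The formula is purely an assumption for illustration: it scales each vote's stake by the voter's reputation share, which is only one of many ways to combine the two, and it is not a worked-out design. The names `vote_weight` and `reward` are hypothetical.

```python
def vote_weight(stake: float, reputation: float) -> float:
    """Weight of one vote: stake scaled by reputation share (0.0 to 1.0).

    Assumed formula for illustration only.
    """
    if stake < 0:
        raise ValueError("stake cannot be negative")
    if not 0.0 <= reputation <= 1.0:
        raise ValueError("reputation must be a fraction between 0 and 1")
    return stake * reputation


def reward(votes):
    """Total reward weight for a piece of code, given (stake, reputation) pairs."""
    return sum(vote_weight(stake, rep) for stake, rep in votes)
```

Under this assumed formula, `vote_weight(1_000_000, 1e-8)` comes out far below `vote_weight(100, 0.5)`, so a million tokens with almost no reputation counts for less than a modest stake held by a well-regarded peer. How reputation itself should grow from votes is a second formula that this sketch leaves out.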

I don't think anyone has really taken the idea of building an effective reputation system that seriously. There could be attempts out there, but their domain is probably too generalised to be useful. Having a constraint of 'generally recognised skill at software development' narrows it down a lot and makes polycentric peer regulation much more likely to be effective. I mean, we have already seen with Steem that, to a large degree, the reputation system does help stop spammy, scammy trololololing. The interface does not allow it to be separated from rewards calculations. I am not sure if this is good or bad. Prima facie, it seems like a good idea to bind them together, but then you have questions about ratios.

Probably curves matter too - as in, similar to the old scheme here, but perhaps differently summed, related to the volume under the curve, as well as the total volume, over time, and other things.

I am of course thinking through it all as I go, but I have tasks to execute in sequence. Right now I'm building a simple fork and proposing a specific set of edits, with the chain basically being renewed, so there is an opportunity to fix the database system that is now threatening to destroy the platform. But I think it is just as important that the premined stake, and the people holding it, not only violate common sense and justice; the premine is also technically illegal in at least US federal jurisdiction, and the maturity and personalities involved further compound the issue.