RE: An open-ended question to @ned and @dan

Can there be a winner in a nuclear war? Only highly intelligent members are allowed to answer. Thank you!

It's all moot if you ask me. Graphene is already creaking under one year's worth of UTF-8 blobs. You would know very well, @liondani, just how likely it is that within 6 months the whole database will come to a grinding halt, for good.

I'm betting that the answer to the question of wtf ned and dan are doing is probably not that complicated. They know I am right.

Thanks for that detailed information.

So the lags I see from time to time, and some services not working on steemdb and others, are related to what you said?

Yup. The architecture is not capable of dealing with this much data at the speeds @dan has been so specifically advertising for his 'fancy' Graphene.

I hate to break it to @dan, but there is no way you can use C++ and Boost to make a fast database system. For that you would want C/Assembler with GLib or homegrown libraries.

Question:
I'm thinking more and more about reviving my old blog, because here posts are lost if they are not viewed within 1-3 days.
Also, I can't easily find my own posts when I want to check them.
Can I gather the information you posted here and quote it there in an article?

I would love to keep track of such good information, and that's the only good way I see to do it, so other people can reach it too and it isn't lost here.

Or better, if you have (or could write) a full article with all of this, I would love that even more.
Waiting for your answer. Thanks!

C++ can be quite fast, but of course Assembler and C are slightly faster. Enough to make a difference, though? I'm not certain.

Code that runs fast is quite distinct from code that gets the best out of hardware. C++ is loathed by the most famous kernel coder because it tends to lead coders, especially inexperienced ones, to mistake abstractions for the physical. Abstractions are for convenience, not performance.

My main issue with C++ is that it's excessively complex. My most recent study of CS came from a Russian friend I met while living on the street in Amsterdam, whose obsession was functional programming. I am resolutely a functional programmer now, though I believe some algorithms are better done procedurally or with objects. But no matter what other paradigm you work with, in my opinion, if you disrespect the functional rule against messing with the data inside functions from other functions, you make your code unmaintainable.

C++ tries to encourage programmers to keep data private within objects, but it tends to lead to methods that are too broadly defined, and to 'dirty functions'. The object really should be broken into several distinct sub-objects, but the programmer's model mingles them into one, and the bigger these 'objects' get, the more likely they are to become impossible to re-engineer once the model proves to be wrong, because 5 different methods interact with the same variables, creating entropy in the algorithm.
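
To make that concrete, here's a minimal C++ sketch (all names hypothetical, nothing from the Steem codebase) of the difference: a 'god object' whose methods all mutate the same members, versus the functional rule where state only changes through a function's return value:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The anti-pattern: a 'god object' whose methods all read and write the
// same members, so no method can change without re-checking the rest.
class Forum {
    std::vector<std::string> posts_;
    std::vector<int> votes_;
    long rewardPool_ = 0;
public:
    void addPost(const std::string& body) {
        posts_.push_back(body);
        votes_.push_back(0);         // posting touches voting state...
        rewardPool_ += 1;            // ...and reward state
    }
    void vote(std::size_t i) {
        votes_.at(i) += 1;
        rewardPool_ += 10;           // voting also mutates rewards
    }
};

// The functional rule instead: data goes in, a new value comes out,
// and no other state is touched.
struct Tally { int votes = 0; };

Tally applyVote(Tally t) {           // pure: no hidden shared state
    t.votes += 1;
    return t;
}

int main() {
    Forum f;
    f.addPost("hello");              // three unrelated states just changed

    Tally t = applyVote(applyVote(Tally{}));
    return t.votes == 2 ? 0 : 1;
}
```

In the first class, changing vote() means re-checking addPost() too, because they share rewardPool_; the pure function can be reasoned about and changed in isolation.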

Entropy is death for distributed systems.

Can you please explain this statement?

As a former witness operator, I have had intimate dealings with the software that runs the Steem blockchain. Since I first started working with it back in November, the blockchain replay (necessary to get an RPC server running, which in turn is necessary to run Steem-based applications) has been slowing down at an alarming rate, and it keeps getting worse.

Note that Facebook uses massive clusters of a specialised database that can be broken apart and queried entirely in memory, in giant data centers. Steem's success will necessarily cause its database to grow beyond the capabilities of Graphene, and eventually even to the size of a database like Reddit's or Facebook's (at least as of 2-3 years ago).

I was curious to understand why, and it turns out that the database system, Graphene, a graph database, is very unsuited to storing the data of a forum like this. The memory utilisation becomes enormous, and at a certain point the process of creating search indexes out of the transaction log (the blockchain) basically goes on for eternity.

It is already very slow. 6 weeks ago I was trying to get an RPC node up and running so I could do some development of Steem-based applications. 2 weeks later, I was still unable to get one running. Yes, I stopped the process a few times along the way, but I should not still be waiting for an RPC replay to finish 3 days after I start it. You can imagine how inconvenient it is to have a computer doing nothing productive for this long, with no certainty it will ever finish. I certainly was not going to be able to do any kind of programming that required access to the database.

If something is not done soon, everything will go offline: Steemit.com, all the exchanges, and all the Steem apps.

People think I am just being a Chicken Little about this, but look at the situation. Poloniex trading in Steem has been down for over a month now. Changelly ditched Steem some time ago. Only Blocktrades (one of the founders) and Bittrex are still functioning. The frequency of outages at steemd.com and steemdb.com is becoming alarming.

If those remaining two exchanges go offline at the same time, you can't sell your Steem.

I wish my upvote was worth more, because I am very interested in your response. I'm not that technical, so while that meant something to me, I'm still a bit unclear.

How about this. Let me ask you: what would you do, if you had the ability, to overcome this obstacle?

Just guessing, I would say replace Graphene with something less resource-demanding. But is that like asking a mountain to move?

It isn't that difficult, really. I mean, there are implementations of Ethereum full nodes in something like 10 different languages. It just needs to be reimplemented.

The reimplementation, at minimum, needs to split the user registry, the token ledger, the forum/voting system, and the underlying witness/cluster-membership management system into separate database instances, using a more suitable database.
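
As a rough sketch of what that split could look like (hypothetical interfaces, not a proposal for the actual schema), each concern would sit behind its own interface, backed by its own store:

```cpp
#include <cstdint>
#include <string>

// Hypothetical interfaces for the split described above; each subsystem
// runs against its own database instance instead of one shared graph.

struct UserRegistry {                  // account names and keys only
    virtual uint64_t idFor(const std::string& name) = 0;
    virtual ~UserRegistry() = default;
};

struct TokenLedger {                   // balances and transfers only
    virtual int64_t balance(uint64_t account) const = 0;
    virtual void transfer(uint64_t from, uint64_t to, int64_t amount) = 0;
    virtual ~TokenLedger() = default;
};

struct ForumStore {                    // posts, comments and votes only
    virtual void addPost(uint64_t author, const std::string& body) = 0;
    virtual void vote(uint64_t voter, uint64_t post) = 0;
    virtual ~ForumStore() = default;
};

struct WitnessSet {                    // cluster membership / consensus
    virtual bool isWitness(uint64_t account) const = 0;
    virtual ~WitnessSet() = default;
};
```

The point is that replaying or re-indexing the forum data would no longer drag the token ledger and witness bookkeeping along with it.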

Note that this will only allow Steem to cope with maybe up to about 1 million active users (currently around 15,000). Coping with a rapid influx of new users beyond that level requires massive database engineering feats of the kind that, to date, only Google and Facebook have pulled off.

As @andybets was telling me, he has most of the chain's posts and comments extracted into a flat-file database, which is still better than having them embedded within a graph database.

There are the resources and knowledge in the community to make the necessary changes, and I expect these changes will be forthcoming over the next few months.

I am just hoping, though, that someone will do something I think should be done: erase the premined SP and release the vests to the rewards pool. The remaining accounts are invested or earned stake. The profile of trending posts would entirely change; it certainly wouldn't have Dan Larimer's opining at the top of trending because he can vote so much on himself that even half the other whales can't flag it down if they work together...

Thank you for your honest opinion, it means a lot to us.

Very interesting... this must be why @pfunk hates my botnet voting so much, as he runs a witness node. Thanks for all the information, guys!

Doesn't hurt me at all. My node doesn't need nearly as much memory as a service that needs to track account balances, like, you know... an exchange.

If you want to see Steem stick around and even potentially make money with it, what you're doing is so short-sighted it's like a dog eating its own leg because it's hungry.

Errrrr... Bittrex unavailable for 48 hours now... Automated Maintenance...

The period of time required for maintenance keeps getting longer and longer. This is the problem I have been talking about since November last year, and it is now becoming terminal. This is why Poloniex has quit working with it, and now all you see is 'waiting on the Steem team to fix the problem'. @ned, just to remind you, fearless leader: this network is losing exchanges because of the impossible-to-operate server software.

I have been here for more than a year, posting, commenting and curating, encouraging newbies, organising Steemit meetups... but I finally decided to power down and call it a day. I only hope there will be something to trade in the next 13 weeks... I have put in a lot of work for that SP and would like to get something out.

The pressure on Steemit has to be raised. They are either going to fix it, or they are going to lose their userbase and their position on the market-cap ladder very quickly. By userbase I also mean developers and node operators, who are being shafted the hardest.

They're not hurting, so there's no big incentive.

This is very valuable information, @loki/@elfspice. I've been following you from way back and have always valued your honest opinions, even though they don't always praise Steem (neither do mine). We have to be critical and not just bury our heads in the sand. There are only a few people with as long a history with Steem as you, and even fewer who really know the technology and can critique it. I know I can't.

So a big thank you for making this comment! I've also noticed the increased maintenance rate on steemd and have wondered what the hell the issue is with exchanges and Steem. I'd love to hear more about your opinions and findings on this.

You might not always feel welcome here, but nevertheless you're a valuable member who has the best interest of the whole, rather than the few, in mind. Sadly that isn't rewarded here. That's my perception of you at least; hopefully I'm not wrong!

It's pretty simple to understand, really. Blockchains are designed first and foremost to be a tamper-proof, permanent record. Databases need a different structure from a log to be quick to search. To confirm a transaction, there also has to be a search of the history, to ensure provenance and ownership are as claimed in the new transaction. It is not complex to implement a relatively fast search by tying an index to the block log, but it's not an optimal search strategy.
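
For illustration, here's a minimal sketch of that kind of index (hypothetical layout, not Graphene's actual code): the block log stays append-only, and a side table maps each account to the byte offsets of the transactions that touch it:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical "index tied to the block log": the log itself is never
// restructured; a side table records where each account's history lives.
struct BlockLogIndex {
    // account id -> byte offsets into the block log file
    std::unordered_map<uint64_t, std::vector<uint64_t>> byAccount;

    void record(uint64_t account, uint64_t offset) {
        byAccount[account].push_back(offset);   // built once, during replay
    }

    const std::vector<uint64_t>* lookup(uint64_t account) const {
        auto it = byAccount.find(account);
        return it == byAccount.end() ? nullptr : &it->second;
    }
};

int main() {
    BlockLogIndex idx;
    idx.record(42, 0);        // account 42's first tx at offset 0
    idx.record(42, 4096);     // ...and another at offset 4096
    auto* offs = idx.lookup(42);
    return (offs && offs->size() == 2) ? 0 : 1;
}
```

Confirming ownership then means seeking to a handful of offsets instead of scanning the whole history, which is fast, but as said, still not an optimal search structure.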

Then add a forum and votes on top of this structure, and you can see pretty easily how horrendously resource- and processing-intensive it becomes once it gets past a certain size.

Many forum applications actually use the underlying filesystem to store posts as files. Most filesystems have 1, 2 or 4 KB blocks, usually 4 KB these days, which is a good size for blog posts. It makes a lot more sense to store posts as individual files or, at least, using a sparse 'shared memory' type file, to create 4 KB blocks that store one post each. Most posts will fit in one block. The actual underlying block size varies from 512 bytes to 8 KB, so, depending on this, one can make retrieval very efficient. A usual strategy would be to partition the storage area into several different block sizes: you could have 512-byte, 1 KB, 2 KB and 4 KB regions in your shared-memory file. These then require an allocation table with the hashes and the addresses of each block/page, and retrieving them is very, very fast.
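
A minimal sketch of that size-class idea (all names hypothetical): a post goes into the smallest block that fits, and the allocation table maps the post's hash to its block address:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical size-class store: four arenas of fixed-size blocks and
// an allocation table mapping a post's hash to (size class, block no).
constexpr std::size_t kClasses[] = {512, 1024, 2048, 4096};

struct BlockAddr { int sizeClass; std::size_t block; };

class PostStore {
    std::vector<std::vector<char>> arenas_[4];        // one arena per class
    std::unordered_map<uint64_t, BlockAddr> table_;   // hash -> address
public:
    void put(uint64_t hash, const std::string& post) {
        for (int c = 0; c < 4; ++c) {
            if (post.size() < kClasses[c]) {          // smallest fit, NUL-padded
                std::vector<char> block(kClasses[c], 0);
                std::memcpy(block.data(), post.data(), post.size());
                arenas_[c].push_back(std::move(block));
                table_[hash] = {c, arenas_[c].size() - 1};
                return;
            }
        }
        // oversize posts would chain several 4 KB blocks (not shown)
    }

    std::string get(uint64_t hash) const {
        auto it = table_.find(hash);
        if (it == table_.end()) return {};
        const auto& blk = arenas_[it->second.sizeClass][it->second.block];
        return std::string(blk.data());               // two cheap lookups
    }
};

int main() {
    PostStore s;
    s.put(0xABCD, "hello steem");      // lands in the 512-byte class
    return s.get(0xABCD) == "hello steem" ? 0 : 1;
}
```

A real implementation would back the arenas with a sparse mmap'd file; the point is just that retrieval is two cheap steps, hash to address, address to block.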

But a key/value or graph database format does not understand this. It should be coded so that the graph only keeps merkle roots of posts, with each post coded as a string of indexes referring to the initial and subsequent edits, together with the hash; then you do two index searches and you have the data. But I'm pretty sure that dumb people just see a text format like JSON and don't even consider hardware limitations.

More than likely, this is exactly what the problem is. If there were merkle trees for each post instead of the whole text in the records, it would stay fast until it got enormous. But if all those posts are strung together willy-nilly alongside other, fixed-size data blocks, it's gonna have a pretty serious problem with indexing and exploding addressing requirements. The indexes are probably designed for graph data blocks, which are probably 128/256/512-byte pages. Putting 1-4 KB blobs in there is gonna clog up the allocation table really fast, you see what I mean?
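
A toy sketch of the alternative (hypothetical, with std::hash as a stand-in for a real merkle root): the fixed-size graph record carries only a small digest, and the body lives in a separate blob store:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical schema: the graph record is small and fixed-size; the
// post body lives in a separate blob store keyed by its digest.
struct PostRecord {                  // fits easily in a small graph page
    uint64_t author;
    uint64_t contentHash;            // stand-in for a real merkle root
};

// std::hash is only a placeholder; a real chain would use a
// cryptographic hash (e.g. SHA-256) over a merkle tree of edits.
uint64_t digest(const std::string& body) {
    return std::hash<std::string>{}(body);
}

int main() {
    std::unordered_map<uint64_t, std::string> blobStore;  // separate store

    std::string body = "a multi-kilobyte post body lives over here...";
    PostRecord rec{42, digest(body)};    // the graph keeps only 16 bytes
    blobStore[rec.contentHash] = body;

    // retrieval: one index search for the record, one for the blob
    return blobStore.at(rec.contentHash) == body ? 0 : 1;
}
```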

I'm not a native English speaker nor a professional in computer science in any way, so I don't understand all the details and terms you're using. This is something I'll try to fix with further education, and I can always come back to this answer to fully understand it at some point.

I did, however, understand (or did I?) that it's not wise to store every post as plain text data; it should rather be coded into a form that can be searched, processed and stored efficiently.

Thank you for the detailed answer!

If you have been around for a while, you may remember that back in the day you could only put 160 characters in an SMS. Then someone devised a method to string them together (a bit like a blockchain, funnily enough). Basically, this is probably how Graphene is storing posts in the database: as chains of 256-byte records, normally large enough for any graph data.
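
Here's a minimal sketch of what that chaining looks like (a guess at the layout, not Graphene's actual format): a long post gets cut into fixed 256-byte segments, each pointing at the next:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout: a post is cut into fixed 256-byte segments,
// each carrying the index of the next, like concatenated SMS messages.
constexpr std::size_t kSeg = 256;

struct Segment {
    char data[kSeg];
    int64_t next;                     // -1 marks the end of the chain
};

std::vector<Segment> split(const std::string& text) {
    std::vector<Segment> segs;
    for (std::size_t i = 0; i < text.size(); i += kSeg) {
        Segment s{};
        const std::string chunk = text.substr(i, kSeg);
        chunk.copy(s.data, chunk.size());
        s.next = (i + kSeg < text.size()) ? int64_t(segs.size()) + 1 : -1;
        segs.push_back(s);
    }
    return segs;
}

int main() {
    // a 1000-character post needs four segments, plus the chain links
    return split(std::string(1000, 'x')).size() == 4 ? 0 : 1;
}
```

A 4 KB post needs 16 of these records plus their chain links, which is exactly the allocation-table bloat described above.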

This is not a sustainable method of storing large blocks of data, not least because it walks all over the block-size boundaries.

You may have bumped into a problem of a similar nature with the FAT32 filesystem and a large video file over 4 GB in size. The filesystem cannot store it, because it hasn't got enough address space to record where all the pieces go.

Interesting. So the majority of people aren't informed enough to see this problem at all, and those who know aren't voicing it.

It's starting to feel like Steem was just hastily put together using the tools available, without much thinking about the long term (5+ years and beyond).

Doesn't really make me want to put money into EOS at all.

How familiar are you with Ethereum? I'd like to hear your view on it.

Hehe... Pickin the ol' noggin...

OK, Ethereum uses merkle trees extensively. What this means is a bit like the graph database issue: the tree is only intended to store small blocks of information, and there are secondary stores for everything else. This is the essential underlying principle of Ethereum's data storage strategy.
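
For anyone following along, a toy illustration of the merkle idea (std::hash standing in for the cryptographic hash a real chain uses): pairs of hashes are hashed together until one small root summarizes everything below it:

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy merkle root: only this one small value needs to live in the
// primary structure; the leaves (and any large blobs they describe)
// can sit in secondary stores.
using Hash = std::size_t;

Hash h(const std::string& s) { return std::hash<std::string>{}(s); }

Hash merkleRoot(std::vector<Hash> level) {
    while (level.size() > 1) {
        std::vector<Hash> next;
        for (std::size_t i = 0; i < level.size(); i += 2) {
            // duplicate the last hash when the level has an odd count
            Hash right = (i + 1 < level.size()) ? level[i + 1] : level[i];
            next.push_back(h(std::to_string(level[i]) + std::to_string(right)));
        }
        level = std::move(next);
    }
    return level.empty() ? 0 : level[0];
}

int main() {
    std::vector<Hash> leaves = {h("tx1"), h("tx2"), h("tx3")};
    Hash root = merkleRoot(leaves);    // one small value covers all three
    return root != 0 ? 0 : 1;
}
```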

In these larger blob stores there can be virtual-machine binary code for executing smart contracts or, basically, any kind of data at all.

The issue I see happening in the long term with Ethereum is the blockchain getting stupidly long, with no pruning method to reduce the problem. I think part of how it will extend-and-pretend for a while is by shifting sidechains into the storage area managed by smart contracts. But this is eventually going to run into the cost of processing all this data through a virtual machine, and its inherent limitations compared to native code. I'm sure programmers are already devising idiot schemes to clog Ethereum's transient data stores with so much crap, and put so much load on processing, that Gas becomes so expensive nobody can use it anymore.

Well, I think that's already a looming problem, and I believe you will find that even Dan Larimer has pointed this out somewhere. I'm pretty sure he's doing the hammer-and-nails thing with his zero-transaction-fees idea, but probably missing some important point that is gonna bite EOS in the ass... once he's run off to his next thing. Or the Bahamas.

I remember reading about pruning and Ethereum in the same sentence before; they must be working on it.

Only time will tell, but there's really no coming back for Dan if he leaves EOS shortly after launch. Then again, at that point he can buy an island or two for himself, so it'll be fine.

And yes, Dan always mentions how he can spit these different projects out fast, and then he leaves the rest to fix them or keep them together. BitShares seems to be holding up fine, though.

So do you think there's any blockchain that's doing "it" right now, or have we yet to see such a project? Will any of these projects be running in 10 years?

Sounds like I need to power down and invest in other currencies until @ned invests in better infrastructure.

I don't envy the position @ned is in, one bit. Caught between premined super-@dan, the rest of the preminers, the ethical and legal position this puts everyone in, and the fact that they will still bleat 'it's mymymymymymy steak'...

I know, I keep reminding myself this site is a social experiment...thank you @elfspice.

Interesting.

Tic-tac-toe. WarGames showed us there are no winners :-(

Cg

don't know if I'm "intelligent" enough to answer, let alone "highly", so I'll just "not answer" this...

Haha good one !! Me too !!😂🐈🐱🐈🐱

I see what you did there. Well-played ... or rather, well-not-played lol

ROFL - Only High Intelliweenie Answers. Hahahahahahaha

hahaha.. good one

Haha...made my day with your pic :D