BitShares Block Production Incident Report

in #bitshares, 7 years ago

On Monday, July 10th 2017 at 9:00am UTC, an incident on the BitShares
network caused an unplanned interruption of block production. All block
producers were affected by memory corruption caused by an automatic
resize of a flat_index container, which left the nodes in an
unrecoverable stale state. This is the first such incident in over two
years of blockchain operation.
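
To illustrate the general class of bug, here is a minimal sketch. It assumes that flat_index keeps its objects in contiguous storage in the way std::vector does; the report does not describe the container's internals, so this is an analogy and not the actual BitShares code. The `Object` type and the helper names are hypothetical. The sketch counts how often the first element changes address while a container grows. Addresses are only compared, never dereferenced, so the demo itself stays well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a stored object; not a BitShares type.
struct Object {
    std::uint64_t id;
};

// Counts how many times the first element's address changes while
// `extra` elements are appended. Every change means that any pointer
// or reference taken earlier would now be stale.
template <typename Container>
std::size_t count_relocations(std::uint64_t extra) {
    Container c;
    c.push_back(Object{0});
    auto last = reinterpret_cast<std::uintptr_t>(&c.front());
    std::size_t relocations = 0;
    for (std::uint64_t i = 1; i <= extra; ++i) {
        c.push_back(Object{i});
        const auto now = reinterpret_cast<std::uintptr_t>(&c.front());
        if (now != last) {
            ++relocations;
            last = now;
        }
    }
    return relocations;
}

// Contrasts a contiguous container with a segmented one.
void demo() {
    // std::vector reallocates when it outgrows its capacity,
    // so existing elements move at least once.
    assert(count_relocations<std::vector<Object>>(1000) > 0);
    // std::deque::push_back never invalidates references to
    // existing elements, so the first element stays where it is.
    assert(count_relocations<std::deque<Object>>(1000) == 0);
}
```

Calling `demo()` from a `main` function runs both checks. Code that holds on to a pointer or reference across such a resize ends up reading freed memory, which is one way a node could reach the kind of stale state described above.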

After several core developers debugged the code, the cause was
identified and a patch was quickly delivered to the block producers.
Shortly afterwards, the blockchain recovered and block production
resumed.

All transactions that made it into the blockchain prior to this incident
are unaltered!

Everyone needs to be aware that, due to the nature of the patch, all
nodes must apply it in order to sync back with the blockchain. We
recommend that exchanges and third-party providers update their back-end
to tag 2.0.170710 and rebuild.



Did you know you can order an octo-shot of espresso at Starbucks? For events like the one you mentioned above. I think the lingo would be ...

"Venti Americano with Octo-Shot"

It's not very tasty, but it has the same results as a flux capacitor being struck by lightning at midnight.

So slightly less damaging than knocking back a Pan Galactic Gargle Blaster?

Bazinga!




Perhaps you could combine the two.


"One PGGB with an Octo-Shot"

I could use a shot of that.

Great news & thanks for the thorough write up!

Thanks for sharing! Some info and links to your posts were included in the Steem.center wiki page about Bitshares. Thanks and good luck again!

Thanks for sharing. Good post

Great post thanks for sharing.

That's good to know, thanks a lot for sharing, and keep on posting ;)

Hmm, memory corruption is hard to debug. I wonder how you traced it. Logs? Was it reproducible?

Honestly, I had a gut feeling, tried swapping the index types, and it was fixed. Others then investigated the root cause.

Steem doesn't use this particular index, so it is safe.

I usually take a trace log to see at least which function or method caused the corruption. It may not always work, and exception handlers sometimes give information too. Another thing to do is keep watching the stack, globals, buffers, etc. You might see an unexpected value, which could be an indication of corruption. Which language is this written in: C++, C# or Go?

Bet that was a late night at the office and many pizzas were consumed.

Thanks go to the backend developers who focused on identifying and fixing the issue yesterday. Since I am not a C++ dev, I was of only little help.

Still, in order to ensure everyone is on the same page and understands the causes of the issue, I felt the need for a more formal incident report.

Good call.

Good post, thanks for sharing.
I just upvoted ;)