Windows Azure suffers an 11-hour worldwide storage outage, caused by an expired SSL certificate
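An SSL certificate that Windows Azure uses inside its cloud expired, causing a storage outage that lasted about 11 hours. The sequence of events was reported on the "Windows Azure Service Dashboard". Windows Azure was also down for 9 hours in February of last year (2012) because of a leap-year-related bug.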
Windows Azure storage becomes inaccessible around the world all at once
The outage was first reported at 8:44 PM Greenwich Mean Time on February 22. Nearly identical reports, like the one below, were posted for every data center worldwide where Windows Azure runs.
Feb 22 2013 8:44PM We are experiencing an issue with Storage Worldwide and this is impacting all dependent services. We are actively investigating this issue and working to resolve it as soon as possible. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.
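At this point, storage must have become inaccessible at Windows Azure data centers around the world almost simultaneously, and the staff at the operations monitoring center presumably sensed that something quite serious was unfolding.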
Feb 22 2013 9:30PM We identified that HTTPS operations (SSL transactions) on Storage accounts worldwide are impacted. We are actively investigating this issue and working to resolve it as soon as possible. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.
Feb 22 2013 10:15PM We are currently validating the repair steps in our test environment. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.
Feb 23 2013 12:15AM We have initiated the recovery on some of the impacted clusters. This is expected to take a few hours. We are also validating faster recovery options. Further updates will be published within 2 hours to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.
Feb 23 2013 2:15AM The test deployments on two of the impacted storage clusters are making steady progress. We are evaluating accelerated repair options to mitigate the impact as soon as possible. We expect to finalize the repair steps within 2 hours, at which time we'll be able to provide more details. We apologize for any inconvenience this causes our customers.
Feb 23 2013 4:15AM The test deployments on two of the impacted storage clusters are making steady progress. We finalized the accelerated recovery steps and will execute them on remaining Storage clusters. Further updates will be published within 2 hours to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.
Feb 23 2013 5:30AM We executed repair steps to update the SSL certificate and majority of our customers are likely to notice recovery. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.
Feb 23 2013 7:30AM Restoration of Storage service is complete and we validated that SSL traffic has been recovered in the sub-region. We will continue to monitor the health of the service and address any intermittent failures before declaring the sub-region fully recovered. We apologize for any inconvenience this causes our customers.
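Since the root cause reported above was an expired SSL certificate, the following is a minimal sketch of how a certificate's remaining lifetime could be checked ahead of time. It is not how Windows Azure monitors its certificates; the function names, the 30-day warning threshold, and the default port are all assumptions made for illustration. It uses only the Python standard library (`ssl`, `socket`, `datetime`).

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the server certificate's notAfter string,
    formatted like 'Jan  5 09:34:43 2018 GMT'.

    Note: the default context verifies the certificate, so the handshake
    fails for a certificate that has already expired. This helper is meant
    for checking certificates that are still valid.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the notAfter time.

    A negative result means the certificate has already expired.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400.0


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` (hypothetical threshold)."""
    return days_until_expiry(not_after, now) <= warn_days
```

A scheduled job could call `fetch_not_after` for each endpoint it cares about, pass the result to `needs_renewal` with `datetime.now(timezone.utc)`, and raise an alert when it returns `True`. Keeping the date arithmetic separate from the network call makes the threshold logic easy to test without a live server.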