Thank you Denis for the clarification and the apology. It is really useful for all of us to know and understand what the reasonable expectations are, and that none of the strategic members had reached out (perhaps the community suffered a little from the bystander effect).
I am sorry that your vacation, and those of many others on the Webmaster team, were disrupted. Thank you for bringing everything back online again; we all look forward to the post mortem. I also hope you can reschedule some of the lost vacation time so that the whole community can benefit from you and your team being refreshed.
On 2021-08-04 4:26 a.m., Sebastian Zarnekow wrote:
I suspect that people are really not
reachable even in case of an infrastructure disaster.
This is incorrect; EF Strategic members can reach out to Infra
staff using SMS text. Not one has done that. I was alerted (by
Mikael) of an outage at 4:09am and was on a computer at 4:14am to
begin assessing the issue.
Despite the shutdowns & vacations, everyone available on the
infra team worked to restore service according to our SLA:
Some items remained "broken" or unavailable for an extended time,
but they are not Tier I items.
We did our best to communicate the current state, but clearly we
could have done better. I'll be authoring a postmortem shortly,
and will put forth recommendations for future events.
I understand apologies do not fix lost productivity, but I do
apologize this happened. It was just about the worst possible type
of outage at the worst possible time.
Denis
I think the best we can do right now is to learn from the
post-mortem and implement mitigations afterwards.
I can only speculate about what's going on but
speculation will get us nowhere. Suffice to say that I
am deeply and fundamentally concerned both by the state
of our infrastructure and even more so by the complete
silence from the Foundation.
I tried to reach out yesterday to gain more
information, and to suggest that information be posted
to the community, but so far without success...
It would appear to me that we will not be able
to get m2 completed today because I'm not sure any of us
can do a build and promote the results right now.
Certainly I cannot, at this point, do any of my usual
activities for m2, no new version of Oomph, no new
installers, no product catalog updates...
Regards,
Ed
On 04.08.2021 08:52, Christoph Läubrich wrote:
> but I guess there is still a major problem with eclipse.org
> infrastructure.
It would be good to have at least some more information,
the status page says 'A fix has been implemented and we
are monitoring the results', but this is two days old and
we still see massive outages: update sites are broken, CI
builds are not even running, and we get bug reports about
broken functionality. That's really annoying and
frustrating.
Some basic website functionality seems to be restored,
but mailing lists / message sending still seem to be
broken across different services.
I received a few random (I guess not all) mails from
Bugzilla, but I guess there is still a major problem
with eclipse.org infrastructure.
All builds we started this morning (5 hours ago CEST)
have failed with issues trying to reach
download.eclipse.org in one way or another, two
examples of which are: