We have been seeing errors when builds contact download.eclipse.org. They used to be rare, but they are now much more frequent. They seem to happen mostly in the 4AM-10AM time frame.

For example, a Trace Compass build:

10:09:12 [ERROR] Failed to resolve target definition tracecompass-e4.5.target: Failed to load p2 metadata repository from location http://download.eclipse.org/tools/cdt/releases/8.8.1/: Communication with repository at http://download.eclipse.org/tools/cdt/releases/8.8.1 failed. Read timed out -> [Help 1]

Another Trace Compass build:

04:03:51 [ERROR] Failed to resolve target definition /jobs/genie.tracecompass/tracecompass-master-nightly/workspace/releng/org.eclipse.tracecompass.target/tracecompass-e4.6.target: Failed to load p2 metadata repository from location http://download.eclipse.org/tools/cdt/builds/neon/milestones/: HTTP Server 'Bad Gateway' : http://download.eclipse.org/tools/cdt/builds/neon/milestones/content.xml: HttpComponents connection error response code 502. -> [Help 1]

A CDT build:

[ERROR] Failed to resolve target definition /jobs/genie.cdt/cdt-verify/workspace/releng/org.eclipse.cdt.target/cdt.target: Failed to load p2 metadata repository from location http://download.eclipse.org/tm/updates/4.0/: HTTP Server 'Bad Gateway' : http://download.eclipse.org/tm/updates/4.0/content.xml: HttpComponents connection error response code 502. -> [Help 1]
[ERROR]

We strive to achieve stable and reliable builds, and this is one of the causes of instability. It would be great if the situation could be improved.
*** Bug 492097 has been marked as a duplicate of this bug. ***
Two of our builds failed similarly at 4AM this morning. Just adding some data points to the bug.
We're again seeing the same kind of failures in Sirius (as initially reported in bug 492097). For example:

https://hudson.eclipse.org/sirius/view/active/job/sirius-master/PLATFORM=neon,jdk=JDK-1.8.0/1583/console

[ERROR] Failed to resolve target definition /jobs/genie.sirius/sirius-master/workspace/PLATFORM/neon/jdk/JDK-1.8.0/packaging/org.eclipse.sirius.parent/../../releng/org.eclipse.sirius.targets/./sirius_neon.target: Failed to load p2 metadata repository from location http://download.eclipse.org/sirius/updates/nightly/latest/neon/incubation/: HTTP Server 'Bad Gateway' : http://download.eclipse.org/sirius/updates/nightly/latest/neon/incubation/content.xml: HttpComponents connection error response code 502. -> [Help 1]

Note that the content.xml file mentioned here does not exist, because http://download.eclipse.org/sirius/updates/nightly/latest/neon/incubation/ is a composite p2 repo. My understanding is that the HTTP server should fail quickly with a 404 in such a case, letting p2 try again with the correct compositeContent.xml. The 502 error returned instead aborts the whole build.
> My understanding is that the HTTP server should
> fail quickly with a 404 in such a case, letting p2 try again with the
> correct compositeContent.xml. The 502 error returned instead aborts the
> whole build.

That should be the case. However, many years ago we changed the way our 404 is handled, as we are serving in excess of 14M 404's per day. That's an average of 164 404's per second, and peak times exceed 200 404's per second. We don't want to return a beautiful 13K web page, as most clients are Java clients, so they get a 13-byte "404 Not Found" response instead.

The 50x codes we're seeing today are a manifestation of the 404 handler being overwhelmed. We do have new hardware on the way to help fix the problem.
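As a quick sanity check on those figures (my own arithmetic, not taken from the server logs), the sketch below converts between a daily total and an average per-second rate. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def avg_per_second(daily_total: float) -> float:
    """Average request rate per second for a given daily total."""
    return daily_total / SECONDS_PER_DAY


def daily_total(rate_per_second: float) -> float:
    """Daily total implied by a constant per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


# Exactly 14M 404s per day averages out to about 162 per second.
print(round(avg_per_second(14_000_000), 1))  # 162.0

# An average of 164 per second corresponds to roughly 14.17M per day,
# which is consistent with "in excess of 14M".
print(daily_total(164))  # 14169600
```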
*** This bug has been marked as a duplicate of bug 487915 ***
(In reply to Denis Roy from comment #4)
> > My understanding is that the HTTP server should
> > fail quickly with a 404 in such a case, letting p2 try again with the
> > correct compositeContent.xml. The 502 error returned instead aborts the
> > whole build.
>
> That should be the case. However, many years ago we changed the way our 404
> is handled, as we are serving in excess of 14M 404's per day. That's an
> average of 164 404's per second, and peak times exceed 200 404's per second.

OK, thanks for the explanation. Does this mean adding explicit p2.index files in our repos would help reduce the load? The Sirius repos represent a tiny drop in the global load issue, but maybe it would keep us out of the problematic path?
Reducing the number of 404s we serve would definitely help.
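For anyone wanting to try the p2.index idea raised above, here is a minimal sketch that generates the file content for a composite repository. It assumes the p2.index format as I understand it from the Equinox p2 documentation (a `version=1` line plus one factory-order line each for metadata and artifacts, with `!` marking the end of the lookup list); the helper name and its parameters are hypothetical, so please check the result against the p2 documentation before deploying it.

```python
def p2_index(metadata_file: str = "compositeContent.xml",
             artifact_file: str = "compositeArtifacts.xml") -> str:
    """Build the text of a p2.index file (hypothetical helper).

    The defaults assume a composite repository stored as plain XML.
    Pass other file names if the repository uses different ones.
    """
    lines = [
        "version=1",
        # The trailing "\!" is meant to tell p2 to stop after the listed
        # file instead of probing other candidate names (assumed format).
        f"metadata.repository.factory.order={metadata_file},\\!",
        f"artifact.repository.factory.order={artifact_file},\\!",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the content; save it as "p2.index" next to compositeContent.xml.
    print(p2_index(), end="")
```

The intent is that a client reads p2.index first and then requests compositeContent.xml directly, rather than asking for a content.xml that does not exist and triggering a 404.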