Bug 420260 - org.eclipse.equinox.p2.tests take 1.5h longer on the Mac
Summary: org.eclipse.equinox.p2.tests take 1.5h longer on the Mac
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2
Version: 3.10.0 Luna
Hardware: PC Mac OS X
Importance: P3 major
Target Milestone: Kepler SR2
Assignee: Krzysztof Daniel CLA
Reported: 2013-10-24 07:30 EDT by Markus Keller CLA
Modified: 2013-11-14 03:15 EST
CC List: 6 users

Description Markus Keller CLA 2013-10-24 07:30:58 EDT
http://download.eclipse.org/eclipse/downloads/drops4/I20131022-1300/testresults/html/org.eclipse.equinox.p2.tests_linux.gtk.x86_6.0.html: 874.899 s
_macosx.cocoa.x86_5.0.html: 6119.089 s

Offenders include, for example:

Test	Result		Time (s)
testRelativeRemoveChild	Success		1210.980
testNonLocalRepo	Success		1815.984
testArtifactMirrorToInvalid	Success		606.770
testMetadataMirrorToInvalid	Success		607.038
testArtifactMirrorToInvalid	Success		607.025
testMetadataMirrorToInvalid	Success		606.621
Comment 1 David Williams CLA 2013-10-24 11:43:00 EDT
Can a p2 committer look at disabling these tests, assuming they are not fixed today?
These may be interfering with "Hudson operation", since they appear to be network related.
Comment 2 Dani Megert CLA 2013-10-24 12:29:44 EDT
Please either fix this asap or disable the test. It delays our test and overloads Hudson. Thanks.
Comment 3 Markus Keller CLA 2013-10-29 11:03:58 EDT
Still happens in http://download.eclipse.org/eclipse/downloads/drops4/I20131028-2000/testresults/html/org.eclipse.equinox.p2.tests_macosx.cocoa.x86_5.0.html

The 600+ second run times suggest that the tests contain a hardcoded timeout of 10 minutes (the first two tests take 2 and 3 times 10 minutes).

The scary part is that the tests still succeed after they failed to complete in 10 minutes.
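A quick arithmetic check of the 10-minute reading, as a minimal Java sketch: it divides each offending duration from the description by an assumed 600 s timeout and compares the total with the Mac/Linux difference. The 600 s constant is the assumption from comment 3, not a value read from the p2 test code.

```java
public class TimeoutArithmetic {
    // ASSUMPTION (from comment 3): each hang costs one 10-minute timeout.
    static final double TIMEOUT_SECONDS = 600.0;

    /** Rounds a test duration to the nearest whole number of 600 s timeouts. */
    static long estimatedTimeouts(double durationSeconds) {
        return Math.round(durationSeconds / TIMEOUT_SECONDS);
    }

    public static void main(String[] args) {
        // Mac durations (seconds) of the offending tests listed in the description.
        double[] durations = {1210.980, 1815.984, 606.770, 607.038, 607.025, 606.621};
        long total = 0;
        for (double d : durations) {
            long n = estimatedTimeouts(d);
            total += n;
            System.out.printf("%9.3f s ~ %d x 600 s%n", d, n);
        }
        // 2 + 3 + 1 + 1 + 1 + 1 = 9 timeouts; 9 x 600 s = 5400 s = 1.5 h
        System.out.printf("total: %d timeouts = %.1f h%n", total, total * TIMEOUT_SECONDS / 3600);
        // Whole-suite difference between the Mac and Linux runs in the description.
        double diff = 6119.089 - 874.899;
        System.out.printf("Mac - Linux: %.1f s = %.2f h%n", diff, diff / 3600);
    }
}
```

Nine timeouts of 10 minutes each add up to 1.5 hours, which is in the same range as the whole-suite difference between the two platforms.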
Comment 4 Krzysztof Daniel CLA 2013-10-31 09:15:26 EDT
I've looked into this (as far as a Fedora guy could ;-) ), and the only thing those tests have in common is that they all use invalid URIs (pointing to nowhere). My guess is that a conversion (URL to URI, or vice versa), or even a comparison, happens somewhere, so the test depends on how fast the network layer resolves a particular URL.
This means that there is probably a configuration issue in the infrastructure, e.g. a proxy configured to ignore unresolvable URLs. Another question is to what extent P2 should wait for the network layer. I can see some room for improvement in tests dealing with in-memory repositories, but the attempt to download to an invalid repository seems to work fine: it *should* attempt to resolve the location.

Unfortunately I can't do anything more unless I get access to the Mac test machines. It is necessary to rerun those tests (it should be enough to run only those) and gather core dumps to verify where P2 is actually stuck. David, is that possible?

Markus - if I am right, the test doesn't have to fail even after the timeout. I believe it is just a matter of time until the test figures out that a particular URL cannot be resolved.
Comment 5 David Williams CLA 2013-10-31 10:48:59 EDT
(In reply to Krzysztof Daniel from comment #4)

> David, is that possible?

Can be. I've CC'd the webmasters to document my approval of giving you temporary access if it comes to that.

If they don't object, they'll have to set up/send you a temporary password that would allow you to set up an SSH tunnel to that machine, and then use VNC (via your local tunnel) to connect to test mac 2. The few times I've done it, I've found it difficult ... never sure why, but frequent disconnects, slow response, etc. Could be my machine or setup.

But ... I wonder ... before we do all that ... let me point out we do "shoe horn in" a special set of preferences for "proxies" on the test machines. This is done using Eclipse's "preference" file, org.eclipse.core.net.prefs, and current values are listed below. Perhaps these are no longer required? Perhaps they are currently the wrong values? See 
http://wiki.eclipse.org/Hudson#Configuring_a_proxy_for_the_p2_director

One option to try (after M3 "ships") is to take out that step that sets up the "preferences" ... and see if there is any difference? 

Webmasters, any advice on the current "proxy setup" for Eclipse running on "mac test 2"?

Here are the current values we use, based on
http://wiki.eclipse.org/Hudson#Configuring_a_proxy_for_the_p2_director
= = = = =
eclipse.preferences.version=1
org.eclipse.core.net.hasMigrated=true
proxiesEnabled=true
systemProxiesEnabled=true
nonProxiedHosts=172.30.206.*
proxyData/HTTP/hasAuth=false
proxyData/HTTP/host=proxy.eclipse.org
proxyData/HTTP/port=9898
proxyData/HTTPS/hasAuth=false 
proxyData/HTTPS/host=proxy.eclipse.org
proxyData/HTTPS/port=9898
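Eclipse preference files such as org.eclipse.core.net.prefs appear to use the java.util.Properties text format, so the values above can be sanity-checked with plain JDK classes. A minimal sketch under that assumption; the class name and the nonProxiedHosts matching rule (a trailing "*" matches any suffix) are hypothetical simplifications, not the actual logic of org.eclipse.core.net:

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.Properties;

public class ProxyPrefsSketch {
    /** Parses preference text in java.util.Properties format. */
    static Properties fromString(String text) {
        Properties prefs = new Properties();
        try {
            prefs.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException(e); // not expected for a StringReader
        }
        return prefs;
    }

    /** HTTP proxy from the proxyData/HTTP/* keys, or a direct connection if proxies are disabled. */
    static Proxy httpProxy(Properties prefs) {
        if (!Boolean.parseBoolean(prefs.getProperty("proxiesEnabled", "false").trim())) {
            return Proxy.NO_PROXY;
        }
        String host = prefs.getProperty("proxyData/HTTP/host").trim();
        int port = Integer.parseInt(prefs.getProperty("proxyData/HTTP/port").trim());
        // createUnresolved: no DNS lookup happens while building the address.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /**
     * HYPOTHETICAL matcher: treats nonProxiedHosts as a single pattern where a
     * trailing "*" matches any suffix. The real Eclipse matching rules may differ.
     */
    static boolean isNonProxied(Properties prefs, String host) {
        String pattern = prefs.getProperty("nonProxiedHosts", "").trim();
        if (pattern.isEmpty()) {
            return false;
        }
        if (pattern.endsWith("*")) {
            return host.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return host.equals(pattern);
    }

    public static void main(String[] args) {
        Properties prefs = fromString("proxiesEnabled=true\n"
                + "nonProxiedHosts=172.30.206.*\n"
                + "proxyData/HTTP/host=proxy.eclipse.org\n"
                + "proxyData/HTTP/port=9898\n");
        System.out.println("HTTP proxy: " + httpProxy(prefs));
        System.out.println("172.30.206.10 bypasses proxy: " + isNonProxied(prefs, "172.30.206.10"));
        System.out.println("foobar.com bypasses proxy: " + isNonProxied(prefs, "foobar.com"));
    }
}
```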
Comment 6 Krzysztof Daniel CLA 2013-10-31 11:43:50 EDT
(In reply to David Williams from comment #5)

> But ... I wonder ... before we do all that ... let me point out we do "shoe
> horn in" a special set of preferences for "proxies" on the test machines.
> This is done using Eclipse's "preference" file, org.eclipse.core.net.prefs,
> and current values are listed below. Perhaps these are no longer required?
> Perhaps they are currently the wrong values? See 
> http://wiki.eclipse.org/Hudson#Configuring_a_proxy_for_the_p2_director

This is indeed worth checking. Unfortunately I will not be able to do it before Monday (holidays here).
Comment 7 Markus Keller CLA 2013-10-31 11:49:10 EDT
(In reply to Krzysztof Daniel from comment #4)
> Markus - the test doesn't have to fail even after timeout if I am right. I
> believe it is just the matter of time that the test needs to figure out that
> particular URL cannot be resolved.

Thanks, makes sense.

To see where the test hangs, you can also temporarily instrument the affected TestCase's setUp/tearDown like this (it prints a thread dump to System.err 60 seconds after a test starts):

// At the top of the test file:
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Map;
import java.util.Map.Entry;
import junit.framework.TestCase;

// Inside the affected TestCase subclass:
protected void setUp() throws Exception {
    TestTimeoutDump.setUp(this, 60);
    super.setUp();
}

protected void tearDown() throws Exception {
    super.tearDown();
    TestTimeoutDump.tearDown();
}

private static class TestTimeoutDump {
    private static Thread fgTimer;

    public static synchronized void setUp(TestCase testCase, final int timeoutSeconds) {
        final String name= testCase.getClass().getName() + "#" + testCase.getName();
        fgTimer= new Thread("TestTimeoutDump " + name) {
            @Override
            public void run() {
                try {
                    Thread.sleep(timeoutSeconds * 1000L);
                } catch (InterruptedException e) {
                    return; // the test finished before the timeout
                }
                System.err.println("Thread dump " + name + " at "
                        + new SimpleDateFormat("yyyy-MM-dd HH:mm:ss Z",
                                Locale.US).format(new Date()) + ":");
                Map<Thread,StackTraceElement[]> s= Thread.getAllStackTraces();
                for (Entry<Thread,StackTraceElement[]> entry : s.entrySet()) {
                    // Wrap each stack in an Exception so printStackTrace() formats it.
                    Exception exception= new Exception(entry.getKey().getName());
                    exception.setStackTrace(entry.getValue());
                    exception.printStackTrace();
                }
                System.err.flush();
            }
        };
        fgTimer.setDaemon(true); // never keep the test VM alive
        fgTimer.start();
    }

    public static synchronized void tearDown() {
        // Only tearDown clears fgTimer, so a timer thread that finishes late can
        // no longer null out the timer that already belongs to the next test.
        if (fgTimer != null) {
            fgTimer.interrupt();
            fgTimer= null;
        }
    }
}
Comment 8 Krzysztof Daniel CLA 2013-11-04 07:39:39 EST
I tried running the tests with the Eclipse proxy today (no difference; I was probably not able to inject the preferences properly). Then I tried to check curl -x proxy.eclipse.org:8989 google.com, but it doesn't work from my location (timeout < 1 minute), although it does work from build.eclipse.org. Those symptoms don't match the bug's behaviour, so I created a patch as Markus suggested.
https://git.eclipse.org/r/#/c/18021/

Ian, Pascal,
this patch should go in just for one build to diagnose what's going on. Please release.
Comment 9 Pascal Rapicault CLA 2013-11-04 11:27:53 EST
done
Comment 10 Krzysztof Daniel CLA 2013-11-05 10:39:47 EST
I'm not sure what happened, but the instrumentation did not work. However, I noticed something else in the log (pasted at the end): it looks like there is indeed a timeout (Connection to http://foobar.com/abcdefg/p2.index failed on Connection to http://foobar.com refused.).
It would be good to check:
curl -v -x proxy.eclipse.org:9898 http://foobar.com/abcdefg/p2.index
Also, it may take so long because -Dorg.eclipse.equinox.p2.transport.ecf.retry is set to a high value. Maybe 1 would be enough (I'm not sure how the tests are configured)?

!ENTRY org.eclipse.equinox.p2.transport.ecf 2 0 2013-11-05 02:43:03.465
!MESSAGE Connection to http://foobar.com/abcdefg/p2.index failed on Connection to http://foobar.com refused. Retry attempt 0 started
!STACK 0
org.apache.http.conn.HttpHostConnectException: Connection to http://foobar.com refused
	at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
	at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:151)
	at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:125)
	at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at org.eclipse.ecf.provider.filetransfer.httpclient4.HttpClientRetrieveFileTransfer.performConnect(HttpClientRetrieveFileTransfer.java:1074)
	at org.eclipse.ecf.provider.filetransfer.httpclient4.HttpClientRetrieveFileTransfer.openStreams(HttpClientRetrieveFileTransfer.java:621)
	at org.eclipse.ecf.provider.filetransfer.retrieve.AbstractRetrieveFileTransfer.sendRetrieveRequest(AbstractRetrieveFileTransfer.java:879)
	at org.eclipse.ecf.provider.filetransfer.retrieve.AbstractRetrieveFileTransfer.sendRetrieveRequest(AbstractRetrieveFileTransfer.java:570)
	at org.eclipse.ecf.provider.filetransfer.retrieve.MultiProtocolRetrieveAdapter.sendRetrieveRequest(MultiProtocolRetrieveAdapter.java:106)
	at org.eclipse.equinox.internal.p2.transport.ecf.FileReader.sendRetrieveRequest(FileReader.java:422)
	at org.eclipse.equinox.internal.p2.transport.ecf.FileReader.readInto(FileReader.java:355)
	at org.eclipse.equinox.internal.p2.transport.ecf.RepositoryTransport.download(RepositoryTransport.java:101)
	at org.eclipse.equinox.internal.p2.transport.ecf.RepositoryTransport.download(RepositoryTransport.java:156)
	at org.eclipse.equinox.internal.p2.repository.helpers.AbstractRepositoryManager.loadIndexFile(AbstractRepositoryManager.java:735)
	at org.eclipse.equinox.internal.p2.repository.helpers.AbstractRepositoryManager.loadRepository(AbstractRepositoryManager.java:657)
	at org.eclipse.equinox.internal.p2.repository.helpers.AbstractRepositoryManager.doCreateRepository(AbstractRepositoryManager.java:274)
	at org.eclipse.equinox.internal.p2.metadata.repository.MetadataRepositoryManager.createRepository(MetadataRepositoryManager.java:41)
	at org.eclipse.equinox.p2.internal.repository.tools.AbstractApplication.initializeDestination(AbstractApplication.java:199)
Caused by: java.net.ConnectException: Operation timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at org.eclipse.ecf.internal.provider.filetransfer.httpclient4.ECFHttpClientProtocolSocketFactory.connectSocket(ECFHttpClientProtocolSocketFactory.java:84)
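To illustrate the retry hypothesis in this comment: if every attempt blocks until the connection times out, the worst-case wait grows linearly with the number of attempts. A minimal Java sketch; RetryBudget, Attempt and the "retries = extra attempts after the first" semantics are assumptions for illustration, not how org.eclipse.equinox.p2.transport.ecf.retry is actually interpreted:

```java
import java.net.ConnectException;

public class RetryBudget {
    /** Hypothetical stand-in for one connection attempt; throws on failure. */
    interface Attempt {
        void run() throws Exception;
    }

    /**
     * Runs the attempt once, then retries up to {@code retries} more times.
     * ASSUMPTION: "retries" counts extra attempts after the first one.
     * Returns true on the first success, false when every attempt failed.
     */
    static boolean runWithRetries(Attempt attempt, int retries) {
        for (int i = 0; i <= retries; i++) {
            try {
                attempt.run();
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt, stop retrying
                return false;
            } catch (Exception e) {
                System.err.println("attempt " + i + " failed: " + e.getMessage());
            }
        }
        return false;
    }

    /** Worst case if every attempt blocks for the full connect timeout. */
    static long worstCaseSeconds(int retries, long timeoutSeconds) {
        return (retries + 1L) * timeoutSeconds;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        boolean ok = runWithRetries(() -> {
            calls[0]++;
            throw new ConnectException("simulated: Operation timed out");
        }, 2);
        System.out.println("succeeded=" + ok + ", attempts=" + calls[0]);
        System.out.println("worst case, retries=2, 60 s timeout: " + worstCaseSeconds(2, 60) + " s");
    }
}
```

Under this model, lowering the retry count only shortens the wait proportionally; it does not remove the underlying per-attempt timeout.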
Comment 11 Markus Keller CLA 2013-11-05 11:00:13 EST
The instrumentation did work in N20131104-2000. System.err ends up here:
http://download.eclipse.org/eclipse/downloads/drops4/N20131104-2000/testresults/consolelogs/macosx.cocoa.x86_64_7.0_consolelog.txt

Search for "dump" and see the first hanging main thread here:

     [java] java.lang.Exception: main
     [java] 	at java.net.PlainSocketImpl.socketConnect(Native Method)
     [java] 	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
     [java] 	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
     [java] 	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
     [java] 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
     [java] 	at java.net.Socket.connect(Socket.java:579)
     [java] 	at org.eclipse.ecf.internal.provider.filetransfer.httpclient4.ECFHttpClientProtocolSocketFactory.connectSocket(ECFHttpClientProtocolSocketFactory.java:84)
     [java] 	at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
     [java] 	at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:151)
     [java] 	at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:125)
     [java] 	at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
     [java] 	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
     [java] 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
     [java] 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
     [java] 	at org.eclipse.ecf.provider.filetransfer.httpclient4.HttpClientRetrieveFileTransfer.performConnect(HttpClientRetrieveFileTransfer.java:1074)
     [java] 	at org.eclipse.ecf.provider.filetransfer.httpclient4.HttpClientRetrieveFileTransfer.openStreams(HttpClientRetrieveFileTransfer.java:621)
     [java] 	at org.eclipse.ecf.provider.filetransfer.retrieve.AbstractRetrieveFileTransfer.sendRetrieveRequest(AbstractRetrieveFileTransfer.java:879)
     [java] 	at org.eclipse.ecf.provider.filetransfer.retrieve.AbstractRetrieveFileTransfer.sendRetrieveRequest(AbstractRetrieveFileTransfer.java:570)
     [java] 	at org.eclipse.ecf.provider.filetransfer.retrieve.MultiProtocolRetrieveAdapter.sendRetrieveRequest(MultiProtocolRetrieveAdapter.java:106)
     [java] 	at org.eclipse.equinox.internal.p2.transport.ecf.FileReader.sendRetrieveRequest(FileReader.java:422)
     [java] 	at org.eclipse.equinox.internal.p2.transport.ecf.FileReader.readInto(FileReader.java:355)
     [java] 	at org.eclipse.equinox.internal.p2.transport.ecf.RepositoryTransport.download(RepositoryTransport.java:101)
     [java] 	at org.eclipse.equinox.internal.p2.transport.ecf.RepositoryTransport.download(RepositoryTransport.java:156)
     [java] 	at org.eclipse.equinox.internal.p2.repository.helpers.AbstractRepositoryManager.loadIndexFile(AbstractRepositoryManager.java:735)
     [java] 	at org.eclipse.equinox.internal.p2.repository.helpers.AbstractRepositoryManager.loadRepository(AbstractRepositoryManager.java:657)
     [java] 	at org.eclipse.equinox.internal.p2.metadata.repository.MetadataRepositoryManager.loadRepository(MetadataRepositoryManager.java:96)
     [java] 	at org.eclipse.equinox.internal.p2.metadata.repository.MetadataRepositoryManager.loadRepository(MetadataRepositoryManager.java:92)
     [java] 	at org.eclipse.equinox.internal.p2.metadata.repository.CompositeMetadataRepository.addChild(CompositeMetadataRepository.java:166)
     [java] 	at org.eclipse.equinox.internal.p2.metadata.repository.CompositeMetadataRepository.addChild(CompositeMetadataRepository.java:195)
     [java] 	at org.eclipse.equinox.p2.tests.metadata.repository.CompositeMetadataRepositoryTest.testRelativeRemoveChild(CompositeMetadataRepositoryTest.java:666)
Comment 12 Krzysztof Daniel CLA 2013-11-05 11:14:57 EST
(In reply to comment #11)
> The instrumentation did work in N20131104-2000. System.err ends up here:
> http://download.eclipse.org/eclipse/downloads/drops4/N20131104-2000/testresults/consolelogs/macosx.cocoa.x86_64_7.0_consolelog.txt

Thanks! The dump is in line with my findings from the test-specific output: an attempt to load the p2.index file (and also content.xml, and I guess all non-existing locations) times out.
Comment 13 Krzysztof Daniel CLA 2013-11-07 06:53:17 EST
Just to make my last comments clear:

P2 hangs while trying to access non-existing URLs until the connection times out.

It is necessary to invoke "curl -v -x proxy.eclipse.org:9898 http://foobar.com/abcdefg/p2.index" on the Mac (David, can you do that?) to verify that the proxy responds in a timely manner. If it doesn't, it's a proxy configuration issue; if it does, I have to dig deeper (and need access to that Mac).
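The "Caused by: java.net.ConnectException: Operation timed out" in the log is what a plain Socket.connect() reports when the operating system gives up; without an explicit timeout the caller waits for the OS default. A minimal JDK-only sketch of a connect probe with an explicit timeout, which separates a refused connection from one that hangs; ConnectProbe and its Outcome values are hypothetical, and nothing here claims ECF or p2 work this way:

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class ConnectProbe {
    /** Hypothetical classification of a single TCP connect attempt. */
    enum Outcome { CONNECTED, CONNECT_FAILED, TIMED_OUT, UNKNOWN_HOST, OTHER_ERROR }

    /** One TCP connect with an explicit timeout instead of the OS default. */
    static Outcome probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            return Outcome.TIMED_OUT;      // our own timeout expired
        } catch (ConnectException e) {
            return Outcome.CONNECT_FAILED; // e.g. "Connection refused", or an OS-level timeout
        } catch (UnknownHostException e) {
            return Outcome.UNKNOWN_HOST;   // the name could not be resolved
        } catch (IOException e) {
            return Outcome.OTHER_ERROR;
        }
    }

    public static void main(String[] args) throws IOException {
        int port;
        // Reserve a free local port, then close it so nothing listens there.
        try (ServerSocket server = new ServerSocket(0)) {
            port = server.getLocalPort();
        }
        System.out.println("closed port -> " + probe("127.0.0.1", port, 2000));
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println("open port   -> " + probe("127.0.0.1", server.getLocalPort(), 2000));
        }
    }
}
```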
Comment 14 David Williams CLA 2013-11-07 17:07:14 EST
(In reply to Krzysztof Daniel from comment #13)
> Just to make my last comments clear:
> 
> P2 hangs while trying to access non-existing URLs until the connection time
> outs.
> 
> It is necessary to invoke "curl -v -x proxy.eclipse.org:9898
> http://foobar.com/abcdefg/p2.index" on the Mac (David, can you do that?) to
> verify that proxy responds in a timely manner. If it doesn't - it's proxy
> configuration, if it does - I have to dig deeper (and need access to that
> mac).

I can (indirectly, via a special hudson job I have) ... and it does respond quickly ... but, not sure if it is the time of response that p2 is expecting ... might cause p2 to "keep trying"? Just guessing. Here's the response:


* About to connect() to proxy proxy.eclipse.org port 9898 (#0)
*   Trying 198.41.30.203... connected
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connected to proxy.eclipse.org (198.41.30.203) port 9898 (#0)
> GET http://foobar.com/abcdefg/p2.index HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: foobar.com
> Accept: */*
> Proxy-Connection: Keep-Alive
> 
< HTTP/1.1 404 Not Found
< Date: Thu, 07 Nov 2013 22:02:44 GMT
< Server: Apache
< Accept-Ranges: bytes
< Vary: Accept-Encoding
< Content-Type: text/html
< Via: 1.1 build.eclipse.org:9898
< Transfer-Encoding: chunked
< 
{ [data not shown]

100  2445    0  2445    0     0   4542      0 --:--:-- --:--:-- --:--:--  4570
* Connection #0 to host proxy.eclipse.org left intact

* Closing connection #0
<!-- SHTML Wrapper - 404 Not Found -->
<!DOCTYPE html>
<html>
  <head>
    <title>404 Not Found</title>
    <meta name="revisit-after" content="10">
    <meta name="ROBOTS" content="NOINDEX, NOFOLLOW"> 
    <script type="text/javascript" src="http://cdn.dsultra.com/js/registrar.js"></script>
        <link href="http://cf.bluehost-cdn.com/media/shared/general/homelayout.css" rel="stylesheet" type="text/css">
    <link rel="stylesheet" href="http://cf.bluehost-cdn.com/media/shared/general/_bh/homestyle.css" type="text/css">
    
    <style>
      body{
        margin:25px;
      }
      .t1{
        color:#000000;
        font-family: Arial, Helvetica, sans-serif;
        font-size:14px;
      }
      .t2{
        color:#3d3d3d;
        font-family: Arial, Helvetica, sans-serif;
        font-size:10px;
        white-space:nowrap;
      }
      .icontent {
        width: 1025px;
        height: 700px;
        border: none;
        margin-top: 4px;
        margin-bottom: 4px;
      }
      h1 {
        font-size: 18px;
        display: block;
      }
      h2 {
        font-size: 16px;
      }
      ul {
        margin-left: 34%
      }
    </style>
</head>
<body bgcolor=white>

<div style="border: solid 2px;border-color: #033B73;padding: 0px;width: 1065px;margin: 0 auto;">
  <table cellpadding="0" cellspacing="0" border="0" width="1065">
    <tr>
      <td colspan="2" class="topheading">
        <table cellpadding="0" cellspacing="0" border="0"><tr>
          <td style="padding-left:25px;width:205px;height:90px"><a href="http://www.bluehost.com/"><img src="http://cf.bluehost-cdn.com/media/shared/general/_bh/logo.gif" width="178" height="39" alt="bluehost" border="0"></a></td>
          <td style="text-align:left">Affordable, Reliable<br />Web Hosting Solutions.</td>
        </tr></table>
      </td>
    </tr>
  </table>
    <div style="text-align: center">
      <h1>404 Error File Not Found</h1>
      <h2> The page you are looking for might have been removed, <br />had its name changed, or is temporarily unavailable.</h2>
            <iframe frameborder="0" scrolling="no" src="about:blank" id='ad_frame' class="icontent"></iframe>
      <script type="text/javascript">registrar_frameset({a_id: 115580, drid: "as-drid-2578124767373827", frame: "ad_frame"});</script>
                  <p><a href="http://www.bluehost.com">Web Hosting</a> provided by Bluehost.com</p>
            </div>
  </div>
</body>
</html>
Comment 15 David Williams CLA 2013-11-07 17:16:06 EST
(In reply to David Williams from comment #14)
> (In reply to Krzysztof Daniel from comment #13)
> > Just to make my last comments clear:
> > 
> > P2 hangs while trying to access non-existing URLs until the connection time
> > outs.
> > 
> > It is necessary to invoke "curl -v -x proxy.eclipse.org:9898
> > http://foobar.com/abcdefg/p2.index" on the Mac (David, can you do that?) to
> > verify that proxy responds in a timely manner. If it doesn't - it's proxy
> > configuration, if it does - I have to dig deeper (and need access to that
> > mac).
> 
> I can (indirectly, via a special hudson job I have) ... and it does respond
> quickly ... but, not sure if it is the time of response that p2 is expecting
> ... might cause p2 to "keep trying"? Just guessing. Here's the response:
> 

The response on Linux is very similar ('curl' is not installed on Windows):

+ curl -v -x proxy.eclipse.org:9898 http://foobar.com/abcdefg/p2.index
* About to connect() to proxy proxy.eclipse.org port 9898 (#0)
*   Trying 198.41.30.203... connected
* Connected to proxy.eclipse.org (198.41.30.203) port 9898 (#0)
> GET http://foobar.com/abcdefg/p2.index HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-suse-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8h zlib/1.2.3 libidn/1.10
> Host: foobar.com
> Accept: */*
> Proxy-Connection: Keep-Alive
> 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
< HTTP/1.1 404 Not Found
< Date: Thu, 07 Nov 2013 22:13:36 GMT
< Server: Apache
< Accept-Ranges: bytes
< Vary: Accept-Encoding
< Content-Type: text/html
< Via: 1.1 build.eclipse.org:9898
< Transfer-Encoding: chunked
< 
{ [data not shown]

101  2445    0  2445    0     0   5997      0 --:--:-- --:--:-- --:--:--  6051
* Connection #0 to host proxy.eclipse.org left intact

* Closing connection #0
Comment 16 Markus Keller CLA 2013-11-08 06:05:23 EST
http://foobar.com/abcdefg/p2.index must not be contacted by a test. foobar.com is a valid domain that is not under our control. The reachability of an external web server must not influence our tests.

If you need a URL that always results in a 404, then refer to a place in the Equinox homepage, e.g. http://eclipse.org/equinox/not-existing/path/p2.index

If it really has to be outside of the Eclipse foundation network, then at least use something that is more likely to be reliable, such as
   http://example.com/path/that/should/not/exist/p2.index
or http://google.com/path/that/should/not/exist/p2.index
Comment 17 David Williams CLA 2013-11-11 00:19:54 EST
(In reply to Markus Keller from comment #16)
> http://foobar.com/abcdefg/p2.index must not be contacted by a test.
> foobar.com is a valid domain that is not under our control. The reachability
> of an external web server must not influence our tests.
> 
> If you need a URL that always results in a 404, then refer to a place in the
> Equinox homepage, e.g. http://eclipse.org/equinox/not-existing/path/p2.index
> 
> If it really has to be outside of the Eclipse foundation network, then at
> least use something that is more likely to be reliable, such as
>    http://example.com/path/that/should/not/exist/p2.index
> or http://google.com/path/that/should/not/exist/p2.index

Was this a comment to me, or to the p2 committers? That is, was "foobar.com" just an example for 'curl', or is it really used in the p2 tests?

As an aside ... it was a long time since I entered this bug ... so my memory is fuzzy ... but, there was a similar issue discussed in bug 390392.
Comment 18 Markus Keller CLA 2013-11-11 06:17:52 EST
(In reply to David Williams from comment #17)
> > http://foobar.com/abcdefg/p2.index must not be contacted by a test.
> Was this a comment to me? Or p2 committers?

To p2 committers (assignees of the bug). At least NewMirrorApplicationMetadataTest#testMetadataMirrorToInvalid() contains code
new URI("http://foobar.com/abcdefg")
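A sketch of the kind of change suggested in comment 16, in Java: keep invalid repository locations on hosts the project controls, and make that easy to check. The constant uses the eclipse.org path from comment 16; InvalidRepoLocations, CONTROLLED_HOSTS and isControlledHost() are hypothetical names, and the actual commit may have picked a different URL:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class InvalidRepoLocations {
    // Location suggested in comment 16: under eclipse.org, so it should always 404.
    static final URI NOT_EXISTING_REPO =
            URI.create("http://eclipse.org/equinox/not-existing/path");

    // HYPOTHETICAL allow-list of hosts that tests may contact.
    static final List<String> CONTROLLED_HOSTS = Arrays.asList("eclipse.org", "download.eclipse.org");

    /** True if the URI points at a host on the allow-list above. */
    static boolean isControlledHost(URI location) {
        String host = location.getHost();
        return host != null && CONTROLLED_HOSTS.contains(host.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        URI old = URI.create("http://foobar.com/abcdefg"); // as in testMetadataMirrorToInvalid()
        System.out.println(old + " controlled: " + isControlledHost(old));
        System.out.println(NOT_EXISTING_REPO + " controlled: " + isControlledHost(NOT_EXISTING_REPO));
    }
}
```

A check like isControlledHost() could be asserted in a test's setUp to catch new references to external domains early.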
Comment 19 Krzysztof Daniel CLA 2013-11-12 08:41:55 EST
(In reply to comment #17)
> (In reply to Markus Keller from comment #16)
> > http://foobar.com/abcdefg/p2.index must not be contacted by a test.
> > foobar.com is a valid domain that is not under our control. The reachability
> > of an external web server must not influence our tests.
> >
> As an aside ... it was a long time since I entered this bug ... so my memory is
> fuzzy ... but, there was a similar issue discussed in bug 390392.

Yes, I can confirm that this problem is caused by bug 390392: my p2 Gerrit job is not configured for a proxy and was taking 2 hours to complete, but after changing the domains as Markus suggested, the build (with all tests) ran in 30 minutes!

I will release the patch once Gerrit finishes running the tests once more; the proxy investigation will continue in bug 390392.
Comment 20 Krzysztof Daniel CLA 2013-11-12 10:31:36 EST
OK, a consistent result of <= 30 minutes in Gerrit. https://hudson.eclipse.org/p2/job/p2-gerrit/46/
Commit 8b9149efcc8f8cbddcecfb491ff6e6168fbf61ec.

Instrumentation reverted https://git.eclipse.org/r/#/c/18162/,
commit f8c80a1e830881cc38f85dde47f2a688d9b777b2
Comment 21 David Williams CLA 2013-11-12 15:46:28 EST
I'm a bit confused ... as usual ... as this bug was about why the Mac tests "suddenly" started taking a lot longer a few months ago. Did something change? Were some tests re-enabled? ... something that would make it explainable by bug 390392?

Here's another, unrelated, thought ... I know on Windows, we use 
-Djava.net.useSystemProxies=true
which was added in Java 7. 

We use Java 6 on Linux. And, I think we've been using Java 7 on Macs for quite a while, if not since the beginning ... but, I'm wondering if I should add 
-Djava.net.useSystemProxies=true
to the Mac tests.

I'm assuming it would not hurt, so I will add it in time for tonight's "nightly" build ... unless that complicates some other test fix that is being tried?
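-Djava.net.useSystemProxies=true is a standard JDK networking system property that asks the default ProxySelector to use the operating system's proxy settings. A minimal JDK-only sketch for checking which proxy the JVM would pick for a URL (the URL is only an example); it assumes the property is set before the default selector is first used, since it is typically read only once:

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class SystemProxyCheck {
    /** Proxies the default selector would use for the given URI (DIRECT if none). */
    static List<Proxy> proxiesFor(String uri) {
        return ProxySelector.getDefault().select(URI.create(uri));
    }

    public static void main(String[] args) {
        // Same intent as -Djava.net.useSystemProxies=true on the command line; set it
        // first thing, because the default selector typically reads it only once.
        System.setProperty("java.net.useSystemProxies", "true");
        for (Proxy p : proxiesFor("http://eclipse.org/equinox/")) {
            System.out.println(p);
        }
    }
}
```

Running this on the test machine with and without the flag would show whether the JVM itself sees a system proxy, independently of the Eclipse preferences.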
Comment 22 Krzysztof Daniel CLA 2013-11-13 03:06:56 EST
(In reply to comment #21)
> I assuming it would not hurt, so will add in time for tonight's "nightly" build
> ... unless that complicates some other test fix that is being tried?

Yes, this will interfere with the domain change in the P2 tests. It looks like http://download.eclipse.org/eclipse/downloads/drops4/N20131112-2000/testresults/html/org.eclipse.equinox.p2.tests_macosx.cocoa.x86_5.0.html was faster than the Windows tests today.

The failing tests were never run for 4.2.x on the Mac, and 4.3 already had long-running tests. Given that bug 390392 was opened around 4.2.1 and there is no evidence that those tests ever worked correctly (fast) after the switch to CBI, I assume they were always failing, and there is no point in going through configuration changes. Bug 390392 is a very plausible cause of the behaviour (considering that tests without a proxy are "long").

So, if you have added the -Djava.net.useSystemProxies=true flag to the build, we need to respin the build without it and verify that the tests are still "fast".
Comment 23 David Williams CLA 2013-11-13 03:27:22 EST
(In reply to Krzysztof Daniel from comment #22)
> (In reply to comment #21)
> > I assuming it would not hurt, so will add in time for tonight's "nightly" build
> > ... unless that complicates some other test fix that is being tried?
> 
> Yes, this will interfere with the domain change in P2 tests.  It looks like
> http://download.eclipse.org/eclipse/downloads/drops4/N20131112-2000/
> testresults/html/org.eclipse.equinox.p2.tests_macosx.cocoa.x86_5.0.html was
> faster than windows tests today.
> 
> Failing tests were never run for 4.2.x on Mac.  4.3 had already long running
> tests. Given that bug 390392 was opened around 4.2.1, and there is no
> evidence those tests ever worked correctly (fast) after switch to CBI, and I
> assume they were always failing, and there is no point in going through
> configuration changes. Bug 390392 is a very plausible cause of the behaviour
> (considering that tests without proxy are "long").
> 
> So, if you have added the -Djava.net.useSystemProxies=true flag to the
> build, we need to respin the build without it - and verify tests are still
> "fast"

OK, I will remove it now and re-run. Keep a copy of the current results if you want to compare (the existing ones will be replaced upon completion).
Comment 24 David Williams CLA 2013-11-13 08:56:44 EST
(In reply to David Williams from comment #23)
> (In reply to Krzysztof Daniel from comment #22)
> > (In reply to comment #21)

> > 
> > So, if you have added the -Djava.net.useSystemProxies=true flag to the
> > build, we need to respin the build without it - and verify tests are still
> > "fast"
> 
> Ok, will remove now and re-run. Keep a copy if you want to compare. (The
> existing ones will be replaced upon completion).

Slightly longer, but easily within the noise level:

4 hr 19 min (with -Djava.net.useSystemProxies=true)
4 hr 23 min (without -Djava.net.useSystemProxies=true)

Both times are "back in the normal range".

But, again, can you spell it out for me: what caused this to change a few months ago? Were some disabled tests re-enabled? Have they now been disabled again? Or changed?

Perhaps related: if the Mac tests were using a "real" third-party address, such as "foobar.com", have those tests been changed for all platforms?

Was this "hard to detect" because tests that time out are no longer captured or reported as "timed out"? (If so, I think that problem deserves higher priority.)

Thank you very much!
Comment 25 Krzysztof Daniel CLA 2013-11-13 09:28:38 EST
I'm afraid I can't answer the question "what caused this to change a few months ago". There are two issues here:
1. Broken P2 tests referring to a non-existent, but valid, external address. This has always been the case.
2. A broken proxy mechanism on the Mac (bug 390392) that caused (1) to wait for a timeout instead of discovering that the address doesn't exist. This has been the case since the earliest CBI builds I managed to find.

So I believe those tests always took that long (at least since the move to CBI). They were just *discovered* recently because of an infrastructure hiccup.

The cause was hard to detect because a socket timeout and a proxy responding with 404 mean the same thing to the tests: the domain doesn't exist, which is exactly what the tests wanted/expected. The only difference is that the proxy can answer fast, whereas a timeout has to actually elapse, adding unnecessary time to the test execution.

Hence the fix, which switches the P2 tests from foo* domains to domains controlled by the Eclipse Foundation. No proxy is involved, and the resolution failure is discovered fast. In other words, the test results are unchanged, but their internal flow has changed (it now takes the correct and faster route).

The tests were changed for all platforms, and I believe it is no longer necessary to set up a proxy for P2 (a small bonus for the releng team ;-) ).

Of course, why the proxy setup did not work on the Mac is a separate problem. Fixing it would fix this issue too, but I agree with Markus' statement that "The reachability of an external web server must not influence our tests.", and because of that I changed the P2 tests instead of marking this bug as a duplicate of the proxy bug :-).

I hope I managed to summarize the issue properly and answer all your questions. If not, I'm on IRC :)
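The difference described above, a failure that is reported immediately versus one that only surfaces once a timeout expires, can be observed with plain JDK sockets. This is a minimal sketch with hypothetical names (ReachabilityProbe, probe, freeLocalPort); it is not the p2 or ECF code. Every outcome other than "connected" counts as "unreachable", which is all the tests checked; only the elapsed time differs. The demo probes a local port that nothing listens on, so the connection is refused at once; a host that silently drops packets would instead make connect() wait out the full timeout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration, unrelated to the actual p2/ECF implementation.
public class ReachabilityProbe {

    // Tries to open a TCP connection and reports how it ended.
    static String probe(String host, int port, int timeoutMillis) {
        try (Socket s = new Socket()) {
            // The constructor tries to resolve the name right away; if that
            // fails, the address is flagged as unresolved (no exception).
            InetSocketAddress addr = new InetSocketAddress(host, port);
            if (addr.isUnresolved()) {
                return "unresolved";
            }
            // Blocks for at most timeoutMillis before giving up.
            s.connect(addr, timeoutMillis);
            return "connected";
        } catch (SocketTimeoutException e) {
            return "timed out";   // slow path: the whole timeout elapsed
        } catch (ConnectException e) {
            return "refused";     // fast path: the peer rejected us at once
        } catch (IOException e) {
            return "failed";
        }
    }

    // Picks a local port and releases it, so nothing is listening there.
    static int freeLocalPort() {
        try (ServerSocket ss = new ServerSocket(0)) {
            return ss.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = freeLocalPort();
        long start = System.nanoTime();
        String outcome = probe("127.0.0.1", port, 2000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(outcome + " after " + elapsedMs + " ms");
    }
}
```

Using domains whose failure is reported quickly, as the fix does, keeps such tests on the fast path regardless of the machine's proxy configuration.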
Comment 26 Markus Keller CLA 2013-11-13 11:44:52 EST
(In reply to David Williams from comment #24)
> Was this "hard to detect" because tests that "timeout" are no longer
> captured or detected as "timed out"? (If so, I'm thinking that problem
> deserves higher priority). 

The 2h timeout/screenshots implemented in library.xml and EclipseTestRunner are up and running (but not always easy to spot; see bug 210792).

The problem here was that there's an additional timeout in
org.eclipse.ecf.internal.provider.filetransfer.httpclient4.ECFHttpClientProtocolSocketFactory.connectSocket(ECFHttpClientProtocolSocketFactory.java:84)
that delayed the tests a bit, but not enough to hit the 2h limit.

Nothing in the existing test infrastructure would help find such a problem, and I don't think the manual process by which I found this can be fully automated. But bug 420296 should help make the manual scans quicker.
Comment 27 David Williams CLA 2013-11-13 16:14:27 EST
Judging from the ongoing M-build tests from this morning, can this fix be backported to R4_3_maintenance?

https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/

The mac-test2 tests have already been running for over 5 hours (and are still at it), so I think the same fix should be made there. (I'll leave it up to you, Krzysztof, whether to open a new bug or reopen this one.)
Comment 28 Krzysztof Daniel CLA 2013-11-14 02:52:25 EST
(In reply to comment #27)
> (I'll leave up to you, Krzysztof, if you want
> to open a new bug or reopen this one).

I'm lazy. Reopening.
Comment 29 Krzysztof Daniel CLA 2013-11-14 03:15:29 EST
The fix went into R3_9_maintenance (P2 has no 4.x branch).
The fix was released earlier into Luna M4.

Marking as resolved.