org.eclipse.core.internal.utils.FileUtil.transferStreams does basically the same (let's ignore SubMonitor and exception handling) as InputStream.transferTo(OutputStream), which was introduced in Java 9. During a JDT build it is mainly called for writing .class files, which happens in org.eclipse.jdt.internal.core.builder.AbstractImageBuilder.writeClassFileContents(). The InputStream there is always created as a ByteArrayInputStream, so the data is already available as a byte array. It is just wasted time and memory to split it into 8192-byte chunks and pass them one by one to the OS. For that reason ByteArrayInputStream already overrides transferTo with a fast path. We could use InputStream.transferTo for every InputStream type; that would mean we could no longer differentiate between read and write errors, but those are unlikely anyway. Or we add a fast path for the ByteArrayInputStream case ourselves, writing the file in a single call.
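For illustration, a minimal sketch of what such a fast path could look like (not the actual FileUtil code; the class and method names are illustrative):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class TransferSketch {

    // Sketch only: recent JDKs override transferTo in ByteArrayInputStream,
    // so the whole backing array is handed to the destination in one write
    // instead of being copied out in 8192-byte chunks.
    static void transfer(InputStream source, OutputStream destination) throws IOException {
        if (source instanceof ByteArrayInputStream) {
            source.transferTo(destination); // fast path: a single write call
            return;
        }
        // Classic chunked copy, as FileUtil.transferStreams does today.
        byte[] buffer = new byte[8192];
        int read;
        while ((read = source.read(buffer)) != -1) {
            destination.write(buffer, 0, read);
        }
    }
}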
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.resources/+/179819
(In reply to Eclipse Genie from comment #1)
> New Gerrit change created:
> https://git.eclipse.org/r/c/platform/eclipse.platform.resources/+/179819

Jörg, sounds interesting. Do you have any performance numbers for this concrete change, e.g. for the compilation of a big project? Unfortunately the JDK devs didn't document any performance numbers in https://bugs.openjdk.java.net/browse/JDK-8180451
Do not expect much improvement. Basically we only avoid memory copies of each .class file's content and a few system calls per class file (depends on the individual size; the average is 30 KB for us). The memory copy is rather fast, and the system calls highly depend on the OS. On Windows you might even have software which intercepts every system call, but currently antivirus software mainly intercepts the open and close calls. I do not know of any user process which intercepts the write (Process Monitor does, though). Thus this is only a very small (<0.5%) improvement on Windows. It is well below the jitter of a full build, so I cannot give accurate numbers.
Do you have concrete metrics about the actual gain? The suggested approach in the patch makes it lose progress reporting and cancellation capabilities. I think those can be important in many cases, so if the performance saving is not really noticeable, it's probably not worth dropping progress/cancel.
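For context, the chunked loop is what carries the monitor hooks; a minimal sketch, assuming an org.eclipse.core.runtime.IProgressMonitor is passed in (illustrative, not the actual transferStreams code):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.OperationCanceledException;

public final class CancellableCopy {

    // Every 8 KB chunk is an opportunity to report progress and to
    // honor cancellation; a single transferTo call offers neither.
    static void copy(InputStream source, OutputStream destination, IProgressMonitor monitor)
            throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = source.read(buffer)) != -1) {
            if (monitor.isCanceled())
                throw new OperationCanceledException();
            destination.write(buffer, 0, read);
            monitor.worked(1); // one tick per chunk
        }
    }
}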
Here is the result of a JMH microbenchmark overwriting the same file over and over on Windows:

Benchmark             (fileName)   (size)  Mode  Cnt    Score    Error  Units
FileWrite.eclipse     output.tmp     1000  avgt  100  150,156 ±  3,900  us/op
FileWrite.eclipse     output.tmp     5000  avgt  100  150,711 ±  3,981  us/op
FileWrite.eclipse     output.tmp    10000  avgt  100  180,716 ±  6,457  us/op
FileWrite.eclipse     output.tmp    50000  avgt  100  232,153 ±  9,137  us/op
FileWrite.eclipse     output.tmp   100000  avgt  100  288,584 ± 10,437  us/op
FileWrite.eclipse     output.tmp   500000  avgt  100  591,598 ± 23,250  us/op
FileWrite.transferTo  output.tmp     1000  avgt  100  151,048 ±  5,271  us/op
FileWrite.transferTo  output.tmp     5000  avgt  100  151,052 ±  3,695  us/op
FileWrite.transferTo  output.tmp    10000  avgt  100  153,524 ±  3,915  us/op
FileWrite.transferTo  output.tmp    50000  avgt  100  163,021 ±  3,958  us/op
FileWrite.transferTo  output.tmp   100000  avgt  100  173,232 ±  3,424  us/op
FileWrite.transferTo  output.tmp   500000  avgt  100  281,999 ±  7,042  us/op

Side by side (the better result is marked with "+"):

(fileName)   (size)    eclipse   transferTo
output.tmp     1000   +150,156      151,048
output.tmp     5000   +150,711      151,052
output.tmp    10000    180,716     +153,524
output.tmp    50000    232,153     +163,021
output.tmp   100000    288,584     +173,232
output.tmp   500000    591,598     +281,999

As you can see there is no difference for files < 8192 bytes (expected: they fit in a single buffer), while for big files it tends to double the performance (expected: we halved the memory copies).
Created attachment 286246 [details]
jmh Benchmark: FileWrite.java
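The attachment itself is not inlined in this report; a hypothetical reconstruction of such a benchmark (not the actual attachment 286246; the method bodies are assumptions based on the discussion above) might look like:

import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Hypothetical reconstruction, not the actual attachment 286246.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class FileWrite {

    @Param("output.tmp")
    String fileName;

    @Param({"1000", "5000", "10000", "50000", "100000", "500000"})
    int size;

    byte[] content;

    @Setup
    public void setup() {
        content = new byte[size]; // payload of the given size
    }

    @Benchmark
    public void eclipse() throws IOException {
        // chunked copy, as FileUtil.transferStreams does today
        try (FileOutputStream out = new FileOutputStream(fileName)) {
            ByteArrayInputStream in = new ByteArrayInputStream(content);
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }

    @Benchmark
    public void transferTo() throws IOException {
        // single transferTo call, as in the proposed patch
        try (FileOutputStream out = new FileOutputStream(fileName)) {
            new ByteArrayInputStream(content).transferTo(out);
        }
    }
}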
Here are results on RHEL 7.4 Linux / SSD:

Benchmark             (fileName)   (size)  Mode  Cnt    Score   Error  Units
FileWrite.eclipse     output.tmp     1000  avgt  100  152.987 ± 3.587  us/op
FileWrite.eclipse     output.tmp     5000  avgt  100  134.310 ± 1.045  us/op
FileWrite.eclipse     output.tmp    10000  avgt  100  132.026 ± 0.905  us/op
FileWrite.eclipse     output.tmp    50000  avgt  100  155.583 ± 0.894  us/op
FileWrite.eclipse     output.tmp   100000  avgt  100  169.201 ± 0.745  us/op
FileWrite.eclipse     output.tmp   500000  avgt  100  571.079 ± 2.829  us/op
FileWrite.transferTo  output.tmp     1000  avgt  100  153.410 ± 3.930  us/op
FileWrite.transferTo  output.tmp     5000  avgt  100  134.692 ± 0.996  us/op
FileWrite.transferTo  output.tmp    10000  avgt  100  130.628 ± 0.616  us/op
FileWrite.transferTo  output.tmp    50000  avgt  100  145.742 ± 0.580  us/op
FileWrite.transferTo  output.tmp   100000  avgt  100  157.709 ± 0.753  us/op
FileWrite.transferTo  output.tmp   500000  avgt  100  534.186 ± 3.061  us/op

The second run was similar, except the last test had an outlier:

Benchmark             (fileName)   (size)  Mode  Cnt    Score    Error  Units
FileWrite.eclipse     output.tmp     1000  avgt  100  158.036 ±  3.992  us/op
FileWrite.eclipse     output.tmp     5000  avgt  100  138.430 ±  1.931  us/op
FileWrite.eclipse     output.tmp    10000  avgt  100  132.114 ±  0.729  us/op
FileWrite.eclipse     output.tmp    50000  avgt  100  153.693 ±  0.623  us/op
FileWrite.eclipse     output.tmp   100000  avgt  100  170.906 ±  0.614  us/op
FileWrite.eclipse     output.tmp   500000  avgt  100  583.654 ±  8.434  us/op
FileWrite.transferTo  output.tmp     1000  avgt  100  152.925 ±  3.677  us/op
FileWrite.transferTo  output.tmp     5000  avgt  100  134.197 ±  1.028  us/op
FileWrite.transferTo  output.tmp    10000  avgt  100  129.368 ±  0.747  us/op
FileWrite.transferTo  output.tmp    50000  avgt  100  146.116 ±  0.640  us/op
FileWrite.transferTo  output.tmp   100000  avgt  100  159.470 ±  0.586  us/op
FileWrite.transferTo  output.tmp   500000  avgt  100  641.456 ± 44.236  us/op

RHEL 7.4 Linux / NFS:

Benchmark             (fileName)   (size)  Mode  Cnt      Score      Error  Units
FileWrite.eclipse     output.tmp     1000  avgt  100  10595.632 ± 6484.240  us/op
FileWrite.eclipse     output.tmp     5000  avgt  100   6534.132 ±  303.272  us/op
FileWrite.eclipse     output.tmp    10000  avgt  100   7363.477 ±  463.539  us/op
FileWrite.eclipse     output.tmp    50000  avgt  100   7681.482 ±  308.789  us/op
FileWrite.eclipse     output.tmp   100000  avgt  100   8343.355 ±  326.838  us/op
FileWrite.eclipse     output.tmp   500000  avgt  100  13462.352 ±  276.129  us/op
FileWrite.transferTo  output.tmp     1000  avgt  100   8027.056 ±  741.649  us/op
FileWrite.transferTo  output.tmp     5000  avgt  100   6697.516 ±  273.112  us/op
FileWrite.transferTo  output.tmp    10000  avgt  100   7011.694 ±  278.024  us/op
FileWrite.transferTo  output.tmp    50000  avgt  100   8391.153 ±  357.520  us/op
FileWrite.transferTo  output.tmp   100000  avgt  100  10034.604 ±  352.960  us/op
FileWrite.transferTo  output.tmp   500000  avgt  100  13839.217 ±  331.240  us/op

Second run (note the completely different times, which I can't explain except by some network issue during the first test run):

Benchmark             (fileName)   (size)  Mode  Cnt      Score     Error  Units
FileWrite.eclipse     output.tmp     1000  avgt  100   2176.100 ± 648.663  us/op
FileWrite.eclipse     output.tmp     5000  avgt  100   2154.793 ± 218.057  us/op
FileWrite.eclipse     output.tmp    10000  avgt  100   2843.507 ± 184.317  us/op
FileWrite.eclipse     output.tmp    50000  avgt  100   3009.489 ± 101.146  us/op
FileWrite.eclipse     output.tmp   100000  avgt  100   3827.846 ± 335.660  us/op
FileWrite.eclipse     output.tmp   500000  avgt  100  10243.323 ± 645.497  us/op
FileWrite.transferTo  output.tmp     1000  avgt  100   5543.271 ± 749.360  us/op
FileWrite.transferTo  output.tmp     5000  avgt  100   5965.475 ± 214.695  us/op
FileWrite.transferTo  output.tmp    10000  avgt  100   6236.220 ± 175.638  us/op
FileWrite.transferTo  output.tmp    50000  avgt  100   7009.276 ± 355.022  us/op
FileWrite.transferTo  output.tmp   100000  avgt  100   9749.134 ± 302.703  us/op
FileWrite.transferTo  output.tmp   500000  avgt  100  13620.514 ± 349.497  us/op

Interestingly, the first run *always* has a way higher error. I've switched the test order to see if that changes the results (not really):

SSD:

Benchmark             (fileName)   (size)  Mode  Cnt    Score    Error  Units
FileWrite.transferTo  output.tmp     1000  avgt  100  152.789 ±  3.585  us/op
FileWrite.transferTo  output.tmp     5000  avgt  100  133.510 ±  1.120  us/op
FileWrite.transferTo  output.tmp    10000  avgt  100  130.377 ±  0.919  us/op
FileWrite.transferTo  output.tmp    50000  avgt  100  115.034 ±  0.479  us/op
FileWrite.transferTo  output.tmp   100000  avgt  100  159.012 ±  1.999  us/op
FileWrite.transferTo  output.tmp   500000  avgt  100  531.274 ±  7.118  us/op
FileWrite.xeclipse    output.tmp     1000  avgt  100  155.874 ±  4.078  us/op
FileWrite.xeclipse    output.tmp     5000  avgt  100  135.581 ±  1.368  us/op
FileWrite.xeclipse    output.tmp    10000  avgt  100  132.308 ±  1.105  us/op
FileWrite.xeclipse    output.tmp    50000  avgt  100  123.894 ±  0.671  us/op
FileWrite.xeclipse    output.tmp   100000  avgt  100  167.885 ±  0.518  us/op
FileWrite.xeclipse    output.tmp   500000  avgt  100  602.081 ± 28.976  us/op

NFS:

Benchmark             (fileName)   (size)  Mode  Cnt      Score     Error  Units
FileWrite.transferTo  output.tmp     1000  avgt  100   8013.726 ± 571.061  us/op
FileWrite.transferTo  output.tmp     5000  avgt  100   6387.713 ± 216.183  us/op
FileWrite.transferTo  output.tmp    10000  avgt  100   5724.620 ± 223.427  us/op
FileWrite.transferTo  output.tmp    50000  avgt  100   6707.729 ± 187.233  us/op
FileWrite.transferTo  output.tmp   100000  avgt  100   8201.561 ± 274.431  us/op
FileWrite.transferTo  output.tmp   500000  avgt  100  13361.642 ± 315.396  us/op
FileWrite.xeclipse    output.tmp     1000  avgt  100   7995.566 ± 629.670  us/op
FileWrite.xeclipse    output.tmp     5000  avgt  100   7105.677 ± 238.665  us/op
FileWrite.xeclipse    output.tmp    10000  avgt  100   3416.438 ± 661.511  us/op
FileWrite.xeclipse    output.tmp    50000  avgt  100   2950.338 ±  98.145  us/op
FileWrite.xeclipse    output.tmp   100000  avgt  100   3900.387 ± 119.009  us/op
FileWrite.xeclipse    output.tmp   500000  avgt  100   8481.613 ± 157.885  us/op

So on Linux/NFS the patch is faster for very big files and slightly slower for "usual" ones; on Linux/SSD it is slightly slower overall than the original code...
(In reply to Andrey Loskutov from comment #7)
> So on Linux/NFS the patch is faster for very big files and slightly slower
> for "usual" ones; on Linux/SSD it is slightly slower overall than the original code...

Thanks for testing. On NFS the measured error is way too big to draw any conclusion (or one could also say writing 5000 bytes is faster than writing 1000 bytes; let's introduce a minimal file size ;-) ). On SSD I cannot follow your observation. When I put your numbers in columns and mark the better results with a "+":

 eclipse    transferTo
+152.987     153.410
+134.310     134.692
 132.026    +130.628
 155.583    +145.742
 169.201    +157.709
 571.079    +534.186

 eclipse    transferTo
 158.036    +152.925
 138.430    +134.197
 132.114    +129.368
 153.693    +146.116
 170.906    +159.470
+583.654     641.456

 xeclipse   transferTo
 155.874    +152.789
 135.581    +133.510
 132.308    +130.377
 123.894    +115.034
 167.885    +159.012
 602.081    +531.274

I see far more "+" in the right column. Nevertheless, it is less obvious than in my measurements. What I find most surprising is that your Linux system is as slow as my Windows. Are you running Linux in a Windows VM? My colleagues reported that file access on their Linux/ext4 is much faster for small files than on Windows. Which filesystem did you use? Even on SSD, writing 1000 bytes was slower than writing 5000; that's crazy, how come?
Jörg, I understand the "Score" numbers as "higher is better". I'm on a real Linux workstation; however, we use XFS, no idea why.
(In reply to Andrey Loskutov from comment #9)
> Jörg, I understand the "Score" numbers as "higher is better".
> I'm on a real Linux workstation; however, we use XFS, no idea why.

"Score" is misleading: the numbers are average times, and less time is better; see the Mode column, "avgt". That's the opposite of throughput, which JMH would also call "Score".
(In reply to Andrey Loskutov from comment #9)
> however, we use XFS, no idea why.

https://unix.stackexchange.com/questions/28756/what-is-the-most-high-performance-linux-filesystem-for-storing-a-lot-of-small-fi states that old versions of XFS were bad for small files.
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.resources/+/181508
Gerrit change https://git.eclipse.org/r/c/platform/eclipse.platform.resources/+/179819 was merged to [master]. Commit: http://git.eclipse.org/c/platform/eclipse.platform.resources.git/commit/?id=ebec8bb29e69c139d81abb93f566c83411e745b1