SWT-win32, 3.2M1

While drawing images with an alpha channel on a GC, I found the performance a bit puzzling in some places. Please see the snippet below for a simple benchmark which draws two images of the same size (one with alpha values set, the other without) onto a shell (first test) and a destination image (second test), 1000 times each.

Some results on my system (WinXP Pro SP2 on a P4-2.4GHz, JRE 1.4.2_08):

[display depth=16, image size=32x32, passes=100, times per pass=1000]
  shell test...
    average time without alpha=21.88ms
    average time with alpha=209.68ms
    factor=9.583181
  image test...
    average time without alpha=16.9ms
    average time with alpha=207.0ms
    factor=12.248521

[display depth=16, image size=128x128, passes=100, times per pass=1000]
  shell test...
    average time without alpha=73.59ms
    average time with alpha=2380.01ms
    factor=32.341488
  image test...
    average time without alpha=59.23ms
    average time with alpha=2437.96ms
    factor=41.160896

[display depth=32, image size=32x32, passes=100, times per pass=1000]
  shell test...
    average time without alpha=18.31ms
    average time with alpha=151.85ms
    factor=8.2932825
  image test...
    average time without alpha=17.17ms
    average time with alpha=288.76ms
    factor=16.817705

[display depth=32, image size=128x128, passes=100, times per pass=1000]
  shell test...
    average time without alpha=90.3ms
    average time with alpha=1557.52ms
    factor=17.248283
  image test...
    average time without alpha=90.93ms
    average time with alpha=3087.67ms
    factor=33.95656

Things to note:
- Alpha blending is expected to have a higher cost in terms of processor cycles than simply blitting opaque pixels, but these numbers (16, 32, 40 times slower?) seem out of proportion
- The cost factor of alpha vs. non-alpha varies wildly with the image size instead of being constant
- With a display depth of 32-bit, drawing onto an image appears to be about twice as slow as drawing onto a control for the alpha case (but not the non-alpha case)

Would you have any thoughts about these results and might it be possible to improve the numbers? (Feel free to close this report if you feel that my expectations are wrong or that the problem is likely to lie with my system setup.)

---

import org.eclipse.swt.graphics.*;
import org.eclipse.swt.widgets.*;
import java.util.*;

public class AlphaBenchmark {
    private static final int WIDTH = 32;
    private static final int HEIGHT = WIDTH;
    private static final int PASSES = 100;
    private static final int TIMES_PER_PASS = 1000;

    public static void main(String[] args) {
        // create display
        Display display = new Display();
        System.out.print(" [display depth=" + display.getDepth());
        System.out.print(", image size=" + WIDTH + "x" + HEIGHT);
        System.out.print(", passes=" + PASSES);
        System.out.print(", times per pass=" + TIMES_PER_PASS);
        System.out.println("]");

        // get sample image data
        Image img = new Image(display, WIDTH, HEIGHT);
        ImageData noAlphaImgData = img.getImageData();
        ImageData alphaImgData = img.getImageData();
        img.dispose();

        // set alpha values
        byte[] alphas = new byte[WIDTH * HEIGHT];
        Arrays.fill(alphas, (byte) 127);
        alphaImgData.setAlphas(0, 0, alphas.length, alphas, 0);

        // create images
        Image noAlphaImg = new Image(display, noAlphaImgData);
        Image alphaImg = new Image(display, alphaImgData);

        // shell test
        Shell destShell = new Shell(display);
        destShell.setBounds(500, 200, 200, 200);
        destShell.open();
        System.out.println(" shell test...");
        runTest(destShell, noAlphaImg, alphaImg);
        destShell.dispose();

        // image test
        Image destImg = new Image(display, WIDTH, HEIGHT);
        System.out.println(" image test...");
        runTest(destImg, noAlphaImg, alphaImg);
        destImg.dispose();

        // dispose images
        noAlphaImg.dispose();
        alphaImg.dispose();

        // dispose display
        display.dispose();
    }

    private static void runTest(
            Drawable destDrawable, Image noAlphaImg, Image alphaImg) {
        int noAlphaTotal = 0;
        int alphaTotal = 0;

        // create GC
        GC gc = new GC(destDrawable);

        // drawing loop
        for (int pass = 0; pass < PASSES; pass++) {
            noAlphaTotal += draw(gc, noAlphaImg);
            alphaTotal += draw(gc, alphaImg);
        }

        // dispose GC
        gc.dispose();

        // display results
        float avgNoAlpha = (float) noAlphaTotal / PASSES;
        float avgAlpha = (float) alphaTotal / PASSES;
        System.out.println(" average time without alpha=" + avgNoAlpha + "ms");
        System.out.println(" average time with alpha=" + avgAlpha + "ms");
        System.out.println(" factor=" + ((float) alphaTotal / noAlphaTotal));
    }

    private static int draw(GC gc, Image srcImg) {
        gc.fillRectangle(0, 0, WIDTH, HEIGHT);
        long start = System.currentTimeMillis();
        for (int i = 0; i < TIMES_PER_PASS; i++) {
            gc.drawImage(srcImg, 0, 0);
        }
        long end = System.currentTimeMillis();
        return (int) (end - start);
    }
}
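To make the second note (the factor varying with image size) easier to inspect, the reported totals can be converted into a cost per pixel per draw. The sketch below only does that arithmetic on the depth-16 shell-test figures quoted above; the class and method names are made up for illustration and nothing here calls SWT.

```java
/** Hypothetical helper: turns the reported "ms per 1000 draws" into ns per pixel. */
final class ScalingArithmetic {

    /** Converts milliseconds per 1000 drawImage calls into nanoseconds per pixel per call. */
    static double nanosPerPixel(double millisPer1000Draws, int width, int height) {
        double nanosPerDraw = millisPer1000Draws * 1_000_000.0 / 1000.0;
        return nanosPerDraw / (width * height);
    }

    public static void main(String[] args) {
        // Shell test, display depth 16, figures copied from the report.
        System.out.printf("32x32   opaque=%.2f ns/px  alpha=%.2f ns/px%n",
                nanosPerPixel(21.88, 32, 32), nanosPerPixel(209.68, 32, 32));
        System.out.printf("128x128 opaque=%.2f ns/px  alpha=%.2f ns/px%n",
                nanosPerPixel(73.59, 128, 128), nanosPerPixel(2380.01, 128, 128));
    }
}
```

On these figures the opaque path's per-pixel cost falls by a factor of nearly five between 32x32 and 128x128, while the alpha path's falls far less (from about 205 ns to about 145 ns per pixel). That would be consistent with a fixed per-call overhead dominating the opaque case for small images, but this reading is an assumption and not something the benchmark measures directly.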
Fixed > 20050901. Please try the latest.
New benchmark results for swt-N20050902-0010-win32-win32-x86:

Hi-Color
===

[display depth=16, image size=32x32, passes=100, times per pass=1000]
  shell test...
    average time without alpha=21.43ms
    average time with alpha=185.61ms
    factor=8.661222
  image test...
    average time without alpha=20.0ms
    average time with alpha=182.82ms
    factor=9.141

[display depth=16, image size=128x128, passes=100, times per pass=1000]
  shell test...
    average time without alpha=68.62ms
    average time with alpha=2041.23ms
    factor=29.746866
  image test...
    average time without alpha=59.06ms
    average time with alpha=2091.41ms
    factor=35.411613

[display depth=16, image size=256x256, passes=100, times per pass=1000]
  shell test...
    average time without alpha=212.54ms
    average time with alpha=8262.93ms
    factor=38.87706
  image test...
    average time without alpha=192.52ms
    average time with alpha=8580.14ms
    factor=44.567524

True-Color
===

[display depth=32, image size=32x32, passes=100, times per pass=1000]
  shell test...
    average time without alpha=17.82ms
    average time with alpha=128.9ms
    factor=7.2334456
  image test...
    average time without alpha=18.3ms
    average time with alpha=130.14ms
    factor=7.1114755

[display depth=32, image size=128x128, passes=100, times per pass=1000]
  shell test...
    average time without alpha=90.21ms
    average time with alpha=1211.19ms
    factor=13.426338
  image test...
    average time without alpha=90.16ms
    average time with alpha=1218.43ms
    factor=13.514086

[display depth=32, image size=256x256, passes=100, times per pass=1000]
  shell test...
    average time without alpha=363.47ms
    average time with alpha=5381.99ms
    factor=14.807247
  image test...
    average time without alpha=364.6ms
    average time with alpha=5677.12ms
    factor=15.570817

Observations:

All in all, the numbers are better. Most notably, the "penalty" for drawing onto an image (as opposed to a control) in 32-bit mode is gone. That's a big step forward.
However, the cost factor of using an alpha channel still seems fairly high for some cases, especially for bigger image sizes. The idea (hope) would be to aim for a factor that's somewhere in the range of (estimating here) 4-7, regardless of the image size.

Looking at the source code, some of the current performance seems to be due to the overhead that occurs each time GC.drawBitmapAlpha is called. Among other things, a buffer the size of the image is allocated and after that, every pixel is set and prepared for subsequent use with AlphaBlend.

If it were possible to cache some of this data, this would surely boost the painting speed. There are situations in which the ability to (repeatedly) draw images quickly may be well worth the cost of any additional memory consumed by cached data. (Note that the images in such a scenario could be several different ones that are drawn in an arbitrary order.)

Reopening this report as a request for checking whether further optimizations such as caching are feasible. Thanks for looking into this issue!
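The caching idea can be sketched in plain Java, independent of SWT. This is a minimal illustration under stated assumptions, not how GC.drawBitmapAlpha is implemented: pixels are modelled as an int[] of straight-alpha ARGB values, each array stands in for one image, and `PremultipliedCache` and all of its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.Map;

/** Hypothetical cache: prepare each image's pixels once, then reuse them on every draw. */
final class PremultipliedCache {
    // Keyed by array identity, so several different images can be drawn in any order.
    private final Map<int[], int[]> cache = new IdentityHashMap<>();
    private int conversions = 0;

    /** Returns the premultiplied copy of the given straight-alpha ARGB pixels. */
    int[] get(int[] argb) {
        int[] prepared = cache.get(argb);
        if (prepared == null) {
            prepared = premultiply(argb);
            cache.put(argb, prepared);
            conversions++;
        }
        return prepared;
    }

    /** Must be called when the source pixels change or the image is disposed. */
    void invalidate(int[] argb) {
        cache.remove(argb);
    }

    int conversions() {
        return conversions;
    }

    /** Scales each colour channel by alpha/255, leaving the alpha byte unchanged. */
    static int[] premultiply(int[] argb) {
        int[] out = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int a = p >>> 24;
            int r = ((p >> 16) & 0xFF) * a / 255;
            int g = ((p >> 8) & 0xFF) * a / 255;
            int b = (p & 0xFF) * a / 255;
            out[i] = (a << 24) | (r << 16) | (g << 8) | b;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] image = new int[32 * 32];
        Arrays.fill(image, 0x7FFF8040); // alpha=127, r=255, g=128, b=64
        PremultipliedCache cache = new PremultipliedCache();
        for (int i = 0; i < 1000; i++) {
            cache.get(image); // the per-pixel work happens on the first call only
        }
        System.out.println("conversions=" + cache.conversions());
    }
}
```

The trade-off is the one described above: each cached image costs a second pixel buffer, and the cache has to be invalidated whenever the source image changes or is disposed.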
These are the results of running the bench on my machine (WinXP pro sp1, P4 2.0GHz, 1.5 RAM). They are more reasonable than the ones you posted. Probably because my display driver performs better.

Anyway, the problem here is that AlphaBlend() takes premultiplied data and other APIs like ImageList_Add() take non-premultiplied data which forces us to create a temporary DIB section. We have investigated ways of using premultiplied data all the time, but it involves a lot of work (and new API for getImageData() that describes premultiplied data). I will look at this only after the next milestone.

[display depth=16, image size=32x32, passes=100, times per pass=1000]
  shell test...
    average time without alpha=49.79ms
    average time with alpha=196.31ms
    factor=3.9427595
  image test...
    average time without alpha=37.78ms
    average time with alpha=184.4ms
    factor=4.8808894

[display depth=16, image size=128x128, passes=100, times per pass=1000]
  shell test...
    average time without alpha=261.19ms
    average time with alpha=1902.41ms
    factor=7.283625
  image test...
    average time without alpha=246.06ms
    average time with alpha=1890.04ms
    factor=7.681216

[display depth=32, image size=32x32, passes=100, times per pass=1000]
  shell test...
    average time without alpha=63.92ms
    average time with alpha=201.24ms
    factor=3.1483104
  image test...
    average time without alpha=64.3ms
    average time with alpha=184.29ms
    factor=2.8660965

[display depth=32, image size=128x128, passes=100, times per pass=1000]
  shell test...
    average time without alpha=1272.39ms
    average time with alpha=1970.11ms
    factor=1.5483539
  image test...
    average time without alpha=1270.92ms
    average time with alpha=1935.96ms
    factor=1.5232744

[display depth=32, image size=256x256, passes=100, times per pass=1000]
  shell test...
    average time without alpha=1497.62ms
    average time with alpha=8437.22ms
    factor=1.6337523
  image test...
    average time without alpha=5433.39ms
    average time with alpha=9349.88ms
    factor=1.7208189
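For readers unfamiliar with the distinction, the difference between premultiplied and non-premultiplied channel values can be shown on single 8-bit channels. The sketch below is an illustration only: `PremultiplyDemo` and its methods are hypothetical names, it uses truncating integer arithmetic, and it says nothing about how SWT or Windows round internally.

```java
/** Hypothetical single-channel illustration of straight vs. premultiplied alpha. */
final class PremultiplyDemo {

    /** Straight (non-premultiplied) channel to premultiplied channel. */
    static int premultiply(int channel, int alpha) {
        return channel * alpha / 255;
    }

    /** Premultiplied channel back to straight; generally lossy for small alpha. */
    static int unpremultiply(int channel, int alpha) {
        return alpha == 0 ? 0 : Math.min(255, channel * 255 / alpha);
    }

    /** Source-over blend when the source channel is already premultiplied. */
    static int blendPremultiplied(int premultipliedSrc, int dst, int alpha) {
        return premultipliedSrc + dst * (255 - alpha) / 255;
    }

    /** Source-over blend when the source channel is straight (non-premultiplied). */
    static int blendStraight(int src, int dst, int alpha) {
        return (src * alpha + dst * (255 - alpha)) / 255;
    }

    public static void main(String[] args) {
        int src = 200, dst = 100, alpha = 127;
        int pre = premultiply(src, alpha);
        System.out.println("premultiplied source = " + pre);
        System.out.println("blend (premultiplied) = " + blendPremultiplied(pre, dst, alpha));
        System.out.println("blend (straight)      = " + blendStraight(src, dst, alpha));

        // Round trip at low alpha: the original channel value is not recovered.
        int low = premultiply(200, 10);
        System.out.println("200 at alpha 10 -> " + low + " -> " + unpremultiply(low, 10));
    }
}
```

The round trip at low alpha does not return the original value, which hints at why storing only premultiplied data would need an API that tells callers of getImageData() which form they are getting; that inference is mine, not a statement from the SWT team.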
Created attachment 27431
AlphaBenchmark v2.0

Slightly improved benchmark to replace the one from comment #0.
(In reply to comment #3)
> These are the results of running the bench on my machine (WinXP pro sp1, P4
> 2.0GHz, 1.5 RAM). They are more reasonable than the ones you posted. Probably
> because my display driver performs better.

This could certainly be a factor, although I'm using a recent revision of the standard ATI drivers. To be honest, I find it hard to get any sensible patterns from comparing your numbers with mine. I still think that my benchmark measures the right thing (since for a developer who is using SWT, the code chain ends at the point where he's calling GC.drawImage), but perhaps other approaches are needed to get a clearer picture.

> Anyway, the problem here is that AlphaBlend() takes premultiplied data and
> other APIs like ImageList_Add() take non-premultiplied data which forces us to
> create a temporary DIB section. We have investigated ways of using
> premultiplied data all the time, but it involves a lot of work (and new API
> for getImageData() that describes premultiplied data).

If that's possible, that would be great.
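One "other approach" that might reduce noise when comparing machines is a timing helper with warm-up passes and a finer-grained clock. The sketch below is a suggestion under assumptions, not part of either benchmark version in this report: `TimingSketch` is a hypothetical name, the workload is a stand-in for the `gc.drawImage` loop, and `System.nanoTime()` needs Java 5 or later, so it would not run on the JRE 1.4.2 used for the original numbers.

```java
/** Hypothetical timing helper: warm up first, then average over measured passes. */
final class TimingSketch {

    /** Runs the work warmupPasses times untimed, then returns the mean ms per timed pass. */
    static double averageMillis(Runnable work, int warmupPasses, int passes) {
        for (int i = 0; i < warmupPasses; i++) {
            work.run(); // lets JIT compilation and driver-side caches settle
        }
        long totalNanos = 0;
        for (int i = 0; i < passes; i++) {
            long start = System.nanoTime();
            work.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / passes;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the benchmark this would be the 1000 gc.drawImage calls.
        final long[] sink = new long[1];
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                sink[0] += i;
            }
        };
        System.out.println("average ms per pass: " + averageMillis(work, 10, 100));
    }
}
```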
SWT-win32, 20070129 (HEAD)

Here are some benchmark results (see comment #4) from a different machine. The numbers are better.

---

Hi-Color:

[display depth=16, image size=32x32, passes=100]
  shell test...
    1000*drawImage without alpha=14.21ms
    1000*drawImage with alpha=59.69ms
    factor=4.20
  image test...
    1000*drawImage without alpha=10.80ms
    1000*drawImage with alpha=58.89ms
    factor=5.45

[display depth=16, image size=64x64, passes=100]
  shell test...
    1000*drawImage without alpha=25.92ms
    1000*drawImage with alpha=141.58ms
    factor=5.46
  image test...
    1000*drawImage without alpha=17.96ms
    1000*drawImage with alpha=141.72ms
    factor=7.89

[display depth=16, image size=128x128, passes=100]
  shell test...
    1000*drawImage without alpha=69.07ms
    1000*drawImage with alpha=471.24ms
    factor=6.82
  image test...
    1000*drawImage without alpha=53.31ms
    1000*drawImage with alpha=471.38ms
    factor=8.84

[display depth=16, image size=256x256, passes=100]
  shell test...
    1000*drawImage without alpha=223.29ms
    1000*drawImage with alpha=1847.49ms
    factor=8.27
  image test...
    1000*drawImage without alpha=193.79ms
    1000*drawImage with alpha=1847.77ms
    factor=9.53

True-Color:

[display depth=32, image size=32x32, passes=100]
  shell test...
    1000*drawImage without alpha=10.64ms
    1000*drawImage with alpha=55.14ms
    factor=5.18
  image test...
    1000*drawImage without alpha=9.99ms
    1000*drawImage with alpha=55.48ms
    factor=5.55

[display depth=32, image size=64x64, passes=100]
  shell test...
    1000*drawImage without alpha=16.72ms
    1000*drawImage with alpha=126.25ms
    factor=7.55
  image test...
    1000*drawImage without alpha=16.70ms
    1000*drawImage with alpha=125.64ms
    factor=7.52

[display depth=32, image size=128x128, passes=100]
  shell test...
    1000*drawImage without alpha=70.47ms
    1000*drawImage with alpha=414.68ms
    factor=5.88
  image test...
    1000*drawImage without alpha=70.94ms
    1000*drawImage with alpha=414.38ms
    factor=5.84

[display depth=32, image size=256x256, passes=100]
  shell test...
    1000*drawImage without alpha=281.57ms
    1000*drawImage with alpha=1657.33ms
    factor=5.89
  image test...
    1000*drawImage without alpha=281.56ms
    1000*drawImage with alpha=1653.60ms
    factor=5.87
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.