Bug 106981 - Improve performance of GC.drawImage for images with alpha values
Status: REOPENED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT
Version: 3.2
Hardware: PC Windows XP
Importance: P3 normal with 2 votes
Target Milestone: ---
Assignee: Silenio Quarti CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2005-08-14 20:35 EDT by Florian Priester CLA
Modified: 2019-09-06 16:13 EDT

See Also:


Attachments
AlphaBenchmark v2.0 (4.12 KB, text/plain)
2005-09-23 08:44 EDT, Florian Priester CLA

Description Florian Priester CLA 2005-08-14 20:35:43 EDT
SWT-win32, 3.2M1

While drawing images with an alpha channel on a GC, I found the performance
a bit puzzling in some places.

Please see the snippet below for a simple benchmark which draws two images
of the same size (one with alpha values set, the other without) onto a
shell (first test) and a destination image (second test), 1000 times each.

Some results on my system (WinXP Pro SP2 on a P4-2.4GHz, JRE 1.4.2_08):

  [display depth=16, image size=32x32, passes=100, times per pass=1000]
    shell test...
      average time without alpha=21.88ms
      average time with    alpha=209.68ms
      factor=9.583181
    image test...
      average time without alpha=16.9ms
      average time with    alpha=207.0ms
      factor=12.248521
  
  [display depth=16, image size=128x128, passes=100, times per pass=1000]
    shell test...
      average time without alpha=73.59ms
      average time with    alpha=2380.01ms
      factor=32.341488
    image test...
      average time without alpha=59.23ms
      average time with    alpha=2437.96ms
      factor=41.160896

  [display depth=32, image size=32x32, passes=100, times per pass=1000]
    shell test...
      average time without alpha=18.31ms
      average time with    alpha=151.85ms
      factor=8.2932825
    image test...
      average time without alpha=17.17ms
      average time with    alpha=288.76ms
      factor=16.817705

  [display depth=32, image size=128x128, passes=100, times per pass=1000]
    shell test...
      average time without alpha=90.3ms
      average time with    alpha=1557.52ms
      factor=17.248283
    image test...
      average time without alpha=90.93ms
      average time with    alpha=3087.67ms
      factor=33.95656

Things to note:
- Alpha blending is expected to have a higher cost in terms of
  processor cycles than simply blitting opaque pixels, but these
  numbers (roughly 8 to 41 times slower) seem out of proportion
- The cost factor of alpha vs. non-alpha varies wildly with the
  image size instead of being constant
- With a display depth of 32-bit, drawing onto an image appears to be
  about twice as slow as drawing onto a control for the alpha case
  (but not the non-alpha case)
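
One way to read the size dependence is to convert the averages above into
time per pixel. The following is only a sketch and not part of the benchmark
below; the class and method names are made up for illustration, and the
inputs are the 16-bit shell test figures copied from the results above.

```java
import java.util.Locale;

public class PerPixelCost {
    /** Nanoseconds per pixel, given the average time of one pass of drawImage calls. */
    public static double nsPerPixel(double passMillis, int drawsPerPass, int width, int height) {
        double nsPerDraw = passMillis * 1_000_000.0 / drawsPerPass;
        return nsPerDraw / ((double) width * height);
    }

    public static void main(String[] args) {
        // Shell test at display depth 16, 1000 draws per pass (figures from the report).
        int[] sizes = {32, 128};
        double[] noAlphaMs = {21.88, 73.59};
        double[] alphaMs = {209.68, 2380.01};
        for (int i = 0; i < sizes.length; i++) {
            int s = sizes[i];
            System.out.println(String.format(Locale.ROOT,
                "%dx%d: without alpha=%.1f ns/px, with alpha=%.1f ns/px",
                s, s,
                nsPerPixel(noAlphaMs[i], 1000, s, s),
                nsPerPixel(alphaMs[i], 1000, s, s)));
        }
    }
}
```

On these figures the opaque blit drops from roughly 21 ns to 4.5 ns per pixel
as the image grows, while the alpha path only drops from roughly 205 ns to
145 ns. That would be consistent with a fixed per-call overhead dominating
the opaque case at 32x32, which in turn makes the factor grow with image
size. This is an interpretation of the numbers, not something the benchmark
measures directly.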

Would you have any thoughts about these results and might it be possible
to improve the numbers?

(Feel free to close this report if you feel that my expectations are
 wrong or that the problem is likely to lie with my system setup.)

---

import org.eclipse.swt.graphics.*;
import org.eclipse.swt.widgets.*;

import java.util.*;

public class AlphaBenchmark {
  private static final int WIDTH          = 32;
  private static final int HEIGHT         = WIDTH;
  private static final int PASSES         = 100;
  private static final int TIMES_PER_PASS = 1000;
  
  public static void main(String[] args) {
    // create display
    Display display = new Display();
    
    System.out.print("  [display depth=" + display.getDepth());
    System.out.print(", image size=" + WIDTH + "x" + HEIGHT);
    System.out.print(", passes=" + PASSES);
    System.out.print(", times per pass=" + TIMES_PER_PASS);
    System.out.println("]");
    
    // get sample image data
    Image img = new Image(display, WIDTH, HEIGHT);
    
    ImageData noAlphaImgData = img.getImageData();
    ImageData alphaImgData   = img.getImageData();
    
    img.dispose();
    
    // set alpha values
    byte[] alphas = new byte[WIDTH * HEIGHT];
    Arrays.fill(alphas, (byte) 127);
    alphaImgData.setAlphas(0, 0, alphas.length, alphas, 0);
    
    // create images
    Image noAlphaImg = new Image(display, noAlphaImgData);
    Image alphaImg   = new Image(display, alphaImgData);
    
    // shell test
    Shell destShell = new Shell(display);
    destShell.setBounds(500, 200, 200, 200);
    destShell.open();
    
    System.out.println("    shell test...");
    runTest(destShell, noAlphaImg, alphaImg);
    
    destShell.dispose();
    
    // image test
    Image destImg = new Image(display, WIDTH, HEIGHT);
    
    System.out.println("    image test...");
    runTest(destImg, noAlphaImg, alphaImg);
    
    destImg.dispose();
    
    // dispose images
    noAlphaImg.dispose();
    alphaImg  .dispose();
    
    // dispose display
    display.dispose();
  }
  
  private static void runTest(
      Drawable destDrawable,
      Image    noAlphaImg,
      Image    alphaImg) {
    int noAlphaTotal = 0;
    int alphaTotal   = 0;
    
    // create GC
    GC gc = new GC(destDrawable);
    
    // drawing loop
    for (int pass = 0; pass < PASSES; pass++) {
      noAlphaTotal += draw(gc, noAlphaImg);
      alphaTotal   += draw(gc, alphaImg);
    }
    
    // dispose GC
    gc.dispose();
    
    // display results
    float avgNoAlpha = (float) noAlphaTotal / PASSES;
    float avgAlpha   = (float) alphaTotal   / PASSES;
    
    System.out.println("      average time without alpha=" + avgNoAlpha + "ms");
    System.out.println("      average time with    alpha=" + avgAlpha   + "ms");
    System.out.println("      factor=" + ((float) alphaTotal / noAlphaTotal));
  }
  
  private static int draw(GC gc, Image srcImg) {
    gc.fillRectangle(0, 0, WIDTH, HEIGHT);
    
    long start = System.currentTimeMillis();
    for (int i = 0; i < TIMES_PER_PASS; i++) {
      gc.drawImage(srcImg, 0, 0);
    }
    long end = System.currentTimeMillis();
    
    return (int) (end - start);
  }
}
Comment 1 Silenio Quarti CLA 2005-09-01 14:30:43 EDT
Fixed > 20050901.

Please try the latest.
Comment 2 Florian Priester CLA 2005-09-02 11:14:36 EDT
New benchmark results for swt-N20050902-0010-win32-win32-x86:

Hi-Color
===
  [display depth=16, image size=32x32, passes=100, times per pass=1000]
    shell test...
      average time without alpha=21.43ms
      average time with    alpha=185.61ms
      factor=8.661222
    image test...
      average time without alpha=20.0ms
      average time with    alpha=182.82ms
      factor=9.141

  [display depth=16, image size=128x128, passes=100, times per pass=1000]
    shell test...
      average time without alpha=68.62ms
      average time with    alpha=2041.23ms
      factor=29.746866
    image test...
      average time without alpha=59.06ms
      average time with    alpha=2091.41ms
      factor=35.411613

  [display depth=16, image size=256x256, passes=100, times per pass=1000]
    shell test...
      average time without alpha=212.54ms
      average time with    alpha=8262.93ms
      factor=38.87706
    image test...
      average time without alpha=192.52ms
      average time with    alpha=8580.14ms
      factor=44.567524

True-Color
===
  [display depth=32, image size=32x32, passes=100, times per pass=1000]
    shell test...
      average time without alpha=17.82ms
      average time with    alpha=128.9ms
      factor=7.2334456
    image test...
      average time without alpha=18.3ms
      average time with    alpha=130.14ms
      factor=7.1114755

  [display depth=32, image size=128x128, passes=100, times per pass=1000]
    shell test...
      average time without alpha=90.21ms
      average time with    alpha=1211.19ms
      factor=13.426338
    image test...
      average time without alpha=90.16ms
      average time with    alpha=1218.43ms
      factor=13.514086

  [display depth=32, image size=256x256, passes=100, times per pass=1000]
    shell test...
      average time without alpha=363.47ms
      average time with    alpha=5381.99ms
      factor=14.807247
    image test...
      average time without alpha=364.6ms
      average time with    alpha=5677.12ms
      factor=15.570817

Observations:

All in all, the numbers are better. Most notably, the "penalty" for
drawing onto an image (as opposed to a control) in 32-bit mode is gone.
That's a big step forward.

However, the cost factor of using an alpha channel still seems fairly high
for some cases, especially for bigger image sizes. The idea (hope) would be
to aim for a factor that's somewhere in the range of (estimating here) 4-7,
regardless of the image size.

Looking at the source code, some of the current performance seems to be
due to the overhead that occurs each time GC.drawBitmapAlpha is called.
Among other things, a buffer the size of the image is allocated and after
that, every pixel is set and prepared for subsequent use with AlphaBlend.

If it were possible to cache some of this data, this would surely boost
the painting speed. There are situations in which the ability to (repeatedly)
draw images quickly may be well worth the cost of any additional memory
consumed by cached data. (Note that the images in such a scenario could
be several different ones that are drawn in an arbitrary order.)
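
To make the caching idea concrete, here is a minimal sketch in plain Java.
It is not SWT code: the Source class, the prepare step and the cache class
are all hypothetical stand-ins, and the "preparation" here only merges RGB
and alpha into one buffer. The point is just that the per-pixel work runs
once per image instead of once per draw call.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class PreparedPixelCache {
    /** Hypothetical stand-in for an image: RGB pixels plus one alpha byte per pixel. */
    public static final class Source {
        final int[] rgb;
        final byte[] alpha;

        public Source(int[] rgb, byte[] alpha) {
            this.rgb = rgb;
            this.alpha = alpha;
        }
    }

    // Weak keys: an entry can be collected once its Source is no longer referenced.
    private final Map<Source, int[]> cache = new WeakHashMap<>();
    private int prepareCount;

    /** Returns the prepared buffer for src, building it only on first use. */
    public int[] prepared(Source src) {
        return cache.computeIfAbsent(src, this::prepare);
    }

    /** Number of times the expensive per-pixel preparation actually ran. */
    public int prepareCount() {
        return prepareCount;
    }

    // The per-draw work that the cache avoids: allocate a buffer and set every pixel.
    private int[] prepare(Source src) {
        prepareCount++;
        int[] out = new int[src.rgb.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = ((src.alpha[i] & 0xFF) << 24) | (src.rgb[i] & 0xFFFFFF);
        }
        return out;
    }

    public static void main(String[] args) {
        PreparedPixelCache cache = new PreparedPixelCache();
        Source a = new Source(new int[32 * 32], new byte[32 * 32]);
        Source b = new Source(new int[128 * 128], new byte[128 * 128]);
        // Draw two different images in alternating order, 1000 times each.
        for (int i = 0; i < 1000; i++) {
            cache.prepared(a);
            cache.prepared(b);
        }
        System.out.println("preparations=" + cache.prepareCount());
    }
}
```

A real implementation would also have to drop the cached buffer when the
image is disposed or drawn into, which this sketch does not attempt.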

Reopening this report as a request for checking whether further optimizations
such as caching are feasible.

Thanks for looking into this issue!
Comment 3 Silenio Quarti CLA 2005-09-06 18:01:46 EDT
These are the results of running the bench on my machine (WinXP pro sp1, P4 
2.0GHz, 1.5 RAM). They are more reasonable than the ones you posted. Probably 
because my display driver performs better.

Anyway, the problem here is that AlphaBlend() takes premultiplied data and 
other APIs like ImageList_Add() take non-premultiplied data which forces us to 
create a temporary DIB section. We have investigated ways of using 
premultiplied data all the time, but it involves a lot of work (and new API 
for getImageData() that describes premultiplied data). I will look at this 
only after the next milestone.
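
For readers unfamiliar with the term, the conversion from non-premultiplied
to premultiplied data can be sketched as follows. This is plain Java for
illustration only, not the SWT implementation: the class name, the
0xAARRGGBB pixel layout and the rounding rule are assumptions.

```java
public class Premultiply {
    /** Converts one non-premultiplied 0xAARRGGBB pixel to premultiplied form. */
    public static int premultiply(int argb) {
        int a = argb >>> 24;
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        // Scale each colour channel by alpha/255, rounding to nearest.
        r = (r * a + 127) / 255;
        g = (g * a + 127) / 255;
        b = (b * a + 127) / 255;
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    /** Converts a whole buffer; this is the per-pixel pass a temporary copy requires. */
    public static int[] premultiplyAll(int[] src) {
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = premultiply(src[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Alpha 127 (as used in the benchmark) applied to an orange pixel.
        int pixel = 0x7FFF8000;
        System.out.println(Integer.toHexString(premultiply(pixel))); // prints 7f7f4000
    }
}
```

A fully opaque pixel (alpha 255) is unchanged by this conversion, and a
fully transparent one (alpha 0) becomes all zeros.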

  [display depth=16, image size=32x32, passes=100, times per pass=1000]
    shell test...
      average time without alpha=49.79ms
      average time with    alpha=196.31ms
      factor=3.9427595
    image test...
      average time without alpha=37.78ms
      average time with    alpha=184.4ms
      factor=4.8808894

  [display depth=16, image size=128x128, passes=100, times per pass=1000]
    shell test...
      average time without alpha=261.19ms
      average time with    alpha=1902.41ms
      factor=7.283625
    image test...
      average time without alpha=246.06ms
      average time with    alpha=1890.04ms
      factor=7.681216

  [display depth=32, image size=32x32, passes=100, times per pass=1000]
    shell test...
      average time without alpha=63.92ms
      average time with    alpha=201.24ms
      factor=3.1483104
    image test...
      average time without alpha=64.3ms
      average time with    alpha=184.29ms
      factor=2.8660965

  [display depth=32, image size=128x128, passes=100, times per pass=1000]
    shell test...
      average time without alpha=1272.39ms
      average time with    alpha=1970.11ms
      factor=1.5483539
    image test...
      average time without alpha=1270.92ms
      average time with    alpha=1935.96ms
      factor=1.5232744

  [display depth=32, image size=256x256, passes=100, times per pass=1000]
    shell test...
      average time without alpha=1497.62ms
      average time with    alpha=8437.22ms
      factor=1.6337523
    image test...
      average time without alpha=5433.39ms
      average time with    alpha=9349.88ms
      factor=1.7208189
Comment 4 Florian Priester CLA 2005-09-23 08:44:06 EDT
Created attachment 27431
AlphaBenchmark v2.0

Slightly improved benchmark to replace the one from comment #0.
Comment 5 Florian Priester CLA 2005-09-23 08:50:12 EDT
(In reply to comment #3)
> These are the results of running the bench on my machine (WinXP pro sp1, P4
> 2.0GHz, 1.5 RAM). They are more reasonable than the ones you posted. Probably
> because my display driver performs better.

This could certainly be a factor, although I'm using a recent revision of the
standard ATI drivers. To be honest, I find it hard to see any sensible pattern
when comparing your numbers with mine. I still think that my benchmark measures
the right thing (since for a developer who is using SWT, the code chain ends
at the call to GC.drawImage), but perhaps other approaches are needed to get
a clearer picture.

> Anyway, the problem here is that AlphaBlend() takes premultiplied data and 
> other APIs like ImageList_Add() take non-premultiplied data which forces us to 
> create a temporary DIB section. We have investigated ways of using 
> premultiplied data all the time, but it involves a lot of work (and new API 
> for getImageData() that describes premultiplied data).

If that's possible, that would be great.
Comment 6 Florian Priester CLA 2007-01-29 07:19:15 EST
SWT-win32, 20070129 (HEAD)

Here are some benchmark results (see comment #4) from a different machine.
The numbers are better: the alpha cost factor now ranges from about 4.2 to 9.5
across all tested sizes and depths.

---

Hi-Color:

  [display depth=16, image size=32x32, passes=100]
    shell test...
      1000*drawImage without alpha=14.21ms
      1000*drawImage with    alpha=59.69ms
      factor=4.20
    image test...
      1000*drawImage without alpha=10.80ms
      1000*drawImage with    alpha=58.89ms
      factor=5.45
  [display depth=16, image size=64x64, passes=100]
    shell test...
      1000*drawImage without alpha=25.92ms
      1000*drawImage with    alpha=141.58ms
      factor=5.46
    image test...
      1000*drawImage without alpha=17.96ms
      1000*drawImage with    alpha=141.72ms
      factor=7.89
  [display depth=16, image size=128x128, passes=100]
    shell test...
      1000*drawImage without alpha=69.07ms
      1000*drawImage with    alpha=471.24ms
      factor=6.82
    image test...
      1000*drawImage without alpha=53.31ms
      1000*drawImage with    alpha=471.38ms
      factor=8.84
  [display depth=16, image size=256x256, passes=100]
    shell test...
      1000*drawImage without alpha=223.29ms
      1000*drawImage with    alpha=1847.49ms
      factor=8.27
    image test...
      1000*drawImage without alpha=193.79ms
      1000*drawImage with    alpha=1847.77ms
      factor=9.53

True-Color:

  [display depth=32, image size=32x32, passes=100]
    shell test...
      1000*drawImage without alpha=10.64ms
      1000*drawImage with    alpha=55.14ms
      factor=5.18
    image test...
      1000*drawImage without alpha=9.99ms
      1000*drawImage with    alpha=55.48ms
      factor=5.55
  [display depth=32, image size=64x64, passes=100]
    shell test...
      1000*drawImage without alpha=16.72ms
      1000*drawImage with    alpha=126.25ms
      factor=7.55
    image test...
      1000*drawImage without alpha=16.70ms
      1000*drawImage with    alpha=125.64ms
      factor=7.52
  [display depth=32, image size=128x128, passes=100]
    shell test...
      1000*drawImage without alpha=70.47ms
      1000*drawImage with    alpha=414.68ms
      factor=5.88
    image test...
      1000*drawImage without alpha=70.94ms
      1000*drawImage with    alpha=414.38ms
      factor=5.84
  [display depth=32, image size=256x256, passes=100]
    shell test...
      1000*drawImage without alpha=281.57ms
      1000*drawImage with    alpha=1657.33ms
      factor=5.89
    image test...
      1000*drawImage without alpha=281.56ms
      1000*drawImage with    alpha=1653.60ms
      factor=5.87
Comment 7 Eclipse Webmaster CLA 2019-09-06 16:13:33 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.