Canj's trail: 05/01/2010

How to locate the work item in OpenCL?

- Reading Notes of OpenCL Specification version 1.0.

Global ID: A global ID is used to uniquely identify a work-item and is derived from the number of global work-items specified when executing a kernel. The global ID is an N-dimensional value that starts at (0, 0, 0). See also Local ID.

Local ID: A local ID specifies a unique work-item ID within a given work-group that is executing a kernel. The local ID is an N-dimensional value that starts at (0, 0, 0). See also Global ID.

Work-item: One of a collection of parallel executions of a kernel invoked on a device by a command. A work-item is executed by one or more processing elements as part of a work-group executing on a compute unit. A work-item is distinguished from other executions within the collection by its global ID and local ID.

A single work-item can be uniquely identified by its global ID or by a combination of its local ID and work-group ID.

Each work-item is identifiable in two ways: in terms of a global index, and in terms of a work-group index plus a local index within a work-group.

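The figure below shows how these IDs are computed:

uint get_work_dim(); Returns the number of dimensions in use.

size_t get_global_size(uint dimindx);

In the figure, Gx * Sx = get_global_size(0) and Gy * Sy = get_global_size(1): the total number of work-items along one row or one column.

size_t get_global_id(uint dimindx);

In the figure, gx = get_global_id(0) and gy = get_global_id(1): the position of the work-item within the whole row or column.

size_t get_local_size(uint dimindx);

Sx and Sy in the figure: the number of work-items along one row or one column of a work-group.

size_t get_local_id(uint dimindx);

In the figure, sx = get_local_id(0) and sy = get_local_id(1): the position of the work-item within the row or column of its work-group.

size_t get_num_groups(uint dimindx);

Gx and Gy in the figure: the number of work-groups per row or per column.

size_t get_group_id(uint dimindx);

wx and wy in the figure: the position of the work-group within the row or column.

These built-in functions are enough to convert between the different kinds of ID.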
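As a rough illustration of how these IDs relate, here is a minimal Python sketch of the arithmetic only, not OpenCL code. It assumes no global offset, as in OpenCL 1.0. The helper names to_global_id and from_global_id and the sizes used are made up for the example.

```python
def to_global_id(group_id, local_size, local_id):
    """Global ID along one dimension: g = w * S + s."""
    return group_id * local_size + local_id


def from_global_id(global_id, local_size):
    """Inverse mapping: returns (group_id, local_id) along one dimension."""
    return divmod(global_id, local_size)


# Hypothetical 2D launch: Gx * Gy work-groups of Sx * Sy work-items each.
Gx, Gy = 3, 5
Sx, Sy = 4, 2
global_size = (Gx * Sx, Gy * Sy)  # what get_global_size(0) and get_global_size(1) would return

# Work-item with local ID (sx, sy) = (3, 1) in work-group (wx, wy) = (2, 4).
gx = to_global_id(2, Sx, 3)
gy = to_global_id(4, Sy, 1)
print(global_size, (gx, gy))
```

The same relation holds independently in every dimension, which is why all the built-ins take a dimension index.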

global_work_size: points to an array of work_dim unsigned values that describe the number of global work-items in work_dim dimensions that will execute the kernel function. The total number of global work-items is computed as global_work_size[0] * ... * global_work_size[work_dim - 1].

As I understand it, in the figure above work_dim is 2 (two dimensions), global_work_size[0] = Gx * Sx and global_work_size[1] = Gy * Sy.
Inside the kernel these two values can be read with size_t get_global_size(uint dimindx); valid values of dimindx are 0 to get_work_dim() - 1.

local_work_size: points to an array of work_dim unsigned values that describe the number of work-items that make up a work-group (also referred to as the size of the work-group) that will execute the kernel specified by kernel. The total number of work-items in a work-group is computed as local_work_size[0] * ... * local_work_size[work_dim - 1].

As I understand it, in the figure above work_dim is 2 (two dimensions), local_work_size[0] = Sx and local_work_size[1] = Sy.

Inside the kernel these two values can be read with size_t get_local_size(uint dimindx); valid values of dimindx are 0 to get_work_dim() - 1.

local_work_size can also be a NULL value in which case the OpenCL implementation will determine how to break the global work-items into appropriate work-group instances.

If local_work_size is specified, the values specified in global_work_size[0], ... global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0], ... local_work_size[work_dim - 1].

This is an easy place to make a mistake: for example, for a 300 * 300 image I cannot set the local size to 8 * 8, because 300 is not divisible by 8.
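One common workaround, not taken from the specification text above, is to round the global size up to the next multiple of the local size and have the kernel skip work-items that fall outside the image. A minimal sketch of the rounding, with round_up as a hypothetical helper name:

```python
def round_up(global_size, local_size):
    """Smallest multiple of local_size that is >= global_size."""
    return ((global_size + local_size - 1) // local_size) * local_size


image_w, image_h = 300, 300
local = (8, 8)
padded = (round_up(image_w, local[0]), round_up(image_h, local[1]))
print(padded)
```

With the padded size, the kernel would need a guard such as comparing get_global_id(0) and get_global_id(1) against the real image width and height before touching any pixel.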

What is Compositing (Alpha Compositing)?

"Compositing is the process by which graphical objects are combined. Alpha compositing uses the alpha values, or channel (bit mask) to represent the coverage of each pixel. The alpha channel is often said to represent the 'opacity'. This coverage information is used to control the compositing of colors." [2]

"In computer graphics, alpha compositing is the process of combining an image with a background to create the appearance of partial transparency. It is often useful to render image elements in separate passes, and then combine the resulting multiple 2D images into a single, final image in a process called compositing."[1]

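The alpha blending I usually come across appears to be the so-called simple alpha compositing, which in turn appears to be the over operation from the Porter-Duff paper. [2]: http://www.svgopen.org/2005/papers/abstractsvgopen/simplealpha.png

When two bitmaps are blended, different sources use different names for them. In each pair below, the left term is the existing picture underneath and the right term is what gets drawn on top of it:

backdrop       layer/source
base image     blend image
target         source image
framebuffer    texel
background     foreground

result = foreground * alpha + background * (1 - alpha)

Imagine this situation: the model has already been drawn into the framebuffer, a painting layer is added above it, and this painting layer has to be composited on top.

[Figure: blending the painting layer over the framebuffer; image not preserved.]

The formula used in this figure is the very common one; I think it is the simple alpha compositing described above. Its color components are un-premultiplied, which is why the term after the plus sign is blend * blend.a.

With pre-multiplied color components, result = base * (1 - blend.a) + blend. This is the "over" operator of the Porter-Duff paper. It shows how the blending we usually see relates to Porter-Duff compositing: the two do not contradict each other.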
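To check that the two forms agree, here is a small Python sketch. The function names are made up for this example, colors are (r, g, b, a) tuples with values in [0, 1], and the base is assumed to be opaque, as a framebuffer normally is.

```python
def premultiply(color):
    """Convert an un-premultiplied (r, g, b, a) color to premultiplied form."""
    r, g, b, a = color
    return (r * a, g * a, b * a, a)


def blend_unpremultiplied(base_rgb, blend):
    """Simple alpha blending: result = base * (1 - a) + blend * a."""
    r, g, b, a = blend
    return tuple(bc * (1 - a) + sc * a for bc, sc in zip(base_rgb, (r, g, b)))


def over_premultiplied(base, blend):
    """Porter-Duff over on premultiplied colors: result = blend + base * (1 - a)."""
    a = blend[3]
    return tuple(sc + bc * (1 - a) for bc, sc in zip(base, blend))


base = (0.2, 0.4, 0.6, 1.0)    # opaque, so premultiplying leaves it unchanged
blend = (1.0, 0.0, 0.0, 0.5)   # half-transparent pure red, un-premultiplied

straight = blend_unpremultiplied(base[:3], blend)
premult = over_premultiplied(premultiply(base), premultiply(blend))
print(straight, premult)
```

The only difference is where the multiplication by alpha happens: once when the color is stored (premultiplied), or on every blend (un-premultiplied).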

Classic paper: Compositing Digital Images, 1984. http://keithp.com/~keithp/porterduff/
Section 2, The Alpha Channel, explains why an alpha channel is needed: to "retain the matte information, the extent of the coverage of an element at a pixel. An alpha of 0 indicates no coverage, 1 means full coverage, fractions corresponding to partial coverage."

The paper poses a question: "how do we express that a pixel is half covered by a full red object?

One obvious suggestion is to assign (1, 0, 0, .5) to that pixel: the .5 indicates the coverage and the (1, 0, 0) is the color.

There are a few reasons to dismiss this proposal, the most severe being that all compositing operations will involve multiplying the 1 in the red channel by the 0.5 in the alpha channel to compute the red contribution of this object at this pixel. [My note: the drawback of this un-premultiplied alpha representation is that every compositing operation has to multiply by alpha.]

The desire to avoid this multiplication points up a better solution, storing the pre-multiplied value in the color component, so that (0.5, 0, 0, 0.5) will indicate a full red object half covering a pixel."

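It then distinguishes two values:

black = (0, 0, 0, 1); // an opaque black, alpha = 1 means full coverage = opaque.

clear = (0, 0, 0, 0); // transparent.

What is "clear"? It is the checkerboard background shown below (visible only in tools such as GIMP or Photoshop). I marked the picker position with a red circle. "transparent pixels cannot be displayed on a computer monitor", so a "checker board pattern" is used.

Section 4 covers the 12 compositing operations. This part is hard to follow; see [2] and [3].

Why 12? It is a counting problem. The paper splits a pixel into four regions: covered by neither A nor B, by A only, by B only, and by both. Each operator chooses what to show in each region (nothing; nothing or A; nothing or B; nothing, A or B), which gives 1 * 2 * 2 * 3 = 12 combinations.

With pre-multiplied colors, both the color and the alpha components of the result are computed with equation 1: cO = cA * Fa + cB * Fb. Fa and Fb take different values for each operator; F is meant to "indicate the extent of contribution of A and B", which is not the same thing as alpha.

"It is important to note that the equations defined by the Porter and Duff paper are all defined to operate on color components that are premultiplied by their corresponding alpha components." [5] In other words, the formulas in the Porter-Duff paper assume pre-multiplied color components.

First, recall what the alpha value means: how much of the pixel this color component covers, where 0 means no coverage and 1 means full coverage.

For example, suppose picture A and picture B both contribute a color component (element) at the same pixel, with A.alpha = 1 and B.alpha = 1. There is no conflict: A's color fully covers the pixel, and so does B's. When we do alpha blending we first assume that one lies on top of the other, say A over B; if A.alpha = 1, B cannot be seen. So my feeling is that alpha says how much of the pixel I cover, and in that way controls the pixel's transparency.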
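The table of Fa and Fb values from section 4 of the paper can be written down as a small lookup. This is a Python sketch, not code from the paper; PORTER_DUFF and composite are names made up for this example, and colors are premultiplied (r, g, b, a) tuples.

```python
# For each operator: (aA, aB) -> (Fa, Fb), where aA and aB are the alphas of A and B.
PORTER_DUFF = {
    "clear":    lambda aA, aB: (0.0, 0.0),
    "A":        lambda aA, aB: (1.0, 0.0),
    "B":        lambda aA, aB: (0.0, 1.0),
    "A over B": lambda aA, aB: (1.0, 1.0 - aA),
    "B over A": lambda aA, aB: (1.0 - aB, 1.0),
    "A in B":   lambda aA, aB: (aB, 0.0),
    "B in A":   lambda aA, aB: (0.0, aA),
    "A out B":  lambda aA, aB: (1.0 - aB, 0.0),
    "B out A":  lambda aA, aB: (0.0, 1.0 - aA),
    "A atop B": lambda aA, aB: (aB, 1.0 - aA),
    "B atop A": lambda aA, aB: (1.0 - aB, aA),
    "A xor B":  lambda aA, aB: (1.0 - aB, 1.0 - aA),
}


def composite(op, A, B):
    """Equation 1, cO = cA * Fa + cB * Fb, applied to all four components."""
    Fa, Fb = PORTER_DUFF[op](A[3], B[3])
    return tuple(ca * Fa + cb * Fb for ca, cb in zip(A, B))


half_red = (0.5, 0.0, 0.0, 0.5)   # full red half covering the pixel, premultiplied
blue = (0.0, 0.0, 1.0, 1.0)       # opaque blue
print(composite("A over B", half_red, blue))
```

Because the alpha component goes through the same equation, no separate alpha formula is needed for these operators.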

Now look at the over operator. The formula in the last few lines of the left column on p. 256 of the paper is:

cO = cA * Fa + cB * Fb. Looking up Fa and Fb in the table (A over B) and substituting gives:

cO = cA * 1 + cB * (1 - aA) = cA + cB * (1 - aA).

Doesn't this look very much like the linear interpolation of the simple alpha compositing/blending mentioned above? It is in fact the same thing. The paper says: "This is almost the well used linear interpolation of foreground F with the background B

B' = F * a + B * (1 - a),

except that our foreground is pre-multiplied by alpha." I believe the a in this formula is F.alpha.

This finally connects the linear interpolation we usually see with the over operator of the Porter-Duff paper. As further evidence, [6] states: "'normal' blend mode is equivalent to operator="over" on the 'feComposite' filter primitive, matches the blending method used by 'feMerge' and matches the simple alpha compositing technique used in SVG for all compositing outside of filter effects."

[1] Compositing Digital Images, 1984. http://keithp.com/~keithp/porterduff/

[2] http://www.svgopen.org/2005/papers/abstractsvgopen/index.html

[3] http://en.wikipedia.org/wiki/Alpha_compositing
[4] http://greenshoes.free.fr/dotclear/index.php?post/2008/12/02/Using-alpha-composites-in-PulpCore
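[5] http://java.sun.com/javase/6/docs/api/java/awt/AlphaComposite.html This page also has explanations and formulas.

[6] http://www.w3.org/TR/SVG/filters.html#feBlend This is SVG 1.1, 20030114.

[ ] http://www.mail-archive.com/gimp-developer@lists.xcf.berkeley.edu/msg08096.html With multiple layers, GIMP also composites from the bottom up: "the layer compositing works from the bottom to the top". Note that this kind of merging is not associative, so the order matters.
[ ] https://mmack.wordpress.com/ The article "Sprite mipmaps" there describes another use of pre-multiplied alpha.

The above covered what pre-multiplied alpha is and mentioned one of its benefits: there is no need to multiply by the alpha value on every compositing operation. What other benefits does it have? That is the next topic.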

[] http://www.kxcad.net/autodesk/3ds_max/Autodesk_3ds_Max_9_Reference/premultiplied_alpha_glossary.html

[] http://blogs.msdn.com/shawnhar/archive/2009/11/06/premultiplied-alpha.aspx

[] http://blogs.msdn.com/shawnhar/archive/2010/04/09/how-shawn-learned-to-stop-worrying-and-love-premultiplied-alpha.aspx

[] http://blogs.msdn.com/shawnhar/archive/2009/11/11/premultiplied-alpha-content-processor.aspx

[] http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Premultiplied%20alpha]]

Of course, it has drawbacks too. "To be clear: premultiplied alpha is not a perfect panacea for all problems".

[] http://kriscg.blogspot.com/2009/11/premultiplied-alpha.html
[] http://msdnrss.thecoderblogs.com/2010/04/08/premultiplied-alpha-in-xna-game-studio-40/

That covers what premultiplied alpha is and why it solves all the world's problems. Next comes its relation to blend modes. I only got into Porter-Duff compositing because I need to implement blend mode operations, and: 1. some blend mode operations use the over operator and the like; 2. the formulas in that Photoshop document assume the pixels are unpremultiplied; 3. Mudbox uses premultiplied alpha, so the math will be different in some cases.

Now back to blend modes.

First, "While every pixel has transparency information associated with it, every layer also has an associated opacity value. The two terms are similar and in most cases can be treated as the same. You may think of a layer's opacity value as a "dimmer" for the alpha values of every pixel in the layer." http://www.getpaint.net/doc/latest/en/LayersAndBlendModes.html

The figure below shows that opacity is a property of the layer, while alpha is a property of the pixel. Compositing seems to work like this:

first compute f' = foreground image * opacity, then carry on with f'.

The SVG specifications all use pre-multiplied colors.
SVG 1.1 [http://www.w3.org/TR/SVG11/filters.html#feBlend and http://www.w3.org/TR/2003/REC-SVG11-20030114/filters.html#feBlend are the same document] gives the formulas for normal, multiply, screen, darken and lighten, all in premultiplied form. A corresponding implementation can be found by searching Google Code Search for nr-filter-blend.cpp. Does that mean SVG 1.1 supports only these 5 blend modes? As noted further down, version 1.2 extends them.
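For reference, the five feBlend formulas can be sketched in Python as below. The formulas are written as I remember them from the SVG 1.1 feBlend section, so they should be checked against the specification. The function name fe_blend is made up; a is the image on top ("in"), b is the image underneath ("in2"), and both are premultiplied (r, g, b, alpha) tuples.

```python
def fe_blend(mode, a, b):
    """Blend premultiplied color a (top) over premultiplied color b (bottom)."""
    qa, qb = a[3], b[3]

    def channel(ca, cb):
        if mode == "normal":
            return (1 - qa) * cb + ca
        if mode == "multiply":
            return (1 - qa) * cb + (1 - qb) * ca + ca * cb
        if mode == "screen":
            return cb + ca - ca * cb
        if mode == "darken":
            return min((1 - qa) * cb + ca, (1 - qb) * ca + cb)
        if mode == "lighten":
            return max((1 - qa) * cb + ca, (1 - qb) * ca + cb)
        raise ValueError("unknown mode: " + mode)

    qr = 1 - (1 - qa) * (1 - qb)  # result opacity, the same for all five modes
    return tuple(channel(ca, cb) for ca, cb in zip(a[:3], b[:3])) + (qr,)


top = (0.5, 0.0, 0.0, 0.5)
bottom = (0.0, 0.0, 1.0, 1.0)
print(fe_blend("normal", top, bottom))
```

The "normal" case is exactly the over operator: the top color plus the bottom color scaled by (1 - qa).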

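SVG Filter 1.2 [http://www.w3.org/TR/2007/WD-SVGFilter12-20070501/#feBlendElement] is the same,
but SVG 1.2 [http://www.w3.org/TR/2004/WD-SVG12-20041027/rendering.html] seems to extend alpha compositing by adding a Compositing module.

[http://www.svgopen.org/2005/papers/abstractsvgopen/] has some introduction and explanation.

My impression is that the other modes that later appear in the PDF specification, such as darken/lighten and soft light, are all extensions made after Porter-Duff. As noted above, Normal mode is simply the over operator.

The above concerns the SVG specifications; how a particular library implements them is another matter.

Next, let's look at the Cairo implementation (it relies on a library called Pixman: the pixel-manipulation library for X and cairo).

First, it appears to be Benjamin Otte (who later worked at Red Hat) who added blend modes to Cairo (March 2008, http://old.nabble.com/Blend-modes-td15856879.html ). He then found that the soft-light mode in the SVG standard differs from the algorithm described in PDF, so in October 2008 he raised the issue with the SVG group:

This is where Benjamin raised the question.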

http://lists.w3.org/Archives/Public/www-svg/2008Oct/0029.html Benjamin asked the question on this mailing list (the SVG people then raised it as issue 2095). Following the thread you can see the replies: search for "blend" or "soft" on http://lists.w3.org/Archives/Public/www-svg/2008Oct/ , up to the message of Feb 12, 2009 at http://lists.w3.org/Archives/Public/public-svg-wg/2009JanMar/0132.html.

Follow-ups:
http://markmail.org/message/f4tsfvm2wjevxyh6#query:+page:1+mid:vuxzgn53rls7aq3o+state:results Here Alex Danilo answers: "while I was working on this, we had to deduce the functions from various sources, since the equations were not published. The starting point was some Japanese site that had reverse engineered the equations from looking at Photoshop. We extended the base equations to include alpha correctly." So the initial SVG implementation (2002) came from reverse engineering Photoshop and took pre-multiplied alpha into account. Alex also gave a soft-light formula (possibly reverse engineered; it is the one used in the SVG 1.2 draft mentioned above, http://www.w3.org/TR/2004/WD-SVG12-20041027/rendering.html). Benjamin then gave the formula he derived from the PDF specification (the one he uses in pixman-combine32.c), and on Feb 12, 2009 Anthony Grasso gave the formula he derived from the PDF specification. The discussion also touched on the Color Dodge and Color Burn formulas, on which Benjamin and Anthony agreed.

The SVG Working Group later discussed this issue in a teleconference; search for "soft-light" on http://www.w3.org/2009/02/16-svg-minutes.html .

A fairly complete collection of their emails is at http://www.w3.org/Graphics/SVG/WG/track/issues/2095

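All the formulas above were derived by these people themselves from the un-premultiplied form. Benjamin Otte gives the original formulas and reference documents in:

http://lists.freedesktop.org/archives/cairo/2008-October/015362.html The Adobe PDF specification references mentioned there are [http://www.adobe.com/devnet/acrobat/pdfs/PDF32000_2008.pdf chapter 11] and [http://www.adobe.com/devnet/pdf/pdfs/blend_modes.pdf]. Photoshop presumably uses a similar algorithm.

On 18 Oct 2008 the GIMP people also showed some interest in Benjamin's implementation, hence http://www.mail-archive.com/xorg@lists.freedesktop.org/msg01141.html . They also agree that the formulas used in SVG 1.2 are somewhat dated, whereas the PDF specification came out in 2008 and is more recent. After discussion they concluded that most of the formulas are right; the two modes that come from Flash ("Invent" and one other) are not in the PDF spec.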

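The details of the derivations are nowhere to be found, so let me try to work them out myself.
First, p. 322 of PDF32000_2008 (the PDF spec) gives the basic (colour) compositing formula:

Then p. 328 says "The preceding formula represents a simplification of the following formula, which presents the relative contributions of backdrop, source, and blended colours in a more straightforward way":

This formula is actually the better one. Why?

For now I can think of two reasons.

1. Suppose the backdrop has already been drawn and a source is added on top of it as a layer, and not all of this source has been painted (for example, the layer was painted with strokes). Wherever no stroke landed, the layer is clear (0,0,0,0), fully transparent. See my example in GIMP below:

The picture has two layers. The upper one, the source layer, is fully transparent over most of its area, with alpha = 0. Note that its color component there is (0,0,0), black. If blending picked up this black, the result would look bad: the layer is fully transparent there, so the background underneath should not be affected.

So the p. 328 formula is more accurate than the p. 322 formula.

2. The first part of the p. 328 formula is easier to convert to pre-multiplied alpha form. See the figure below: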

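In pixman-combine32.c, Benjamin implemented most blend modes with blend functions in pre-multiplied format. The four blend modes Color, Luminosity, Saturation and Hue are the exception: they use non-premultiplied colors, as in the PDF spec. The two groups correspond to PDF_SEPARABLE_BLEND_MODE and PDF_NON_SEPARABLE_BLEND_MODE in his code.

I need to work out why he arrived at these results, and as noted above soft light needs extra care ("on Feb 12, 2009 Anthony Grasso gave the formula he derived from the PDF specification").
One odd thing is that the first part of the pre-multiplied formula above is exactly the XOR operator of the Porter-Duff paper. It is probably not a coincidence, so what is the reason?

[Figures: conversion of several simple blend modes from non-premultiplied to premultiplied form; images not preserved.]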
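The conversion for a separable blend mode can be sketched as follows. It assumes the p. 328 formula has the form Ra * Rc = (1 - Sa) * Ba * Bc + (1 - Ba) * Sa * Sc + Ba * Sa * Blend(Bc, Sc), which is how I read the PDF spec and which matches the comments in the soft light code. With Bca = Bc * Ba and Sca = Sc * Sa this becomes Rca = (1 - Sa) * Bca + (1 - Ba) * Sca + Ba * Sa * Blend(Bca / Ba, Sca / Sa). All function names below are made up for this example.

```python
def blend_premultiplied(blend, B, S):
    """Generic separable blend of premultiplied source S onto premultiplied backdrop B.

    blend(cb, cs) is the blend function on un-premultiplied channel values.
    """
    Ba, Sa = B[3], S[3]
    rgb = []
    for Bca, Sca in zip(B[:3], S[:3]):
        value = (1 - Sa) * Bca + (1 - Ba) * Sca      # the XOR-like part
        if Ba > 0 and Sa > 0:                        # otherwise the blend term is zero
            value += Ba * Sa * blend(Bca / Ba, Sca / Sa)
        rgb.append(value)
    return tuple(rgb) + (Ba + Sa - Ba * Sa,)


def multiply_premultiplied(Bca, Ba, Sca, Sa):
    # Ba * Sa * (Bc * Sc) simplifies to Bca * Sca.
    return (1 - Sa) * Bca + (1 - Ba) * Sca + Bca * Sca


def screen_premultiplied(Bca, Ba, Sca, Sa):
    # Ba * Sa * (Bc + Sc - Bc * Sc) = Sa * Bca + Ba * Sca - Bca * Sca,
    # and adding the XOR-like part leaves Bca + Sca - Bca * Sca.
    return Bca + Sca - Bca * Sca


B = (0.3, 0.2, 0.1, 0.5)
S = (0.2, 0.4, 0.1, 0.8)
generic_multiply = blend_premultiplied(lambda cb, cs: cb * cs, B, S)
generic_screen = blend_premultiplied(lambda cb, cs: cb + cs - cb * cs, B, S)
print(generic_multiply, generic_screen)
```

For simple modes the division by alpha cancels out, which is why the closed forms never divide and need no special case for alpha = 0.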

The above listed the conversion of a few simple blend modes from non-premultiplied to premultiplied form. Now for a more complicated example, the soft light blend mode.

// B is the backdrop image, S is the source layer image (both premultiplied).
// Soft light is a separable blend mode, so the branch is chosen per channel.
float SoftLightChannel(float Bca, float Ba, float Sca, float Sa)
{
    float tmp = Bca * (1 - Sa) + Sca * (1 - Ba);

    if (Ba < 0.001 || Sa < 0.001) {
        // When either Ba or Sa is zero, Ba * Sa * Blend(Bc, Sc) = 0,
        // so the color is the XOR of Bca and Sca, which is the current tmp value.
        return tmp;
    }
    if (2 * Sca <= Sa) {
        return tmp + Bca * Sa - (Sa - 2 * Sca) * Bca * (1 - Bca / Ba);
    }
    float D;  // Ba * D(Bca / Ba), with D(x) as defined in the PDF spec
    if (4 * Bca <= Ba) {
        D = ((16 * Bca / Ba - 12) * Bca / Ba + 4) * Bca;
    } else {
        D = sqrt(Bca * Ba);
    }
    return tmp + Bca * Sa + (2 * Sca - Sa) * (D - Bca);
}

float4 SoftLight(float4 backdrop, float4 source)
{
    float Ba = backdrop.a;
    float Sa = source.a;
    float3 rgb = float3(SoftLightChannel(backdrop.r, Ba, source.r, Sa),
                        SoftLightChannel(backdrop.g, Ba, source.g, Sa),
                        SoftLightChannel(backdrop.b, Ba, source.b, Sa));
    float alpha = UnionOp(Ba, Sa); // explained further down.
    return float4(rgb, alpha);
}
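To check the premultiplied soft light formula per channel, it can be compared against the un-premultiplied definition. The Python sketch below uses the soft light function as I read it in the PDF spec (with its helper D(x)) and the same assumed compositing form Rca = (1 - Sa) * Bca + (1 - Ba) * Sca + Ba * Sa * Blend(Bc, Sc). All function names are made up for this example.

```python
import math


def soft_light(cb, cs):
    """Soft light on un-premultiplied channel values."""
    if cs <= 0.5:
        return cb - (1 - 2 * cs) * cb * (1 - cb)
    d = ((16 * cb - 12) * cb + 4) * cb if cb <= 0.25 else math.sqrt(cb)
    return cb + (2 * cs - 1) * (d - cb)


def soft_light_reference(Bca, Ba, Sca, Sa):
    """Un-premultiply, blend, then composite. Requires Ba > 0 and Sa > 0."""
    return (1 - Sa) * Bca + (1 - Ba) * Sca + Ba * Sa * soft_light(Bca / Ba, Sca / Sa)


def soft_light_premultiplied(Bca, Ba, Sca, Sa):
    """One channel of the premultiplied form, mirroring the shader code."""
    tmp = Bca * (1 - Sa) + Sca * (1 - Ba)
    if Ba < 0.001 or Sa < 0.001:
        return tmp
    if 2 * Sca <= Sa:
        return tmp + Bca * Sa - (Sa - 2 * Sca) * Bca * (1 - Bca / Ba)
    if 4 * Bca <= Ba:
        D = ((16 * Bca / Ba - 12) * Bca / Ba + 4) * Bca
    else:
        D = math.sqrt(Bca * Ba)
    return tmp + Bca * Sa + (2 * Sca - Sa) * (D - Bca)


# Largest difference between the two forms over a small grid of inputs.
max_diff = 0.0
for Ba in (0.25, 0.5, 1.0):
    for Sa in (0.25, 0.5, 1.0):
        for Bc in (0.1, 0.25, 0.5, 0.9):
            for Sc in (0.1, 0.5, 0.75, 0.9):
                ref = soft_light_reference(Bc * Ba, Ba, Sc * Sa, Sa)
                pre = soft_light_premultiplied(Bc * Ba, Ba, Sc * Sa, Sa)
                max_diff = max(max_diff, abs(ref - pre))
print(max_diff)
```

The branch has to be chosen per channel: a pixel whose red channel satisfies 2 * Sca <= Sa while its green channel does not needs a different formula for each of the two channels.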

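Going back to the XOR operator, here is what Rca = XOR(Bca, Ba, Sca, Sa) looks like:

This makes it quite intuitive: wherever the layer is fully transparent (clear (0,0,0,0)) the background is unaffected. In the formula this is the case Sa = 0, so (1 - Sa) * Bca + (1 - Ba) * Sca = (1 - 0) * Bca + (1 - Ba) * (0,0,0) = Bca. Where the layer is not transparent it is harder to describe intuitively; you have to plug the values into the formula. In this example Ba = Sa = 1 in the non-transparent areas, so the result is (0,0,0), black.

The above settles how to get the colour from the basic (colour) compositing formula. PDF spec chapter 11.3.7.3 explains how to get the result alpha:

Ra = union(Sa, Ba) = Ba + Sa - (Ba * Sa).

This formula looks very much like the screen blend mode used for colors. "The result tends toward 1.0: if either input is 1.0, the result is 1.0".

So the final result should be (Rca, Ra), in premultiplied format.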
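Both cases discussed for the XOR part, and the union formula for alpha, can be checked with a few lines of Python. The names xor_color and union are made up for this example; colors are premultiplied RGB triples with a separate alpha.

```python
def xor_color(Bca, Ba, Sca, Sa):
    """XOR part of the compositing formula: (1 - Sa) * Bca + (1 - Ba) * Sca."""
    return tuple((1 - Sa) * b + (1 - Ba) * s for b, s in zip(Bca, Sca))


def union(Ba, Sa):
    """Result alpha: Ra = Ba + Sa - Ba * Sa."""
    return Ba + Sa - Ba * Sa


backdrop = (0.2, 0.4, 0.6)

# Fully transparent layer pixel: the backdrop shows through unchanged.
print(xor_color(backdrop, 1.0, (0.0, 0.0, 0.0), 0.0))

# Opaque layer pixel over opaque backdrop: the XOR part alone is black.
print(xor_color(backdrop, 1.0, (1.0, 0.0, 0.0), 1.0))

print(union(0.5, 0.5))
```

In the opaque case the visible color comes entirely from the Ba * Sa * Blend(Bc, Sc) term, which the XOR part leaves out.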

Further reading:

Quartz 2D is the graphics rendering engine on Mac/iPhone; it is not open source.

http://developer.apple.com/mac/library/documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/dq_paths/dq_paths.html Search for "blend mode" on this page for an introduction.

Cairo is cross-platform and widely used, for example in Mozilla Firefox, GTK/GNOME and OpenOffice.
http://cairographics.org/operators has examples of the blend modes.

Skia is the graphics engine used in Google Chrome/Android.

Canj's trail

Sunday, May 30, 2010

Bug Series - crash at delete pointer

Sunday, May 16, 2010

Gaussian Blur

Saturday, May 15, 2010

How to locate the work item in OpenCL?

Saturday, May 8, 2010

premultiplied alpha