Can someone explain a particular PDF matrix transformation to me? I just can't figure it out. I have the following PDF content stream:
q
q
1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 cm
q
1 0 0 -1 0 841.8898 cm
1.0000 0 0 1.0000 0 0 cm
q
141.84 0 0 70.80 376.68 63.60 cm
/IM1 Do
Q
BT
/F1 15.96 Tf
-0.00 Tc
1. 0. 0. -1. 0. -0. Tm
56.64 -86.4 Td
(Rechnung) Tj
ET
Let's ignore the image that is placed at the beginning.
According to my calculations, based on the Adobe PDF Reference 1.7, the text "Rechnung" is rendered at the following user space position:
point (userspace) = (56.64, 928.2898);
The matrices before the Tj operator are as follows:
ctm = [1.0, 0.0, 0.0, -1.0, 0.0, 841.8898]
tm = [1.0, 0.0, 0.0, -1.0, 56.64, -86.4]
--> usm = tm * ctm = [1.0, 0.0, 0.0, 1.0, 56.64, 928.2898] // user space matrix
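The calculation above can be reproduced with a short Python sketch. It assumes the PDF convention of row vectors and six-element matrices [a b c d e f], and it takes the tm and ctm values exactly as I listed them. The helper names concat and apply are my own and do not come from any library.

```python
def concat(m, n):
    """Return the product m * n of two PDF matrices [a, b, c, d, e, f].

    Row-vector convention: a point is transformed as [x y 1] * M,
    so m is applied first and n second.
    """
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def apply(m, x, y):
    """Transform the point (x, y) by the PDF matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


# Values as listed in my question (my own reading of the content stream).
ctm = [1.0, 0.0, 0.0, -1.0, 0.0, 841.8898]
tm = [1.0, 0.0, 0.0, -1.0, 56.64, -86.4]

usm = concat(tm, ctm)            # text space -> user space
origin = apply(usm, 0.0, 0.0)    # where the text origin lands in user space

print(usm)
print(origin)
```

With these inputs the y component comes out at roughly 928.29, which is the value I cannot reconcile with what Acrobat shows.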
However, this user space point, at y = 928, lies above the top edge of the A4 page (height 841.8898), so it should not be visible. Yet Acrobat still renders it on the A4 page.
I’m currently building a parser and want to determine exactly which text appears in a given region (e.g. [x=56.0, y=587.0, w=241.0, h=128.0] = DIN 5008 address window). I might want to reposition such elements.
With “normal” CTMs and TMs, everything works fine. But in this case, I can’t make sense of the rendering. Acrobat must be applying some kind of additional correction/conversion so that the y = 928 position ends up fitting on the A4 page, specifically at y = 754. Does anybody know what exactly is being done here?
For this special case, I could calculate:
y' = 841 - (y - 841) = 2 * 841 - y
y' = 754
(So the word “Rechnung” lies outside the address region.)
Or in general:
y' = 2 * mediaBox.height - y
That would be the position in the normal PDF coordinate system. I’d like to apply this method correctly for arbitrary CTMs, and I believe the correction above only needs to be applied in specific CTM configurations where the coordinate system was transformed beforehand.
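As a sketch of the correction and the region test I have in mind: this assumes the page height is 841.8898 (the value from the cm operator in the stream) and that the region's x and y give its lower-left corner in normal PDF coordinates. Both are my assumptions, and the function names mirror_y and in_region are hypothetical.

```python
PAGE_HEIGHT = 841.8898  # assumed MediaBox height, taken from the cm operator


def mirror_y(y, page_height=PAGE_HEIGHT):
    """Apply the correction y' = 2 * page_height - y."""
    return 2 * page_height - y


def in_region(x, y, rx, ry, rw, rh):
    """Return True if (x, y) lies inside the rectangle.

    Assumes (rx, ry) is the lower-left corner and the y axis points up.
    """
    return rx <= x <= rx + rw and ry <= y <= ry + rh


x, y = 56.64, 928.2898           # user space position from my calculation
y_corrected = mirror_y(y)

# DIN 5008 address window as given above: x=56, y=587, w=241, h=128
inside = in_region(x, y_corrected, 56.0, 587.0, 241.0, 128.0)

print(y_corrected, inside)
```

With the unrounded page height the corrected value is about 755.49 rather than the rounded 754, and the point still falls above the address window.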


