
Can someone explain a particular PDF matrix transformation to me? I just can't figure it out. I have the following PDF content stream:

q
q
1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 cm
q
1 0 0 -1 0 841.8898 cm
1.0000 0 0 1.0000 0 0 cm
q
141.84 0 0 70.80 376.68 63.60 cm
/IM1 Do
Q
BT
/F1 15.96 Tf
-0.00 Tc
1. 0. 0. -1. 0. -0. Tm
56.64 -86.4 Td
(Rechnung) Tj
ET

Let's ignore the image that is placed at the beginning.

According to my calculations based on the Adobe PDF specification 1.7, the text "Rechnung" is rendered at the following user space position:

point (userspace) = (56.64, 928.2898);

The matrices before the Tj operator are as follows:

ctm = [1.0, 0.0, 0.0, -1.0, 0.0, 841.8898]
tm = [1.0, 0.0, 0.0, -1.0, 56.64, -86.4]
--> usm = tm * ctm = [1.0, 0.0, 0.0, 1.0, 56.64, 928.2898] // user space matrix

However, this user space point lies outside the visible A4 area: y = 928, while the page is only about 842 units high. Yet Acrobat still renders the text on the A4 page.

I’m currently building a parser and want to determine exactly which text appears in a given region (e.g. [x=56.0, y=587.0, w=241.0, h=128.0] = DIN 5008 address window). I might want to reposition such elements.

With “normal” CTMs and TMs, everything works fine. But in this case, I can’t make sense of the rendering. Acrobat must be applying some kind of additional correction/conversion so that the y = 928 position ends up fitting on the A4 page, specifically at y = 754. Does anybody know what exactly is being done here?

For this special case, I could calculate:

y' = 841 - (y - 841) = 2 * 841 - y
y' = 754

(So the word “Rechnung” lies outside the address region.)

Or in general:

y' = 2 * mediaBox.height - y

That would be the position in the normal PDF coordinate system. I'd like to apply this method correctly for arbitrary CTMs, and I believe the correction above only needs to be applied for specific CTM configurations where the coordinate system has already been transformed.

  • iPDFdev's answer takes a handwaving approach to the explanation, while my answer follows the spec and does the plain calculation. As you don't mention how you calculated the Tm before Tj, I cannot pinpoint exactly why you got the wrong result. Commented Jun 5 at 8:07

2 Answers


The matrices before the Tj operator are as follows:

ctm = [1.0, 0.0, 0.0, -1.0, 0.0, 841.8898]
tm = [1.0, 0.0, 0.0, -1.0, 56.64, -86.4]
--> usm = tm * ctm = [1.0, 0.0, 0.0, 1.0, 56.64, 928.2898] // user space matrix

You made a mistake calculating Tm.

Applying tx ty Td is specified as

this operator shall perform these assignments:

$$T_m = T_{lm} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ t_x & t_y & 1 \end{bmatrix} \times T_{lm}$$

(ISO 32000-2 Table 106 — Text-positioning operators)

In your case:

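$$T_m = T_{lm} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 56.64 & -86.4 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 56.64 & 86.4 & 1 \end{bmatrix}$$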

I.e. the last entry in your tm is not negative.

The final matrix, therefore, is

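$$T_m \times \mathit{CTM} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 56.64 & 86.4 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 841.8898 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 56.64 & 755.4898 & 1 \end{bmatrix}$$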
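The calculation in this answer can be checked numerically. The following is only a minimal sketch: the helper `matmul` and the variable names are made up for illustration, and matrices use the six-number PDF form [a, b, c, d, e, f] with the row-vector convention of the spec. The `wrong_tm` line shows one multiplication order that happens to reproduce the values from the question; whether that is really how the question's numbers were obtained is an assumption.

```python
def matmul(m, n):
    """Return m x n for PDF matrices given as [a, b, c, d, e, f]."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


ctm = [1.0, 0.0, 0.0, -1.0, 0.0, 841.8898]   # after the cm operators
tlm = [1.0, 0.0, 0.0, -1.0, 0.0, 0.0]        # set by the Tm operator
td = [1.0, 0.0, 0.0, 1.0, 56.64, -86.4]      # translation built from the Td operands

# Td as specified: the translation matrix goes on the LEFT of Tlm.
tm = matmul(td, tlm)          # last entry is +86.4
final = matmul(tm, ctm)       # last entry is approximately 755.4898

# Swapped order (assumed to be the slip in the question): Tlm on the left.
wrong_tm = matmul(tlm, td)            # last entry is -86.4
wrong_final = matmul(wrong_tm, ctm)   # last entry is approximately 928.2898

print(tm)
print(final)
print(wrong_final)
```

With the order from the spec, the text origin ends up at roughly y = 755.49 in default user space, which is on the A4 page.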


1 Comment

Sorry, I mixed up tm * ctm with ctm * tm. Thanks!

Assuming the page's MediaBox is [0 0 595 842] (A4), the origin of the coordinate system is located in the bottom-left corner of the page, with x running left to right and y running bottom to top.
1 0 0 -1 0 841.8898 cm - moves the origin of the coordinate system to the top-left corner of the page and reverses the direction of the Y axis, so y now runs top to bottom.
1.0000 0 0 1.0000 0 0 cm - is the identity matrix and has no effect.
141.84 0 0 70.80 376.68 63.60 cm - is the transformation matrix for the image; because it is enclosed in q/Q, it has no effect on the text position.
1. 0. 0. -1. 0. -0. Tm - reverses the direction of the Y axis again for text operations. The origin of the coordinate system is still located in the top-left corner of the page, but y runs bottom to top again. Positive y values now fall outside the visible page area, and negative y values fall inside it.
56.64 -86.4 Td - places the text below the page's top edge, inside the visible page area.
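The walk-through above can also be replayed mechanically. The sketch below is a toy interpreter for exactly the operators in the question's stream, not a real PDF parser: it splits on whitespace, so it only works because "(Rechnung)" contains no spaces, and it tracks only the text origin, not glyph extents. The names `text_origins` and `in_region` are hypothetical, and the address window is assumed to be given as lower-left corner plus width and height in default user space.

```python
STREAM = """
q
q
1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 cm
q
1 0 0 -1 0 841.8898 cm
1.0000 0 0 1.0000 0 0 cm
q
141.84 0 0 70.80 376.68 63.60 cm
/IM1 Do
Q
BT
/F1 15.96 Tf
-0.00 Tc
1. 0. 0. -1. 0. -0. Tm
56.64 -86.4 Td
(Rechnung) Tj
ET
"""

IDENTITY = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
OPERATORS = {"q", "Q", "cm", "Do", "BT", "ET", "Tf", "Tc", "Tm", "Td", "Tj"}


def matmul(m, n):
    """Return m x n for PDF matrices given as [a, b, c, d, e, f]."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]


def text_origins(stream):
    """Return (string, x, y) for every Tj, with x and y in default user space."""
    ctm, stack = IDENTITY, []
    tm = tlm = IDENTITY
    operands, found = [], []
    for token in stream.split():
        if token not in OPERATORS:
            operands.append(token)      # collect operands until an operator arrives
            continue
        if token == "q":
            stack.append(ctm)           # save the graphics state (only the CTM here)
        elif token == "Q":
            ctm = stack.pop()           # restore: the image matrix is discarded
        elif token == "cm":
            ctm = matmul([float(v) for v in operands], ctm)
        elif token == "BT":
            tm = tlm = IDENTITY
        elif token == "Tm":
            tm = tlm = [float(v) for v in operands]
        elif token == "Td":
            tx, ty = (float(v) for v in operands)
            tm = tlm = matmul([1.0, 0.0, 0.0, 1.0, tx, ty], tlm)
        elif token == "Tj":
            m = matmul(tm, ctm)
            found.append((operands[0], m[4], m[5]))
        operands = []                   # Do, Tf, Tc and ET are ignored in this sketch
    return found


def in_region(x, y, rx, ry, w, h):
    """True if (x, y) lies inside the rectangle with lower-left corner (rx, ry)."""
    return rx <= x <= rx + w and ry <= y <= ry + h


origins = text_origins(STREAM)
for text, x, y in origins:
    print(text, round(x, 4), round(y, 4), in_region(x, y, 56.0, 587.0, 241.0, 128.0))
```

For this stream the text origin comes out at roughly (56.64, 755.49), which is on the page but above the address window from the question.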
