
I am using pdfkit to dynamically generate PDF documents within a nodewebkit application. The PDFs contain people's comments coming from a remote source via an HTTP request.

It works really well, but I have noticed that comments in Japanese, Chinese, Arabic, etc. do not render correctly. I have no way of knowing in advance what language a comment will be in, since I am gathering them from around the world.

I understand that I need to use a font that includes the required characters, as explained here. I found the open "Google Noto" font family, which covers them all, but there is no single TTF file with all languages, and there cannot be, since a font file is limited to 65K glyphs.

I am trying to find a solution that lets me render text in (almost) any language within a PDF using pdfkit, without having to write a sophisticated language recognition tool, which I feel would be overkill.

Any thoughts and suggestions will be much appreciated.

UPDATE: Use font-manager, by the author of pdfkit, to find a substitute font. You may also want to try PhantomJS, though I haven't done that myself. See the detailed response by @levi in the comments if you have the same problem. Hope it helps.

  • What about using a 3rd party language detection API? stackoverflow.com/questions/7025915/… Commented Dec 25, 2014 at 20:13
  • @levi Yep, that's possible, but what if a comment contains both English and Chinese, for instance? I thought about this too. Commented Dec 26, 2014 at 17:49

1 Answer

Here is one idea. Download fonts for the most popular languages, put them in a list, and sort it by popularity. For each comment, get the Unicode values of n random characters in the string. If a character's code is greater than 127 (outside the ASCII range), the comment may not be in English. Using opentype.js, parse the font files one by one and, for each font, check whether its cmap table has glyphs for all the sampled character codes. If it does, choose that font and cache a mapping from Unicode code to font. Otherwise, try the next font.
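A minimal sketch of that selection loop, under a few assumptions: each font is modelled as a hypothetical `{ name, hasGlyph(codePoint) }` object (with opentype.js, `hasGlyph` would wrap a lookup in the parsed font's cmap), the three demo fonts and their coverage are made up, and every non-ASCII character is checked instead of a random sample. Coverage lookups are cached so each font is asked about a given code point only once.

```javascript
// Builds a picker over fonts given in priority order (most popular first).
// `name` and `hasGlyph` are hypothetical names used only for this sketch.
function makeFontPicker(fonts) {
  const coverage = new Map(); // "fontName:codePoint" -> boolean

  function covers(font, cp) {
    const key = font.name + ":" + cp;
    if (!coverage.has(key)) coverage.set(key, font.hasGlyph(cp));
    return coverage.get(key);
  }

  // Returns the name of the first font that covers every non-ASCII
  // character of the text, or null if no single font does.
  return function pickFont(text) {
    const codePoints = Array.from(text, (ch) => ch.codePointAt(0))
      .filter((cp) => cp > 127);
    if (codePoints.length === 0) return fonts[0].name; // plain ASCII
    const match = fonts.find((font) =>
      codePoints.every((cp) => covers(font, cp)));
    return match ? match.name : null;
  };
}

// Made-up coverage data standing in for real parsed font files.
const fonts = [
  { name: "Latin", hasGlyph: (cp) => cp < 0x0250 },
  { name: "CJK", hasGlyph: (cp) => cp < 0x80 || (cp >= 0x3040 && cp <= 0x9fff) },
  { name: "Arabic", hasGlyph: (cp) => cp < 0x80 || (cp >= 0x0600 && cp <= 0x06ff) },
];
const pickFont = makeFontPicker(fonts);
```

With this demo data, `pickFont("hello 世界")` selects the CJK font, while a comment mixing Chinese and Arabic gets `null`, which is the case where one font per comment is not enough.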

Upon further consideration, it seems TTF files declare the Unicode ranges they support via the UnicodeRange fields (ulUnicodeRange1 to ulUnicodeRange4 in the OS/2 table). So perhaps you could build a mapping between each font and the Unicode ranges it supports, and use this to select the correct font, instead of parsing each font at run time.
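A rough sketch of such a range-to-font mapping, applied per character so that a comment mixing scripts is split into runs that each get their own font. The table below is hand-written for illustration and the font names are placeholders; a real table would be built from the ranges each font file actually declares.

```javascript
// Hypothetical table: [first code point, last code point, font name].
const RANGES = [
  [0x0600, 0x06ff, "NotoSansArabic"], // Arabic
  [0x3040, 0x30ff, "NotoSansCJK"],    // Hiragana and Katakana
  [0x4e00, 0x9fff, "NotoSansCJK"],    // CJK Unified Ideographs
  [0xac00, 0xd7af, "NotoSansCJK"],    // Hangul Syllables
];
const DEFAULT_FONT = "NotoSans"; // used for anything not listed above

function fontFor(codePoint) {
  const hit = RANGES.find(([lo, hi]) => codePoint >= lo && codePoint <= hi);
  return hit ? hit[2] : DEFAULT_FONT;
}

// Splits text into consecutive runs of characters that share a font.
function splitIntoRuns(text) {
  const runs = [];
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const font = fontFor(ch.codePointAt(0));
    const last = runs[runs.length - 1];
    if (last && last.font === font) last.text += ch;
    else runs.push({ font, text: ch });
  }
  return runs;
}
```

Each run could then be drawn with pdfkit by switching fonts between runs, for example `doc.font(pathFor(run.font)).text(run.text, { continued: true })`, where `pathFor` is a hypothetical helper that maps a font name to its TTF file and the last run is drawn without `continued`.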


5 Comments

Thanks very much! That might be the way to go, but I was hoping for a simpler solution, as all of this feels like overkill for a simple PDF page of comments. My other thought was to use the font substitution that HTML/CSS provides and generate the PDF from that. But that would mean replacing pdfkit, which doesn't support it to my knowledge.
It does appear that the browser (or the OS?) has logic to fall back to suitable fonts for non-English text. For example, search Google for "chinese unicode map" and look under computed styles in dev tools: the browser picks an appropriate font automatically. If you are OK with the PDF text being non-selectable, you can use html2canvas to render the text to an image (using the browser's capabilities) and embed the image in the PDF.
pdfkit relies on the PDF reader to render the Unicode text using one of the PDF standard fonts, unless you embed a custom font, in which case the reader uses that. As you can see, pdfkit does not use the browser's text rendering capabilities.
Yes, thank you @levi! html2canvas sounds like another great idea. I have found that font-manager, developed by the author of pdfkit, can find a substitute font installed on the user's computer if you pass in a Japanese or Chinese string. You can then use that font with pdfkit. The author is planning to use font-manager in pdfkit later on. That worked for me.
So, I ended up using the regular Google Noto font for everything and a free CJK TTF font for the CJK characters. I have also found that PhantomJS, with its headless rendering engine, may help here, and I will see whether I want to switch to that solution later. Thanks for your help, @levi.
