How to convert paths in a PDF file to text

Closed. This question is seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. It does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.

Closed 3 days ago.

The PDF files do not contain any images or text objects. The text visible on the page is rendered as paths.

I tried PdfPig (https://github.com/UglyToad/PdfPig) using:

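using UglyToad.PdfPig;
using UglyToad.PdfPig.Graphics;
using UglyToad.PdfPig.Rendering.Skia;

using PdfDocument document = PdfDocument.Open(stream, SkiaRenderingParsingOptions.Instance); // stream: the uploaded PDF
var page = document.GetPage(1); // PdfPig page numbers start at 1
var ptxt = new System.Text.StringBuilder();
foreach (PdfPath p in page.Paths)
  ptxt.Append(p.ToString()); // PdfPath does not override ToString(), so this appends only its type name
Console.WriteLine(ptxt);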

The output is:

UglyToad.PdfPig.Graphics.PdfPath

How can such PDF files be converted to plain text? If direct conversion is not possible, how can the PDF be converted to an image to pass to OCR?
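If the text exists only as paths, the usual route is to rasterize each page and hand the bitmap to an OCR engine. The sketch below assumes the separate UglyToad.PdfPig.Rendering.Skia package (the one that provides the `SkiaRenderingParsingOptions` used in the question); the `AddSkiaPageFactory` and `GetPageAsPng` calls follow that package's README and may differ between versions, and `PdfPageRasterizer`/`RenderPagesToPng` are hypothetical names. The OCR step itself is left out.

```csharp
using System.Collections.Generic;
using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Graphics.Colors;
using UglyToad.PdfPig.Rendering.Skia;

// Hypothetical helper; the rendering calls assume the PdfPig.Rendering.Skia package.
public static class PdfPageRasterizer
{
    // Renders every page to PNG bytes that can be passed to an OCR engine.
    // A larger scale gives the OCR engine more pixels per glyph.
    public static List<byte[]> RenderPagesToPng(Stream pdfStream, float scale = 3f)
    {
        var images = new List<byte[]>();
        using PdfDocument document = PdfDocument.Open(pdfStream, SkiaRenderingParsingOptions.Instance);
        document.AddSkiaPageFactory(); // registers the Skia page renderer (per the package README)

        for (int p = 1; p <= document.NumberOfPages; p++) // PdfPig page numbers start at 1
        {
            // Render onto a white background, as in the package README.
            using var png = document.GetPageAsPng(p, scale, RGBColor.White);
            images.Add(png.ToArray());
        }

        return images;
    }
}
```

In the MVC controller this could be called with the uploaded file's stream (for example `IFormFile.OpenReadStream()`), passing each PNG to whichever OCR engine is chosen.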

PDFs may also contain text objects from which text can be extracted directly.
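When a PDF does contain text objects, PdfPig returns them without any rendering, so a cheap first step is to try that and fall back to OCR only when nothing comes back. A minimal sketch using PdfPig's `GetPages()` and `Page.GetWords()`; `PdfTextExtractor`/`TryExtractText` are hypothetical names, and treating "no words on any page" as "text drawn as paths" is a heuristic, not a guarantee.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Content;

// Hypothetical helper name; only the PdfPig calls are real API.
public static class PdfTextExtractor
{
    // Returns the text held in the PDF's text objects, or null when no page yields any words.
    // A null result suggests the visible text is drawn as paths (or is a scanned image) and needs OCR.
    public static string? TryExtractText(Stream pdfStream)
    {
        using PdfDocument document = PdfDocument.Open(pdfStream);
        var sb = new StringBuilder();

        foreach (Page page in document.GetPages())
        {
            // GetWords() groups the page's letters into words with PdfPig's default word extractor.
            string pageText = string.Join(" ", page.GetWords().Select(w => w.Text));
            if (pageText.Length > 0)
                sb.AppendLine(pageText);
        }

        return sb.Length > 0 ? sb.ToString() : null;
    }
}
```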

PdfPig exposes a Paths collection that can be used to retrieve every PdfPath object. How can each path object be converted to an image? The source code of a PDF viewer should contain this logic.

How can OpenCV or SkiaSharp be used for this conversion?
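For simple files, the paths PdfPig exposes can be replayed onto a SkiaSharp canvas and the resulting bitmap sent to OCR. This sketch assumes a recent PdfPig version in which `PdfPath` is a list of `PdfSubpath` objects whose `Commands` include `Move`, `Line`, `CubicBezierCurve` (named `BezierCurve` in older releases) and `Close`, and assumes the path coordinates are already in page space. It ignores colours, clipping, fill rules, images, page rotation and crop-box offsets, so it is no substitute for a real renderer; `PathRasterizer`/`RenderPathsToPng` are hypothetical names.

```csharp
using System;
using SkiaSharp;
using UglyToad.PdfPig.Content;
using UglyToad.PdfPig.Core;
using UglyToad.PdfPig.Graphics;

// Hypothetical helper: draws one page's vector paths in black on white and returns PNG bytes.
public static class PathRasterizer
{
    public static byte[] RenderPathsToPng(Page page, float scale = 3f)
    {
        int width = (int)Math.Ceiling(page.Width * scale);
        int height = (int)Math.Ceiling(page.Height * scale);

        using var bitmap = new SKBitmap(width, height);
        using var canvas = new SKCanvas(bitmap);
        canvas.Clear(SKColors.White);
        // PDF space has its origin at the bottom-left; flip Y so the image is upright.
        canvas.Translate(0, height);
        canvas.Scale(scale, -scale);

        using var fill = new SKPaint { Color = SKColors.Black, IsAntialias = true, Style = SKPaintStyle.Fill };
        using var stroke = new SKPaint { Color = SKColors.Black, IsAntialias = true, Style = SKPaintStyle.Stroke };

        foreach (PdfPath path in page.Paths)
        {
            if (path.IsClipping) continue; // clipping paths define visible areas, they are not drawn

            // Fill rule is ignored: SKPath defaults to non-zero winding.
            using var skPath = new SKPath();
            foreach (PdfSubpath subpath in path)
            {
                foreach (var command in subpath.Commands)
                {
                    switch (command)
                    {
                        case PdfSubpath.Move m:
                            skPath.MoveTo((float)m.Location.X, (float)m.Location.Y);
                            break;
                        case PdfSubpath.Line l:
                            // Assumes each subpath starts with a Move, so only the end point is needed.
                            skPath.LineTo((float)l.To.X, (float)l.To.Y);
                            break;
                        case PdfSubpath.CubicBezierCurve c:
                            skPath.CubicTo((float)c.FirstControlPoint.X, (float)c.FirstControlPoint.Y,
                                           (float)c.SecondControlPoint.X, (float)c.SecondControlPoint.Y,
                                           (float)c.EndPoint.X, (float)c.EndPoint.Y);
                            break;
                        case PdfSubpath.Close:
                            skPath.Close();
                            break;
                        // Any other command types are skipped in this sketch.
                    }
                }
            }

            if (path.IsFilled) canvas.DrawPath(skPath, fill);
            if (path.IsStroked)
            {
                stroke.StrokeWidth = (float)path.LineWidth; // in PDF units; the canvas scale is applied
                canvas.DrawPath(skPath, stroke);
            }
        }

        using SKImage image = SKImage.FromBitmap(bitmap);
        using SKData png = image.Encode(SKEncodedImageFormat.Png, 100);
        return png.ToArray();
    }
}
```

Glyph outlines drawn as paths are normally filled, so the `IsFilled` branch is what produces the visible text; OpenCV is not required for this step.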

This is a .NET 9 ASP.NET MVC application.

edited Nov 11 at 20:00

asked Nov 10 at 21:46

Andrus

You need to find a library that does PDF-to-image conversion (recommendations are not allowed here; you have to ask on softwarerecs.stackexchange.com) and then perform OCR on that image.

i.PDF.dev, 2025-11-11 08:03:24 +00:00

@KJ The MVC controller should convert such PDFs without human intervention. How can this be implemented? How can PDF paths be converted to an image to pass to OCR, or is there another automated solution?

Andrus, 2025-11-11 08:04:30 +00:00

@iPDFdev I posted the question on softwarerecs.stackexchange.com/questions/94890/…

Andrus, 2025-11-11 08:10:55 +00:00

@iPDFdev Is it possible to read the paths from PDF files and use OpenCV or SkiaSharp to create an image from them?

Andrus, 2025-11-11 08:25:05 +00:00

For very simple cases you can extract the paths and draw them using SkiaSharp (or any other graphics engine). For general PDFs, however, you would have to implement a full PDF renderer; simple path extraction would not be enough.

i.PDF.dev, 2025-11-12 08:06:14 +00:00
