
I have an HTML menu file, extracted by a CHM decoder, that contains a list of HTML pages:

(7,0,"Icons Used in This Book","final/pref04.html");
(8,0,"Command Syntax Conventions","final/pref05.html");
(9,0,"Introduction","final/pref06.html");
(10,0,"Part I: Introduction and Overview of Service","final/part01.html");
(11,10,"Chapter 1. Overview","final/ch01.html");
(12,11,"Technology Motivation","final/ch01lev1sec1.html");

From this I want to create a table of contents file for Calibre (an HTML file that contains links to all the other files in the desired order). The final file should look like this:

<a href="final/pref04.html">Icons Used in This Book</a><br/>
<a href="final/pref05.html">Command Syntax Conventions</a><br/>
.
.
.

So first I need to use a regular expression to remove the numeric prefixes, then wrap each entry in an <a href="..."> link, swapping the positions of the URL and the title. Can anyone show how to do this in Notepad++?


I think this will do it for you. I'm on a Mac, so I don't have Notepad++, but this works in Dreamweaver. It assumes each entry is on its own line.

Find:

\(.*?"(.*?)","(.*?)".*

Replace:

<a href="$2">$1</a><br/>

File:

(7,0,"Icons Used in This Book","final/pref04.html");
(8,0,"Command Syntax Conventions","final/pref05.html");
(9,0,"Introduction","final/pref06.html");
(10,0,"Part I: Introduction and Overview of Service","final/part01.html");
(11,10,"Chapter 1. Overview","final/ch01.html");
(12,11,"Technology Motivation","final/ch01lev1sec1.html");

After Replace All:

<a href="final/pref04.html">Icons Used in This Book</a><br/>
<a href="final/pref05.html">Command Syntax Conventions</a><br/>
<a href="final/pref06.html">Introduction</a><br/>
<a href="final/part01.html">Part I: Introduction and Overview of Service</a><br/>
<a href="final/ch01.html">Chapter 1. Overview</a><br/>
<a href="final/ch01lev1sec1.html">Technology Motivation</a><br/>
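If you want to check the same Find/Replace pair outside an editor, here is a minimal sketch in Python (my choice of tool, not something the question requires). It assumes Python's `re` module, which uses `\1` and `\2` for backreferences in the replacement where Dreamweaver uses `$1` and `$2`. In Notepad++ itself you would paste the pattern into the Replace dialog with the Search Mode set to "Regular expression".

```python
import re

# Same pattern as the Find box above.
PATTERN = re.compile(r'\(.*?"(.*?)","(.*?)".*')
# Same replacement, but written with Python's \1 / \2 instead of $1 / $2.
REPLACEMENT = r'<a href="\2">\1</a><br/>'

menu = '''(7,0,"Icons Used in This Book","final/pref04.html");
(8,0,"Command Syntax Conventions","final/pref05.html");
(9,0,"Introduction","final/pref06.html");
(10,0,"Part I: Introduction and Overview of Service","final/part01.html");
(11,10,"Chapter 1. Overview","final/ch01.html");
(12,11,"Technology Motivation","final/ch01lev1sec1.html");'''

# By default "." does not match "\n", so each match stays on its own line
# and the newlines between entries are left in place.
toc = PATTERN.sub(REPLACEMENT, menu)
print(toc)
```

The printed text is the same six links shown after Replace All.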

If the entries aren't one per line, change the final .* to .*?\n so that each match stops at the next newline. For readability, you may also want to add a newline to the replacement.
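To see why the final .*?\n matters, here is a small Python sketch. It assumes the case where "." is allowed to match newlines, emulated with Python's `re.DOTALL` flag (comparable to ticking ". matches newline" in an editor's regex mode). With a plain final .*, the first match runs to the end of the text; with .*?\n it stops at each line end, and the \n in the replacement restores the newline the match consumed.

```python
import re

menu = ('(7,0,"Icons Used in This Book","final/pref04.html");\n'
        '(8,0,"Command Syntax Conventions","final/pref05.html");\n')

# Final ".*" with DOTALL: the first match swallows the rest of the text,
# so only one link comes out and the second entry is lost.
swallowing = re.sub(r'\(.*?"(.*?)","(.*?)".*',
                    r'<a href="\2">\1</a><br/>', menu, flags=re.DOTALL)

# Final ".*?\n": each match ends at the next newline; "\n" in the
# replacement puts that newline back so the links stay one per line.
per_line = re.sub(r'\(.*?"(.*?)","(.*?)".*?\n',
                  r'<a href="\2">\1</a><br/>\n', menu, flags=re.DOTALL)

print(swallowing)
print(per_line)
```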

I should probably explain the regex as well, in case you want to modify it:

The first \ escapes the ( so the regex looks for a literal parenthesis character instead of starting a group. In .*?, the . matches any single character, the * matches zero or more occurrences of the preceding item, and the ? makes the * lazy: it matches as few characters as possible, stopping at the first occurrence of whatever follows it (here, the next "). The final .* matches the rest of the line, so the trailing ); is consumed and dropped from the result. The ( and ) around each .*? form capturing groups, whose contents are available as $1 and $2 in the replacement. The numbers follow the order in which the groups appear in the regex.
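A quick way to confirm which group holds what is to run the pattern against one line and inspect the captures. This is a minimal Python sketch (Python numbers groups the same way, but its replacement syntax is \1/\2 rather than $1/$2):

```python
import re

PATTERN = re.compile(r'\(.*?"(.*?)","(.*?)".*')
line = '(11,10,"Chapter 1. Overview","final/ch01.html");'

m = PATTERN.match(line)
# The escaped \( matches a literal "(" and is not a group,
# so only the two parenthesized .*? parts are captured.
print(len(m.groups()))  # 2
print(m.group(1))       # Chapter 1. Overview  (first group: the title, $1)
print(m.group(2))       # final/ch01.html      (second group: the URL, $2)
```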
