The problem with using that preprocessor specifically is that shell scripts use # as their comment character, while cpp treats a # that begins a line as introducing a preprocessing directive. So you cannot use it to preprocess shell scripts that contain ordinary comments.
That may be fine if you can stick to a coding convention in which your preprocessed shell scripts use only /* ... */ and // comments, and you accept that the output contains no comments.
One comment that cannot be dispensed with is the hash-bang line. If you pass code containing a hash-bang line to the GNU C preprocessor, it will complain about an invalid preprocessing directive.
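The hash-bang complaint is easy to reproduce. This is a minimal sketch that assumes GNU cpp, which reports the unknown directive as an error and therefore exits with a non-zero status. The two-line script fed to it is made up for the demonstration.

```shell
#!/bin/sh
# Pipe a tiny script that starts with a hash-bang line through cpp.
# GNU cpp sees "#" at the start of line 1, expects a directive name after it,
# and prints an "invalid preprocessing directive" error on stderr.
if printf '#!/bin/sh\necho hello\n' | cpp -P >/dev/null; then
    echo "cpp accepted the hash-bang line"
else
    echo "cpp rejected the hash-bang line"
fi
```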
A solution to these problems may be to adopt a convention such as the following:
cpp -E ... -DHASH='#'
That is to say, assume there is a predefined macro called HASH which expands to the hash mark. Then in the script you can do:
HASH!/bin/sh
and also encode comments like this:
HASH this is a comment
Unfortunately, this doesn't quite work, because cpp inserts whitespace at the start of the line. I get the output:
 #!/bin/sh
^ space here, oops!
 # this is a comment
So that has to be addressed. Another problem is that the output contains extra linemarker lines like these:
# 1 "prepro.sh.in"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "prepro.sh.in"
That has to be cleaned out. Here is something that points toward a viable solution. My prepro.sh.in input file is this:
HASH!/bin/sh
HASH this is a comment
foo()
{
}
The command I'm running is this:
cpp -E -DHASH='#' prepro.sh.in | sed -e '/^#/d;s/^ #/#/'
The output:
#!/bin/sh
# this is a comment
foo()
{
}
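GNU cpp can also be told not to emit the linemarkers at all: its -P option inhibits them. That makes the /^#/d deletion unnecessary, which is a little safer, because that expression would also delete a HASH line if cpp ever stopped putting a space in front of it. A sketch of a wrapper along those lines, assuming GNU cpp; the function name shpp is hypothetical:

```shell
#!/bin/sh
# shpp (hypothetical name): preprocess a shell-script template that uses the
# HASH convention and write the result to stdout.
#   -E          only preprocess (what cpp does anyway)
#   -P          do not emit linemarkers ("# 1 ..." lines), so nothing has
#               to be deleted afterwards
#   -DHASH='#'  HASH expands to a literal hash mark
# The sed expression removes the single space cpp puts before a leading '#'.
shpp() {
    cpp -E -P -DHASH='#' "$@" | sed -e 's/^ #/#/'
}

# Usage, with the prepro.sh.in file shown above:
#   shpp prepro.sh.in > prepro.sh
```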
There may be other stumbling blocks. The C preprocessor is defined not as a textual filter but as a processing step that identifies and generates "preprocessing tokens". Even lines that are not preprocessing directives are tokenized.
I would be concerned about whitespace that is significant to the shell not being preserved correctly.
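A here-document shows the effect well, because its body is data rather than shell syntax. This sketch assumes GNU cpp in its default (non-traditional) mode, where a run of blanks between two tokens is written out as a single space; -P only suppresses the linemarkers so the output is easy to read.

```shell
#!/bin/sh
# Feed cpp a tiny script whose here-document body contains a run of spaces.
# Every line is tokenized, so the four spaces between "foo" and "bar" are
# expected to come out as one, silently changing what cat would print.
printf 'cat <<EOF\nfoo    bar\nEOF\n' | cpp -P
```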
The man page for GNU cpp has some admonishing words:
The C preprocessor is intended to be used only with C, C++, and
Objective-C source code. In the past, it has been abused as a general
text processor. It will choke on input which does not obey C's lexical
rules. For example, apostrophes will be interpreted as the beginning
of character constants, and cause errors. Also, you cannot rely on it
preserving characteristics of the input which are not significant to
C-family languages. If a Makefile is preprocessed, all the hard tabs
will be removed, and the Makefile will not work.
Having said that, you can often get away with using cpp on things which
are not C. Other Algol-ish programming languages are often safe
(Pascal, Ada, etc.) So is assembly, with caution. -traditional-cpp
mode preserves more white space, and is otherwise more permissive.
Many of the problems can be avoided by writing C or C++ style comments
instead of native language comments, and keeping macros simple.
If you're going to do this anyway, it's probably a good idea to take the man page's hint and use the -traditional-cpp option.
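Here is a sketch of the cpp | sed pipeline above switched to traditional mode, assuming GNU cpp. The GCC manual's section on traditional mode says horizontal whitespace is kept as it appears in the input, so a here-document line such as "foo    bar" should come through unchanged. The name shpp_trad is hypothetical.

```shell
#!/bin/sh
# shpp_trad (hypothetical name): the cpp | sed pipeline in traditional mode,
# which the man page says preserves more white space.
#   -P  suppress linemarkers instead of deleting '#' lines with sed: if a
#       HASH line is not given a leading space in this mode, '/^#/d' would
#       delete the hash-bang line itself
# The sed expression is kept in case a space still precedes a leading '#'.
shpp_trad() {
    cpp -P -traditional-cpp -DHASH='#' "$@" | sed -e 's/^ #/#/'
}

# Usage:
#   shpp_trad prepro.sh.in > prepro.sh
```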
I would also avoid naming the input file testScript.c, because it definitely isn't C source code.