if I want to count the lines of code, the trivial thing is
cat *.c *.h | wc -l
But what if I have several subdirectories?
You should probably use SLOCCount or cloc for this; they're designed specifically for counting lines of source code in a project, regardless of directory structure. Either
sloccount .
or
cloc .
will produce a report on all the source code starting from the current directory.
If you want to use find and wc, GNU wc has a nice --files0-from option:
find . -name '*.[ch]' -print0 | wc --files0-from=- -l
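As a minimal sketch of the --files0-from approach (assumes GNU coreutils; the directory layout and file names here are made up for illustration), wc reads the NUL-separated file list from stdin, prints one count per file, and ends with a total line:

```shell
# Build a tiny throwaway tree with a known number of lines.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'a\nb\nc\n' > "$dir/main.c"      # 3 lines
printf 'x\ny\n'    > "$dir/sub/util.h"  # 2 lines

# wc takes the NUL-separated list on stdin; the last output line is the total.
total=$(find "$dir" -name '*.[ch]' -print0 | wc --files0-from=- -l |
        awk 'END { print $1 }')
echo "$total"
rm -rf "$dir"
```

The awk END block just grabs the first field of the last line, which is the grand total when more than one file is counted.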
(Thanks to SnakeDoc for the cloc suggestion!)
sloccount /tmp/stackexchange (created again on May 17 after my most recent reboot) says that the estimated cost to develop the sh, perl, awk, etc. files it found is $11,029, and that doesn't include the one-liners that never made it into a script file.
As the wc command can take multiple arguments, you can just pass all the filenames to wc using the + argument of the -exec action of GNU find:
find . -type f -name '*.[ch]' -exec wc -l {} +
Alternately, in bash, using the shell option globstar to traverse the directories recursively:
shopt -s globstar
wc -l **/*.[ch]
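Here is a small self-contained sketch of the globstar approach (requires Bash 4+; the sample tree is invented for illustration). With globstar on, ** matches files at any depth, including the current directory:

```shell
# Throwaway tree with a known line count.
dir=$(mktemp -d); mkdir -p "$dir/deep/deeper"
printf '1\n2\n'    > "$dir/a.c"              # 2 lines
printf '1\n2\n3\n' > "$dir/deep/deeper/b.h"  # 3 lines

shopt -s globstar   # recursive ** is off by default in Bash
cd "$dir"
# wc prints one line per file plus a final "total" line; grab the total.
total=$(wc -l **/*.[ch] | awk 'END { print $1 }')
echo "$total"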
Other shells either traverse recursively by default (e.g. zsh) or have a similar option to globstar; most do, at least.
If you are in an environment where you don't have access to cloc etc I'd suggest
find -name '*.[ch]' -type f -exec cat '{}' + | grep -c '[^[:space:]]'
Run-through: find searches recursively for all regular files whose names end in .c or .h and runs cat on them. The output is piped through grep to count all the non-blank lines (those containing at least one non-whitespace character).
You can use find together with xargs and wc:
find . -type f \( -name '*.h' -o -name '*.c' \) | xargs wc -l
(Note that you may get more than one total line if several wcs are being invoked.)
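One pitfall worth demonstrating: in find, the implicit -a binds tighter than -o, so -type f -name '*.h' -o -name '*.c' means "(-type f AND -name '*.h') OR -name '*.c'", and a directory whose name ends in .c would slip through. A minimal sketch (file names invented for illustration):

```shell
dir=$(mktemp -d)
mkdir "$dir/fake.c"            # a directory, not a file
printf '1\n' > "$dir/real.c"   # a regular file

# Without grouping, the -type f test only applies to the *.h branch,
# so the directory fake.c matches too.
ungrouped=$(find "$dir" -type f -name '*.h' -o -name '*.c' | wc -l)

# With explicit parentheses, -type f applies to both name patterns.
grouped=$(find "$dir" -type f \( -name '*.h' -o -name '*.c' \) | wc -l)

echo "$ungrouped $grouped"
rm -rf "$dir"
```

The ungrouped form finds two matches here, the grouped form only one, which is why the parentheses matter.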
The multiple-wc-invocation problem can be addressed by piping find into a while read FILENAME; do ... done structure and using wc -l inside the loop. The rest is summing the line counts into a variable and displaying the total.
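A minimal sketch of that while-read approach (the process substitution at the bottom requires Bash; the sample files are invented so the sketch is self-contained):

```shell
# Throwaway tree with a known line count.
dir=$(mktemp -d); mkdir -p "$dir/sub"
printf '1\n2\n'    > "$dir/a.c"      # 2 lines
printf '1\n2\n3\n' > "$dir/sub/b.h"  # 3 lines

total=0
while IFS= read -r f; do
    n=$(wc -l < "$f")        # reading from stdin makes wc print only the number
    total=$((total + n))
done < <(find "$dir" -type f -name '*.[ch]')  # process substitution keeps $total
echo "Total lines: $total"
rm -rf "$dir"
```

Feeding the loop via process substitution rather than a pipe matters: a pipe would run the loop in a subshell and the summed total would be lost.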
As has been pointed out in the comments, cat file | wc -l is not equivalent to wc -l file because the former prints only a number whereas the latter prints a number and the filename. Likewise cat * | wc -l will print just a number, whereas wc -l * will print a line of information for each file.
In the spirit of simplicity, let's revisit the question actually asked:
if I want to count the lines of code, the trivial thing is
cat *.c *.h | wc -l
But what if I have several subdirectories?
Firstly, you can simplify even your trivial command to:
cat *.[ch] | wc -l
And finally, the many-subdirectory equivalent is:
find . -name '*.[ch]' -exec cat {} + | wc -l
This could perhaps be improved in many ways, such as restricting the matched files to regular files only (not directories) by adding -type f; but the given find command is the exact recursive equivalent of cat *.[ch].
Sample using awk:
find . -name '*.[ch]' -exec wc -l {} \; |
awk '{SUM+=$1}; END { print "Total number of lines: " SUM }'
Using + in place of \; makes find run wc -l for groups of files, rather as xargs does, but it handles odd-ball characters (like spaces) in file names without needing either xargs or the (non-standard) -print0 and -0 options to find and xargs respectively. It's a minor optimization. The downside is that each invocation of wc outputs a total line at the end when given multiple files, and the awk script would have to deal with that. So it's not a slam-dunk, but very often, using + in place of \; with find is a good idea.
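A sketch of how the awk script can cope with those extra total lines when + is used: skip any line whose second field is the literal word "total" (this assumes no source file is itself named "total", and the file names here are invented):

```shell
dir=$(mktemp -d)
printf '1\n2\n'    > "$dir/a.c"  # 2 lines
printf '1\n2\n3\n' > "$dir/b.c"  # 3 lines

# Each batched wc run appends a "<count> total" line; filter those out
# before summing the per-file counts.
sum=$(find "$dir" -name '*.[ch]' -exec wc -l {} + |
      awk '$2 != "total" { s += $1 } END { print s }')
echo "$sum"
rm -rf "$dir"
```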
There is a limit on the number of arguments that can be passed to wc. If the number of files that will be found is unknown a priori, is there a risk of exceeding that limit, or is it somehow handled by find?
find groups the files into convenient size bundles, which won't exceed the length limit for the argument list on the platform, allowing for the environment (which comes out of the argument list length — so the length of the argument list plus the length of the environment has to be less than a maximum value). IOW, find does the job right, like xargs does the job right.
easy command:
find . -name '*.[ch]' | xargs wc -l
(Note that you may get more than one total line if several wcs are being invoked.)
If you're on Linux I recommend my own tool, polyglot. It's dramatically faster than cloc and more featureful than sloccount.
You should be able to build it on BSD as well, though there aren't any provided binaries.
You can invoke it with
poly .
A new contender to cloc is Loci.
Link to NPM package
It counts code similarly to cloc, but is faster at scale.
Also, as it's written natively in Node.js, it will run in environments without Perl (for cloc.pl) or cloc.exe.
It is in its infancy, but you can install it as an NPM CLI tool, or import it as a library into your own project.
It's great for environments where you can install script-based npm packages but are not allowed to use unapproved binaries.
find . -name \*.[ch] -print | xargs -n 1 wc -l should do the trick. There are several possible variations on that as well, such as using -exec instead of piping the output to wc.
find . -name \*.[ch] -print doesn't print the contents of the files, only the file names. So I'd be counting the number of files instead, wouldn't I? Do I need xargs?
Yes, you need xargs, and you'd also need to watch for multiple wc invocations if you have lots of files; you'd need to look for all the total lines and sum them.
find . -name \*.[ch] -print0 | xargs -0 cat | wc -l
find . -name \*.[ch] -print | wc -l counts the number of files (unless a file name contains a newline, but that's very unusual); it does not count the number of lines in the files.
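A quick sketch of why the -print0/-0 pair matters (the file names are invented): a name containing a space would be split into two bogus arguments by plain xargs, whereas NUL-delimited names pass through intact:

```shell
dir=$(mktemp -d)
printf '1\n2\n'    > "$dir/my file.c"  # file name with a space
printf '1\n2\n3\n' > "$dir/other.c"

# NUL-delimited file names survive the space in "my file.c".
count=$(find "$dir" -name '*.[ch]' -print0 | xargs -0 cat | wc -l)
echo "$count"
rm -rf "$dir"
```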
Why use cat? wc -l *.c *.h does the same thing. Use wc -l *.c *.h | tail -n 1 to get similar output. Some shells support **, so you could have used wc -l **/*.{h,c} or something similar. Note that in Bash, at least, this option (called globstar) is off by default. But also note that in this particular case, cloc or SLOCCount is a much better option. (Also, ack may be preferable to find for easily finding/listing source files.)