
I need to remove comments from a user-input line without deleting any code. For example:

mail -s 'text' brown < text #comments

How do I remove the comments and leave the code intact? I can delete lines that begin with #, but not comments that begin somewhere in the middle of a line.

I tried:

echo $line | sed -e 's/\

but it does not work. Any idea what I'm doing wrong?

Also, how can I detect cases in which # does not begin a comment? For example, a quoted #, or a line of code that ends with #, since those are not comments.

echo $line | sed -e  '/^#/d'

In this line, the # is not used as a comment but as part of the code. I figured out that I need to detect whether the # is inside quotes or is not preceded by a whitespace character. In those cases, how do I leave the line as it is?

  • I would like to know how you think the comments in the example file from my post could be removed automatically. Commented May 12, 2014 at 5:27
  • Doing this job properly is impossible unless you take into account the lexical structure of a shell script. You need to handle single quoted strings, double quoted strings, variable expansions such as $# and ${#variable} and ${variable#head}, and here documents (as a start). When you can detect those accurately, in all their glory (remember, quoted strings can extend over multiple lines!), then you can start to detect comments. Note that echo a#b echoes three characters plus a newline. Commented May 12, 2014 at 5:48
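To make the comment above concrete, here is a small sketch (assuming any POSIX sh; `v` and `f` are arbitrary illustrative names) in which the first `#` on each line is part of the code rather than a comment. A naive "delete from # to end of line" rule would break every one of these lines:

```shell
#!/bin/sh
# The FIRST # on each line is code; only the " # ..." after it is a real comment.
set -- x y z                  # three positional parameters
echo $#                       # $# is the parameter count: prints 3
v=hello
echo ${#v}                    # ${#v} is the length of $v: prints 5
f=dir/file
echo ${f#dir/}                # ${f#dir/} strips a prefix: prints file
echo a#b                      # # inside a word: prints a#b
echo 'quoted # text'          # # inside quotes: prints quoted # text
```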

2 Answers


You can remove everything from # to the end of the line using this awk:

awk '{sub(/#.*$/,"")}1' file

But if you have a file like this:

#!/bin/bash
pidof tail #See if tail is running
if [ $? -ne 0 ] ; then  #start loop
   awk '{print " # "$8}' file >tmp # this is my code
fi # end of loop
awk -F# '{for (i=1;i<=NF;i++) print $i}' file > tmp2
a=a+1 # increment a

There is no way to remove the comments automatically without destroying some of the code.
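To see what the one-liner does to the example file, here is a short sketch that writes the file out and runs the same awk command over it. `example.sh` is a hypothetical file name; the quoted `'EOF'` delimiter keeps the here document literal, so `$?`, `$8` and `$i` are not expanded while the file is written.

```shell
#!/bin/sh
# Recreate the example script (quoted EOF: no expansion inside the here document).
cat > example.sh <<'EOF'
#!/bin/bash
pidof tail #See if tail is running
if [ $? -ne 0 ] ; then  #start loop
   awk '{print " # "$8}' file >tmp # this is my code
fi # end of loop
awk -F# '{for (i=1;i<=NF;i++) print $i}' file > tmp2
a=a+1 # increment a
EOF

# Naive approach: delete everything from the first # to the end of each line.
awk '{sub(/#.*$/,"")}1' example.sh
```

In the output the shebang line is empty, the `awk '{print " # "$8}'` line is cut off inside its quoted program (leaving an unterminated quote), and `awk -F# '...'` is reduced to `awk -F`. Only the lines whose first # really starts a comment come out as intended, apart from leftover trailing blanks.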


5 Comments

Yes, but the problem is that it also deletes things that are not part of a comment. For example: code/#/code. Since the # is part of the code, I don't want it to delete anything from that line.
@chipunpui Then can you tell me how a program should tell the difference between a # used as a comment and one that is part of the code? You would need some logic that understands that the text is human information about the code.
Note that this destroys any use of $# or ${#variable} or various other constructions that use a #, not to mention quoted strings containing a #.
@JonathanLeffler That was part of my point. There is no way to remove comments automatically; a human has to decide whether it is a comment or part of the code.
It can be done, but it requires a full-scale parser for the shell language (or, at least, a lexical scanner that is cognizant of all the syntax rules for the shell language). It can't reasonably be done using sed; it would be extremely messy using awk. In fact, it would be fairly messy regardless of the language used to implement the parser/scanner.
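As a rough illustration of what such a lexical scanner has to track, here is a sketch of a hypothetical `strip_comments` shell function built on awk. It is deliberately incomplete: it only applies the POSIX rule that # starts a comment when it begins a word (start of line, or right after a blank or one of the operators `; & | ( ) < >`), and it tracks single quotes, double quotes and backslash escapes, including quoted strings that span several lines. It does not understand here documents, `$'...'` quoting, backticks or backslash-newline continuations, so check its output before trusting it. The word-start rule alone already leaves `$#`, `${#variable}`, `${variable#head}` and `echo a#b` untouched.

```shell
#!/bin/sh
# strip_comments: hypothetical helper, not a standard tool.
# Reads shell code on stdin and writes it with comments removed.
strip_comments() {
    awk '
    BEGIN { q = "\047"; state = "" }   # q holds a single quote; state carries over between lines
    {
        line = $0
        # Keep a shebang on the first line untouched.
        if (NR == 1 && substr(line, 1, 2) == "#!") { print line; next }
        out = ""; cut = 0
        boundary = 1                   # a new line always starts a new word
        n = length(line)
        for (i = 1; i <= n; i++) {
            c = substr(line, i, 1)
            if (state == "sq") {       # inside single quotes: only a quote ends it
                if (c == q) state = ""
                boundary = 0
            } else if (state == "dq") {    # inside double quotes: backslash escapes
                if (c == "\\" && i < n) { out = out c; i++; c = substr(line, i, 1) }
                else if (c == "\"") state = ""
                boundary = 0
            } else if (c == "\\" && i < n) {   # unquoted backslash escapes the next char
                out = out c; i++; c = substr(line, i, 1)
                boundary = 0
            } else if (c == q) {
                state = "sq"; boundary = 0
            } else if (c == "\"") {
                state = "dq"; boundary = 0
            } else if (c == "#" && boundary) {
                cut = 1                # # at the start of a word begins a comment
                break
            } else {
                # blanks and operators end a word, so a following # starts a comment
                boundary = (c ~ /[ \t;&|()<>]/)
            }
            out = out c
        }
        if (cut) {
            sub(/[ \t]+$/, "", out)    # drop the blanks that preceded the comment
            if (out == "") next        # the whole line was a comment
        }
        print out
    }'
}
```

Usage: `strip_comments < script.sh` (hypothetical file name). On the sample file in the answer above it keeps `#!/bin/bash`, the quoted `" # "` and `awk -F#`, and removes only the trailing comments.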

Well, consider what almost always comes after a comment in bash.

#comment...
#another comment

A line break, which is effectively a character. So all you have to do is add a wildcard after your # to match the actual comment text, then put a line break 'character' at the end. You'll need to type \n rather than trying to hit Enter. Unfortunately I'm not on Linux at the moment, and escape sequences (the backslash) sometimes don't work as expected, so something like `\n` might work, or maybe $'\n'.

EDIT: In a regex, ^ matches the start of a line and $ matches the end.

As for not deleting actual code, matching a space immediately followed by # should work. I would match a # preceded by either a space or a line break (that is, a # at the start of a line).
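A minimal sed sketch of this suggestion, assuming a comment is whatever follows a # at the start of a line (after optional blanks) or right after a blank. Because sed works on one line at a time and does not include the trailing newline in the text it matches, the `$` anchor takes the place of the `\n` mentioned above. `heuristic_strip` is a hypothetical name.

```shell
#!/bin/sh
# heuristic_strip: hypothetical helper for the "blank or start of line before #" rule.
# It knows nothing about quotes, so it can still cut a quoted " # " in half.
heuristic_strip() {
    # 1st expression: delete lines whose first non-blank character is #.
    # 2nd expression: remove a blank followed by # and everything after it.
    sed -e '/^[[:space:]]*#/d' -e 's/[[:space:]]#.*$//'
}
```

This leaves `awk -F#` and `a#b` alone, but it also deletes a `#!/bin/bash` shebang and turns `echo "a #b"` into `echo "a`, which is the quoting problem raised in the comments on the other answer.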

At any rate, please be sure not to accidentally ruin whatever you're working on, just in case I'm wrong.

1 Comment

No, that's exactly what I am asking. I just don't know how, since I'm still learning sed/awk. I initially thought of parsing the string character by character and checking whether the character preceding # is a space or a line break, but that is a lot of work.
