
I want to write a shell script that reads a file from standard input, removes every <html> string and every empty line, and writes the result to standard output. The file looks like this:

#some lines that do not contain <html> in here
<html>a<html>
<tr><html>b</html></tr>
#some lines that do not contain <html> in here
<html>c</html>

So, the output file should contain:

#some lines that do not contain <html> in here
a
<tr>b</html></tr>
#some lines that do not contain <html> in here
c</html>

I tried to write this shell script:

read INPUT #read file from std input
tr -d '[:blank:]'
grep "<html>" | sed -r 's/<html>//g'
echo $INPUT

However, this script isn't working at all. Any ideas? Thanks.

  • You might want to try this in Perl (or something other than the shell), if possible; check out the answer(s) on this other question Commented Mar 19, 2013 at 19:50
  • @summea I can't. I have to use #!/usr/bin/bash Commented Mar 19, 2013 at 19:52
  • Should the comments be preserved? Commented Mar 19, 2013 at 19:52
  • I guess I don't understand why you have multiple <html></html> pairs in one document, as well... Commented Mar 19, 2013 at 19:56
  • I don't know either. It's just some random file that my teacher gave us. Commented Mar 19, 2013 at 19:59

2 Answers

Pure bash:

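#!/bin/bash

# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line
do
    #leave comment lines untouched
    [[ "$line" = "#"* ]] && { printf '%s\n' "$line"; continue; }
    #ignore empty lines
    [[ $line =~ ^$ ]] && continue
    #remove every <html> from the line
    printf '%s\n' "${line//<html>/}"
done < "$1"

Output:

$ ./replace.sh input
#some lines that do not contain <html> in here
a
<tr>b</html></tr>
#some lines that do not contain <html> in here
c</html>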

Pure sed:

sed '/^#/!s/<html>//g' input | sed '/^$/d'
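Since the asker can use only grep and sed and wants to read from standard input, here is a minimal sketch of that variant. The function name strip_html is made up for illustration, and it assumes (based on the sample file) that any line starting with # is a comment that should pass through unchanged.

```shell
#!/bin/bash

# strip_html (hypothetical name): filter from stdin to stdout.
strip_html() {
    # grep -v '^$' drops empty lines.
    # The sed address /^#/! restricts the substitution to lines that do
    # NOT start with #, so comment lines are left exactly as they are.
    grep -v '^$' | sed '/^#/!s/<html>//g'
}

# Small demonstration on inline sample data.
printf '%s\n' '#keep <html> here' '' '<html>a<html>' '<tr><html>b</html></tr>' | strip_html
# prints:
# #keep <html> here
# a
# <tr>b</html></tr>
```

To use it as a filter on a real file, replace the demonstration line with a bare strip_html call and run the script as ./script.sh < input.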

4 Comments

What does [[ "$line" = "\#" ]] mean? Also, I can only use grep and sed.
See the comments in the source above.
So the first sed removes <html>, but what does the second sed do?
The second one deletes empty lines.

Awk can do it easily:

awk '/./ {gsub("<html>","");print}' INPUTFILE

The /./ pattern selects every line with at least one character, so empty lines are discarded. gsub then replaces every "<html>" on the line with an empty string, and print writes out the result.
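If lines starting with # should be passed through untouched (an assumption drawn from the expected output in the question), one possible variation adds a rule that prints those lines and skips to the next one. The function name strip_html_awk is hypothetical; with no file argument, awk reads standard input.

```shell
#!/bin/bash

# strip_html_awk (hypothetical name): filter from stdin to stdout.
strip_html_awk() {
    awk '
        # Comment line: print it unchanged and move on to the next line.
        /^#/ { print; next }
        # Any other non-empty line: remove every <html>, then print.
        /./  { gsub("<html>", ""); print }
    '
}

# Small demonstration on inline sample data.
printf '%s\n' '#keep <html> here' '' '<html>c</html>' | strip_html_awk
# prints:
# #keep <html> here
# c</html>
```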

3 Comments

The OP needs comments to be preserved.
I can only use grep and sed. But what does /./ mean? Does it mean the current directory?
@HannaGabby - /./ is a regular expression that matches any single character, so it selects every non-empty line.
