
I have an HTML file in which I need to convert the text inside each paragraph element (<p>) to capital letters, e.g. <p>hi</p> should become <p>HI</p>.

x=`cat $1 | grep -o '<p>.*</p>' | tr '[:lower:]' '[:upper:]'`
var2=`echo $x`
headerremove=`grep -o '<p>.*</p>' $1`
var3=`echo $headerremove`
echo $var2
echo $var3
sed 's/$var3/$var2/g' "$1"

Input
<h1>head</h1>
<p>hello</p>

Expected output
<p>HELLO</p>

This does not work as expected. I also need to remove everything else, i.e. all other tags and their child elements, keeping only the paragraph elements.

  • Post your input HTML contents and the expected result. Commented Apr 25, 2018 at 21:44
  • unix.stackexchange.com/users/148009/romanperekhrest Please see the edited question Commented Apr 25, 2018 at 21:49
  • Can you post the actual HTML document? Commented Apr 25, 2018 at 22:06
  • Actually, I made a random HTML file with this content. Basically, the requirement is to capitalize the content inside the <p> tag and remove all other tags and their contents. Commented Apr 25, 2018 at 22:13
  • (1) What do you think you’re accomplishing by saying var2=`$x`? (Hint: nothing good.)  (2) You show code with two echo statements (not to mention a sed command) with output to the stdout, and yet you show only one line of output.  Please make your question complete. (3) Figure out exactly what your sed command looks like and take a long, hard look at it; that should give you a clue as to what’s going wrong. Commented Apr 25, 2018 at 23:25
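Comment (3) points at the quoting: inside single quotes the shell does not expand $var3 or $var2, so sed receives the literal text $var3 as its pattern and finds nothing to replace. A small sketch with made-up one-word values shows the difference (single and double are hypothetical variable names). Note that even with double quotes, the question's real $var3 holds <p>hello</p>, whose / would end sed's s command early, so those values would also need escaping or a different delimiter.

```shell
#!/usr/bin/env bash
# Hypothetical one-word values, chosen so that sed's / delimiter is not an issue.
var3='hello'
var2='HELLO'

# Single quotes: sed is given the literal text s/$var3/$var2/g and matches nothing.
single=$(printf '%s\n' '<p>hello</p>' | sed 's/$var3/$var2/g')
# Double quotes: the shell substitutes the values first, so sed sees s/hello/HELLO/g.
double=$(printf '%s\n' '<p>hello</p>' | sed "s/$var3/$var2/g")

echo "$single"   # <p>hello</p>  (unchanged)
echo "$double"   # <p>HELLO</p>
```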

2 Answers


xmllint + sed solution:

xmllint --html --xpath "//p" input.html | sed 's/>[^<>]*</\U&/'

The output:

<p>HELLO</p>
  • It shows an "unknown option --xpath" error. Commented Apr 26, 2018 at 13:37
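That error suggests an older xmllint build that lacks the --xpath option; the sed half of the pipeline can be checked on its own. A minimal sketch, assuming GNU sed (\U is a GNU extension) and using the hypothetical helper names upper_text and upper_first: >[^<>]*< matches a run of text between two tags, and \U& re-inserts that match uppercased. The answer's command has no g flag, so it changes only the first text run on each line; with g, text inside nested tags is uppercased too.

```shell
#!/usr/bin/env bash
# Requires GNU sed: \U (uppercase the rest of the replacement) is a GNU extension.

# With g: uppercase every text run between two tags on the line.
upper_text()  { sed 's/>[^<>]*</\U&/g'; }
# Without g (as in the answer): only the first text run on each line.
upper_first() { sed 's/>[^<>]*</\U&/'; }

printf '%s\n' '<p>hello <b>big</b> world</p>' | upper_text
# <p>HELLO <b>BIG</b> WORLD</p>
printf '%s\n' '<p>hello <b>big</b> world</p>' | upper_first
# <p>HELLO <b>big</b> world</p>
```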
$ cat f.html
<h1>head</h1>
<p>hello</p>
<p>world</p>
$ grep -o '<p>.*</p>' f.html | tr '[:lower:]' '[:upper:]' | sed 's/P>/p>/g'  
<p>HELLO</p>
<p>WORLD</p>
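One caveat with grep -o '<p>.*</p>': .* is greedy, so when two paragraphs share a line the match runs from the first <p> to the last </p>. A small sketch (the function names greedy and non_greedy are hypothetical) contrasting it with [^<]*, which stops at the first </p> but, as a trade-off, no longer matches paragraphs that contain nested tags such as <b>:

```shell
#!/usr/bin/env bash
# Greedy: .* extends to the LAST </p> on the line.
greedy()     { grep -o '<p>.*</p>'; }
# Forbidding "<" inside the match stops at the first </p> (plain-text paragraphs only).
non_greedy() { grep -o '<p>[^<]*</p>'; }

printf '%s\n' '<p>hello</p><p>world</p>' | greedy
# <p>hello</p><p>world</p>
printf '%s\n' '<p>hello</p><p>world</p>' | non_greedy
# <p>hello</p>
# <p>world</p>
```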


# capture other tags too: GNU grep alternation, e.g. 'patt1\|patt2\|pattN'
$ grep -o '<p>.*</p>\|<h1>.*</h1>' f.html | tr '[:lower:]' '[:upper:]' | sed 's/P>/p>/g;s/H1>/h1>/g'
<h1>HEAD</h1>
<p>HELLO</p>
<p>WORLD</p>
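Instead of listing each tag in sed (s/P>/p>/g;s/H1>/h1>/g), GNU sed's \L can lowercase every tag name in one expression. A sketch assuming GNU grep and GNU sed, and tags without attributes (something like <P CLASS="X"> would not match [A-Z0-9]*); upper_keep_tags is a hypothetical name:

```shell
#!/usr/bin/env bash
# Assumes GNU grep (\| alternation in a basic regex) and GNU sed (\? and \L).
upper_keep_tags() {
    # tr uppercases the whole match, tag names included; sed then lowercases
    # every attribute-free tag such as <H1> or </P> back in a single pass.
    grep -o '<p>.*</p>\|<h1>.*</h1>' "$1" \
        | tr '[:lower:]' '[:upper:]' \
        | sed 's/<\/\?[A-Z0-9]*>/\L&/g'
}

# Usage: upper_keep_tags f.html
```

For the f.html above this should print the same three lines as the grep + tr + sed command.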

# add a line after each h1 tag: grep + tr + sed
function foo () {
    grep -o '<p>.*</p>\|<h1>.*</h1>' "${1}" | tr '[:lower:]' '[:upper:]' | sed 's/P>/p>/g;s/H1>/h1>/g' | while IFS= read -r line; do 
        case "${line}" in
            "<h1>"*)
                echo "${line}"
                echo "anything that should appear after h1 tags"
            ;;
            "<p>"*)
                echo "${line}"
            ;;
        esac
    done
}

$ foo f.html
<h1>HEAD</h1>
anything that should appear after h1 tags
<p>HELLO</p>
<p>WORLD</p>
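The same per-tag behaviour (uppercase the text, add a line after each h1) can also be expressed as a single sed call with one address block per tag. A sketch assuming GNU sed and, like the grep version, one element per line; foo_sed is a hypothetical name, and unlike grep -o it prints whole matching lines:

```shell
#!/usr/bin/env bash
# Assumes GNU sed (\U is a GNU extension).
# -n: print only what p asks for.
# h1 lines: uppercase the text, print, print the extra line, then d ends the cycle.
# p lines: uppercase the text and print.
foo_sed() {
    sed -n \
        -e '/<h1>.*<\/h1>/{s/>[^<>]*</\U&/g;p;s/.*/anything that should appear after h1 tags/p;d}' \
        -e '/<p>.*<\/p>/{s/>[^<>]*</\U&/g;p}' \
        "$1"
}

# Usage: foo_sed f.html
```

For f.html this should print the same four lines as foo f.html.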

# add a line after each h1 tag: a few shell parameter expansion tips + while & case statements
function foo () {
    # also capture uppercase <P> tags and the example <foo> tag handled below
    grep -o '<p>.*</p>\|<P>.*</P>\|<h1>.*</h1>\|<foo>.*</foo>' "${1}" | while IFS= read -r line; do
        case "${line}" in
            "<h1>"*)
                line="${line^^}" # capitalize (shell parameter expansion, bash 4+)
                echo "${line//H1>/h1>}" # find and replace every "H1>" (shell parameter expansion)
                echo "anything that should appear after h1 tags"
            ;;
            "<p>"* | "<P>"*) # also handle files that use capitalized P tags
                line="${line^^}" # capitalize
                echo "${line//P>/p>}" # find and replace
            ;;
            "<foo>"*)
                innerHTML="${line#<foo>}" # <foo>hello world</foo> ==> hello world</foo>
                innerHTML="${innerHTML%</foo>}" # hello world</foo> ==> hello world
                read -r -a innerHTMLarr <<< "${innerHTML}" # split into words, e.g. to wrap each word in a span and/or style it differently

                for eachword in "${innerHTMLarr[@]}"; do
                    if [[ "${eachword}" == "something" ]]; then # capture special words (compared before capitalizing)
                        echo "<bar style='...'> ${eachword^^} </bar>"
                    else
                        echo "<bar> ${eachword^^} </bar>"
                    fi
                done
            ;;
        esac
    done
}
$ foo f.html 
<h1>HEAD</h1>
anything that should appear after h1 tags
<p>HELLO</p>
<p>WORLD</p>
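The parameter expansions used in the second foo function, shown in isolation. A minimal sketch assuming bash 4 or later (${var^^} does not exist in older bash or in POSIX sh); the sample string is made up:

```shell
#!/usr/bin/env bash
line='<p>hello world</p>'

upper="${line^^}"            # uppercase everything: <P>HELLO WORLD</P>
echo "${upper//P>/p>}"       # replace every "P>": <p>HELLO WORLD</p>

inner="${line#<p>}"          # remove the shortest matching prefix: hello world</p>
inner="${inner%</p>}"        # remove the shortest matching suffix: hello world
echo "${inner}"

read -r -a words <<< "${inner}"  # split on whitespace into an array, no glob expansion
echo "${#words[@]}"              # number of words: 2
```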
  • Hi, thank you, it works well. But to learn more, could you please explain it? For instance, what if I want to perform more than one operation, such as capitalizing the text inside p tags and putting a line after the h1 tag? Commented Apr 26, 2018 at 14:44
