I can fetch a URL and search for text that starts with file:, but I'm having trouble parsing the URL out from there.
Example:
wget -qO- http://website.com/site/ | tr \" \\n | grep -w file:\* > output.txt
The wget command gives me this output:
file: 'http://website.com/site/myStream/playlist.m3u8?wmsAuthSign=c2VydmVyXs',
I'm trying to get the output to look like:
http://website.com/site/myStream/playlist.m3u8?wmsAuthSign=c2VydmVyXs
My goal is a bash script that loops through several source URLs and writes each processed/grepped URL on its own line, e.g.:
http://website.com/site/myStream/playlist.m3u8?wmsAuthSign=c2VydmVyXs
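A minimal sketch of the extraction step, assuming the line always has the shape `file: '...'` with the URL in single quotes (the echo stands in for the real wget call):

```shell
# Stand-in for: wget -qO- http://website.com/site/ | grep 'file:'
# sed -n with the p flag prints only lines where the substitution matched,
# keeping just the text captured between the single quotes.
echo "file: 'http://website.com/site/myStream/playlist.m3u8?wmsAuthSign=c2VydmVyXs'," \
  | sed -n "s/.*file: *'\([^']*\)'.*/\1/p"
# → http://website.com/site/myStream/playlist.m3u8?wmsAuthSign=c2VydmVyXs
```

This avoids the tr/grep word-splitting approach entirely; sed both finds the line and strips the wrapper in one pass.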
As requested, here's an example of what
wget -qO- http://website.com/site/
sends back:
player.setup({
file: 'http://website.com/site/myStream/playlist.m3u8?wmsAuthSign=c2VydmVyXs',
width: "100%",
aspectratio: "16:9",
});
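Putting it together as a loop, here is a sketch assuming each page embeds its stream URL in a `file: '...'` entry like the player.setup snippet above (the URL list is hypothetical; substitute your real source pages):

```shell
#!/bin/bash
# Hypothetical list of source pages to scrape; replace with real URLs.
urls=(
  "http://website.com/site/"
  "http://website.com/other/"
)

for u in "${urls[@]}"; do
  # grep -o prints only the matching part of the line (the file: '...' entry),
  # then sed strips the "file: '" prefix and trailing quote, leaving the bare URL.
  wget -qO- "$u" \
    | grep -o "file: *'[^']*'" \
    | sed "s/file: *'\(.*\)'/\1/"
done > output.txt
```

Each extracted URL ends up on its own line in output.txt. Note this only catches single-quoted file: entries; pages that use double quotes or different spacing would need a looser pattern.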
Comments:
- Depending on the actual HTML files you want to parse, you may get away with grep, but there'll be plenty of variants your regular expression won't catch.
- Can you show the output of the wget command before any processing with tr or grep, i.e. what wget -qO- http://website.com/site/ with no processing outputs?
- lynx -dump won't work for that at all.