
The website http://www.shazam.com/charts/top-100/australia displays a chart of songs. I want to capture those songs using RegEx and PowerShell. The PowerShell code below is what I have so far:

    $ie = New-Object -comObject InternetExplorer.Application
    $ie.navigate('http://www.shazam.com/charts/top-100/australia')
    Start-Sleep -Seconds 10
    # -match returns only the first match and fills $matches with its capture groups
    $null = $ie.Document.body.innerhtml -match 'data-chart-position="1"(.|\n)*data-track-title=.*content="(.*)"><a href(.|\n)*data-track-artist=\W\W>(.|\n)*<meta\scontent="(.*)"\sitemprop'
    $shazam01artist = $matches[5]
    $shazam01title  = $matches[2]

Each song listed has three attributes associated with it: data-chart-position, data-track-title and data-track-artist. I want to capture the Artist & Title of each song based on its chart position (number). In other words, I need a regular expression that finds a chart position and then the Artist & Title that follow it.

If I run separate regexes for Artist and Title (code below), they do find them, but only the first Artist & Title on the page. I need the Artist & Title of every song, matched to its chart position.

    $null = $ie.Document.body.innerhtml -match 'data-track-artist=\W\W>(.|\n)*<meta\scontent="(.*)"\sitemprop'
    $shazam01artist = $matches[2]
    $null = $ie.Document.body.innerhtml -match 'data-track-title=.*content="(.*)"><a href'
    $shazam01title = $matches[1]
    $shazam01artist
    $shazam01title
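Why only the first song comes back: `-match` stops at the first match and puts that single match into `$matches`, and the greedy `(.|\n)*` lets one match stretch across many songs. The sketch below is a minimal illustration under stated assumptions: it runs against a small hypothetical snippet that only imitates the attributes named above (not the real page's markup) and uses `[regex]::Matches` with a lazy `.*?` and named groups to collect every position/title/artist triple. It is still brittle, because any change to the markup breaks the pattern.

```powershell
# Hypothetical markup that only imitates the attributes named in the question;
# the real page's HTML is an assumption here and may differ.
$html = @'
<article data-chart-position="1">
  <p data-track-title="" content="Song A"><a href="#">Song A</a></p>
  <p data-track-artist=""><meta content="Artist A" itemprop="name"></p>
</article>
<article data-chart-position="2">
  <p data-track-title="" content="Song B"><a href="#">Song B</a></p>
  <p data-track-artist=""><meta content="Artist B" itemprop="name"></p>
</article>
'@

# (?s) lets . match newlines, replacing the (.|\n) trick; the lazy .*? stops at
# the nearest following attribute instead of running on into later songs.
$pattern = '(?s)data-chart-position="(?<pos>\d+)".*?' +
           'data-track-title="[^"]*"\s+content="(?<title>[^"]*)".*?' +
           'data-track-artist="[^"]*">.*?<meta\s+content="(?<artist>[^"]*)"'

# -match stops at the first match and fills $matches with that match only
$null = $html -match $pattern
$matches['title']    # Song A

# [regex]::Matches returns every non-overlapping match, one per song
$songs = foreach ($m in [regex]::Matches($html, $pattern)) {
    [pscustomobject]@{
        Position = [int]$m.Groups['pos'].Value
        Title    = $m.Groups['title'].Value
        Artist   = $m.Groups['artist'].Value
    }
}
$songs
```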

1 Answer


Using regex to parse partial HTML is an absolute nightmare; you might want to reconsider that approach.

The response returned by Invoke-WebRequest has a property called ParsedHtml, which holds a reference to a pre-parsed HTMLDocument object. Use that instead:

    # Fetch the document
    $Top100Response = Invoke-WebRequest -Uri 'http://www.shazam.com/charts/top-100/australia'

    # Select all the "article" elements that contain charted tracks
    $Top100Entries = $Top100Response.ParsedHtml.getElementsByTagName('article') | Where-Object { $_.className -eq 'ti__container' }

    # Iterate over each article
    $Top100 = foreach ($Entry in $Top100Entries) {
        $Properties = @{
            # Collect the chart position from the article element
            Position = $Entry.getAttribute('data-chart-position', 0)
        }

        # Iterate over the inner paragraphs containing the remaining details
        $Entry.getElementsByTagName('p') | ForEach-Object {
            if ($_.className -eq 'ti__artist') {
                # The ti__artist paragraph contains a META element that holds the artist name
                $Properties['Artist'] = $_.getElementsByTagName('META').item(0).getAttribute('content', 0)
            } elseif ($_.className -eq 'ti__title') {
                # The ti__title paragraph stores the title directly in its content attribute
                $Properties['Title'] = $_.getAttribute('content', 0)
            }
        }

        # Create a psobject based on the details we just collected
        New-Object -TypeName psobject -Property $Properties
    }

Now, let's see how Tay-Tay's doing down under:

    PS C:\> $Top100 | Where-Object { $_.Artist -match "Taylor Swift" }

    Position           Title             Artist
    --------           -----             ------
    42                 Bad Blood         Taylor Swift Feat. Kendrick Lamar

Sweet!
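If you want the whole chart in order, note that Position comes from an HTML attribute and is presumably a string, so sorting on it directly is lexical ("10" sorts before "2"). The sketch below is an assumption-laden illustration: it uses a few hypothetical stand-in objects in place of the real `$Top100` and casts Position to `[int]` in a calculated sort key to get numeric chart order.

```powershell
# Hypothetical stand-ins for the objects in $Top100; Position is text,
# as it would be when read from an HTML attribute.
$Top100 = @(
    [pscustomobject]@{ Position = '10'; Title = 'Song J'; Artist = 'Artist J' }
    [pscustomobject]@{ Position = '2';  Title = 'Song B'; Artist = 'Artist B' }
    [pscustomobject]@{ Position = '1';  Title = 'Song A'; Artist = 'Artist A' }
)

# Sorting the strings directly is lexical: 1, 10, 2
$Top100 | Sort-Object Position | Select-Object -ExpandProperty Position

# A calculated sort key casts each position to [int] for numeric order: 1, 2, 10
$ByPosition = $Top100 | Sort-Object { [int]$_.Position }
$ByPosition | Format-Table Position, Title, Artist
```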


5 Comments

This is definitely the way to go. I started writing a similar answer about an hour ago, got sidetracked before I could finish writing Invoke-WebRequest and forgot all about it! Great job.
Thanks guys. Regex is magic, but agreed, this is definitely one of the cases where it should be avoided altogether.
Parsers typically use regexes behind the scenes (in the lexer), so we may still be using regexes here, just indirectly.
Thanks for this, yes, agreed, a much better way of doing this.
@MarcKean if my answer solved your question, please consider "accepting" it by clicking the checkmark on the left
