I'm trying to search through SharePoint 2010 to find links that point to a broken URL using PowerShell.
Is there an easier way?
This should allow you to search within content pages for a specific string, such as a given URL.
param(
    $webUrl = "http://dev:8081",
    $urlToFind = "mylink.domain.com/"
)

clear-host

$web = get-spweb $webUrl
$list = $web.Lists["Pages"]

foreach($item in $list.Items)
{
    write-host ""
    write-host "*** PAGE *** "$item.Url
    write-host ""

    $file = $item.File

    #get the binary data and decode it into text
    $data = $file.OpenBinary()
    $encode = New-Object System.Text.ASCIIEncoding
    $text = $encode.GetString($data)

    if($text -match $urlToFind)
    {
        write-warning "FOUND BAD URL IN THIS DOCUMENT!"
    }

    write-host ""

    #uncomment the break below to stop after the first page
    #break
}
If you're looking for something more robust, you can extract all the URLs on each page and make an HTTP request against each one to check whether it actually exists.
param(
    $url = "http://dev:8081"
)

clear-host

#newline used in the console output below
$nl = [Environment]::NewLine

#region URL Tester
function QueryLink($webUrl)
{
    #test positive
    #$webUrl = "http://www.google.com"

    write-host ("Querying domain '" + $webUrl + "'..." + $nl)

    $result = MakeHttpRequest $webUrl
    if($result -eq $true)
    {
        write-host ("The url '" + $webUrl + "' exists." + $nl)
    }
    else
    {
        write-error ("The url '" + $webUrl + "' does not exist. Status: " + $result + $nl)
    }
}

function MakeHttpRequest($url)
{
    # First we create the request.
    $HTTP_Request = [System.Net.WebRequest]::Create($url)

    # We then get a response from the site; a failed request means the URL is broken.
    try
    {
        $HTTP_Response = $HTTP_Request.GetResponse()
    }
    catch
    {
        return $false
    }

    # We then read the HTTP status code as an integer and clean up by closing the response.
    $HTTP_Status = [int]$HTTP_Response.StatusCode
    $HTTP_Response.Close()

    if($HTTP_Status -eq 200)
    {
        return $true
    }
    else
    {
        return $HTTP_Status
    }
}
#endregion

function FindUrlInText($text)
{
    #reg credit: http://stackoverflow.com/questions/28259203/regex-to-match-url-in-powershell
    $results = [regex]::Matches($text, "(http[s]?|[s]?ftp[s]?)(:\/\/)([^\s,]+)")

    foreach($result in $results)
    {
        write-host ""
        write-warning "Found URL:"
        write-host $result.Value
        write-host ""
        write-host "Testing URL..."

        QueryLink $result.Value
    }
}

#lets work
$web = get-spweb $url
$list = $web.Lists["Pages"]

foreach($item in $list.Items)
{
    write-host "*** PAGE *** "$item.Url

    $file = $item.File

    #get the binary data and decode it into text
    $data = $file.OpenBinary()
    $encode = New-Object System.Text.ASCIIEncoding
    $text = $encode.GetString($data)

    FindUrlInText $text

    #comment out the break below to parse all pages
    break
}
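One design note: the checker above issues a GET request for every link. Below is a minimal sketch of the same check using a HEAD request, which asks the server for headers only and is a slightly lighter way to test existence. The CheckUrl name is mine and not part of the script above, and some servers refuse HEAD requests, in which case you'd fall back to the normal GET used earlier.

function CheckUrl($url)
{
    try
    {
        $request = [System.Net.WebRequest]::Create($url)
        $request.Method = "HEAD"   # ask for headers only, skip the page body
        $response = $request.GetResponse()
        $status = [int]$response.StatusCode
        $response.Close()
        return ($status -eq 200)
    }
    catch
    {
        return $false
    }
}

# quick manual checks
CheckUrl "http://www.google.com"        # expected: True
CheckUrl "http://dev:8081/nosuchpage"   # expected: False (assuming that page really doesn't exist)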
Here's something that might get you part of the way there; it only covers pages in the Pages libraries across your farm.
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SharePoint.Publishing") | out-null
Add-PSSnapin "Microsoft.SharePoint.PowerShell" -ErrorAction SilentlyContinue

Get-SPSite -Limit ALL | Get-SPWeb -Limit ALL | % {
    if ([Microsoft.SharePoint.Publishing.PublishingWeb]::IsPublishingWeb($_)) {
        $pWeb = [Microsoft.SharePoint.Publishing.PublishingWeb]::GetPublishingWeb($_)
        Write-Host "Processing SPWeb $($pWeb.Url)"

        $pWeb.PagesList.Items | % {
            Write-Host " Processing Page $($_['FileRef'])"

            # skip links that use SharePoint URL tokens (they contain "~"); only absolute URLs can be requested
            $_.ForwardLinks | ? { $_.Url.IndexOf("~") -eq -1 } | % {
                $url = $_.Url
                $HTTP_Response = $null
                try {
                    $HTTP_Request = [System.Net.WebRequest]::Create($url)
                    $HTTP_Response = $HTTP_Request.GetResponse()
                    $HTTP_Status = [int]$HTTP_Response.StatusCode
                    if ($HTTP_Status -ne 200) {
                        Write-Host "$url appears to be broken ($HTTP_Status)" -ForegroundColor Red
                    } else {
                        Write-Host "$url appears to be OK ($HTTP_Status)" -ForegroundColor Green
                    }
                } catch {
                    Write-Host "$url appears to be broken (Exception)" -ForegroundColor Red
                } finally {
                    if ($HTTP_Response) {
                        $HTTP_Response.Close()
                    }
                }
            }
            Write-Host
        }
        Write-Host "-------------"
    }
}
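If you want to keep the results rather than just read them off the console, one simple option (my own addition, and both file names are only examples) is to wrap the run in a transcript so the console output, including the Write-Host lines, should end up in a log file. Run it from the regular console, since the PowerShell 2.0 ISE doesn't support Start-Transcript.

# assumed script name and log path; Start-Transcript records the console session to a file
Start-Transcript -Path "C:\Temp\BrokenLinks.log"
.\Check-FarmForwardLinks.ps1
Stop-Transcript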