1

I have a CSV StackOverflow-style coded URLs, and need to convert them to HTML with Powershell and save them back in the third column of the CSV.

CSV Sample:

ID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),

What I need

cID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),Check <a href="https://stackoverflow.com">this out</a>    
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),To buy coffee <a href="https://google.com">Click Here</a> or <a href="https://bing.com">here</a>

I'm having problems parsing this where there are multiple URLs in the Comment field.

My (ugly, highly-verbose) Powershell code:

$comment_list = Import-Csv "c:\temp\Comments.csv"
$comment_list.foreach(
{
  $x = $_.Comment
  $linktextOUT = $x.Substring($x.IndexOf('[')+1,$x.IndexOf(']')-$x.IndexOf('[')-1)
  $linktextREPLACE = "[" + $linktextOUT + "]" 
  $URLOUT = $x.Substring($x.IndexOf('(')+1,$x.IndexOf(')')-$x.IndexOf('(')-1)
  $URLREPLACE = "(" + $URLOUT + ")" 
  $output = $x.Replace($URLREPLACE, "")
  $output = $output.Replace($linktextREPLACE, "")
  $output = $output + "<a href=""" + $URL + """>" + $linktext + "</a>"
  Write-Host $_.cID","$_.Comment","$output
})

What it outputs:

1 , Check [this out](https://stackoverflow.com) , Check <a href="https://stackoverflow.com">this out</a>    
2 , To buy coffee [click here](https://google.com) or [here](https://bing.com) , To buy coffee  or [here](https://bing.com)<a href="https://google.com">click here</a>

You can see that the first line is just fine - the Powershell outputs the correct third column. However, it can't see the second URL in line 2, so it completely skips the "bing.com" URL. Note that some lines may have up to 5 URLs in the comments.

Any help is appreciated in advance.

4
  • 1
    Switch to PowerShell 6+ and apply ConvertFrom-Markdown cmdlet… Commented Aug 22, 2021 at 20:05
  • 1
    @JosefZ, agreed, that is the path of least resistance; but what if the OP is in a position where that is not an option. Many organizations/enterprises are not even on PSv4 or 5 to date. So, other options for legacy versions are prudent. Especially since it appears that is what the is on at this time, and we may assume they cannot move to PSCore at all. Commented Aug 22, 2021 at 20:37
  • @JosefZ Put it as an answer and I'll test and accept - that seems like the best solution. I tried to just run ConvertFrom-Markdown piped via out-file, but that didn't work. Commented Aug 26, 2021 at 2:12
  • 1
    @Beems, please see my updated answer, which shows a ConvertFrom-Markdown solution, along with an explanation of why its use is probably overkill for your use case. Commented Aug 29, 2021 at 21:49

3 Answers 3

4

Use of the -replace operator enables a concise solution:

@'
ID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),
'@ > Comments.csv

Import-Csv Comments.csv | ForEach-Object {
  $_.CommentConverted =
    $_.Comment -replace '\[(.+?)\]\((.+?)\)', '<a href="$2">$1</a>'
  $_ # output the modified object
}

As JosefZ points out, PowerShell (Core) 7+ has a ConvertFrom-Markdown utility that converts Markdown input to both HTML text (property .Html on the output objects) and text with embedded VT (Virtual Terminal) escape sequences (property .VT100EncodedString), as well as an AST (abstract syntax tree) representation of the Markdown input text.

In the case at hand, however, its benefits are limited, because the input string is invariably wrapped in a <p> HTML element, which necessitates removing that wrapper to get the desired result; thus, in this particular instance, the above, regex-based solution is not only more concise, but performs better.

For the sake of completeness, here's a solution based on ConvertFrom-Markdown:

@'
ID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),
'@ > Comments.csv

Import-Csv Comments.csv | ForEach-Object {
  $_.CommentConverted = ([xml] ($_.Comment | ConvertFrom-Markdown).Html).p.InnerXml
  $_
}
Sign up to request clarification or add additional context in comments.

Comments

2

How about this?

$csv = @'
ID,Comment,CommentConverted
1,Check [this out](https://stackoverflow.com),
2,To buy coffee [click here](https://google.com) or [here](https://bing.com),
'@

$data = $csv | ConvertFrom-Csv

$pattern = '\[(.+?)\]\s?\((.+?)\)'
$data | ForEach-Object {
    $object = $_
    $object.Comment | Select-String -Pattern $pattern -AllMatches | ForEach-Object {
        $line = $_.Line
        $_.Matches | ForEach-Object {

            $new = "<a href=""$($_.Groups[2].Value)"">$($_.Groups[1].Value)</a>"
            $line = $line -replace [regex]::Escape($_.Groups[0].Value), $new
        }
        $object.CommentConverted = $line
        $object
    }
}

Output

ID               : 1
Comment          : Check [this out](https://stackoverflow.com)
CommentConverted : Check <a href="https://stackoverflow.com">this out</a>

ID               : 2
Comment          : To buy coffee [click here](https://google.com) or [here](https://bing.com)
CommentConverted : To buy coffee <a href="https://google.com">click here</a> or <a href="https://bing.com">here</a>

Comments

0

Is this what you are after?

Import-Csv -Path '.\comments.csv'
# Results
<#
ID Comment                                                                    CommentConverted
-- -------                                                                    ----------------
1  Check [this out](https://stackoverflow.com)                                                
2  To buy coffee [click here](https://google.com) or [here](https://bing.com)  
#>


Import-Csv -Path '.\comments.csv' | 
ForEach-Object {
    [PSCustomObject]@{
        ID               = $PSItem.ID
        Comment          = $PSItem.Comment
        CommentConverted = ((($PSItem.Comment -replace '\(', '<a href=') -replace '\)', '>Click Here</a>') -replace '\[here\]') -replace '\[click here\]|\[|\]'
    }
}
# Results
<#
ID Comment                                                                    CommentConverted                                                                                  
-- -------                                                                    ----------------                                                                                  
1  Check [this out](https://stackoverflow.com)                                Check this out<a href=https://stackoverflow.com>Click Here</a>                                    
2  To buy coffee [click here](https://google.com) or [here](https://bing.com) To buy coffee <a href=https://google.com>Click Here</a> or <a href=https://bing.com>Click Here</a>
#>

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.