Background: I renamed my .mp4 video files to lowercase and replaced the special characters and spaces in their names. Now I have to change the associated URLs inside .txt files in the same way. There are many text files, each containing plenty of these URLs referring to the videos.

Issue: I need to replace the special characters in every string between "flashplayer" and "/flashplayer" in every text file, but must not change anything outside the flashplayer tags.

I don't know how to select the strings between "flashplayer" and "/flashplayer" for the replacement.

Sample string:

(flashplayer width="640" height="480" position="1")file=/wiki/data/media/sales/a/ö 2.mp4&config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0(/flashplayer)

This sample is part of a text file (a DokuWiki page). The parentheses () stand in for the angle brackets of the tags.

Sample output string:

(flashplayer width="640" height="480" position="1")file=/wiki/data/media/sales/a/oe_2.mp4&config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0(/flashplayer)

The replacements (the same ones I applied to the filenames with Rename-Item) should be:

  • ä = ae
  • ö = oe
  • ü = ue
  • ' ' = '_'

Update: the script looks like:

# vars (user input)
$source = "d:\here\name\test\pages"
$search = '(\<flashplayer.*?\>file\=/wiki/87sj38d/media)(.*?)(\<\/flashplayer\>)'
$a = 1
Write-Host "`nSource:`t $source`n"
# replace special characters
gci $source -r -Filter *.txt | ForEach-Object {
    $text = Get-Content $_.FullName | ForEach-Object {
        if($_ -match $search) {
            $_ -replace [Regex]::Escape($Matches[2]), ($Matches[2] -replace'ö', 'oe' -replace'ä', 'ae' -replace'ü', 'ue' -replace'\s', '_' )
            $output = $Matches[2]
            $tags = $a++         
            Write-Host "`nTag $tags : $output"
        } else {
            $_
        }
    }
    $text | Set-Content $_.FullName
}

The text files contain a line like this:

{{backlinks>path:product:description:kennwort_aendern}}

The script only works if I delete this line. Otherwise the string between the flashplayer tags stays the same. Confusingly, the replacement sometimes works and sometimes doesn't. The string between the flashplayer tags can contain many special characters. See this sample string:

<flashplayer_width="640"_height="480"_position="1">file=/wiki/87sj38d/media/ab/any/test/1001_Grundlagen Kennwort ändern.mp4&image=/wiki/87sj38d/media/ab/any/test/1001_Grundlagen Kennwort ändern.jpg&config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0</flashplayer>

Write-Host $output shows all the strings correctly, but the replacement doesn't work properly.

  • Can you post any sample string as well? Commented Jul 23, 2014 at 11:41
  • Could you also add a required output string? Commented Jul 23, 2014 at 11:54
  • The question now contains a sample. Thank you. Commented Jul 23, 2014 at 11:55

2 Answers

You can try something like this. For each text file, it replaces the special characters on every flashplayer line.

Get-ChildItem -Path "c:\FolderOfTextfiles" -Filter *.txt | ForEach-Object {

    $text = Get-Content $_.FullName | ForEach-Object {
        if($_ -match '(?<=\(flashplayer.*?\))(.*?)(?=\(/flashplayer\))') {
            $_ -replace [Regex]::Escape($Matches[1]), ($Matches[1] -replace 'ö', 'oe' -replace 'ä', 'ae' -replace 'ü', 'ue' -replace '\s', '_')
        } else {
            $_
        }
    }

    $text | Set-Content $_.FullName

}

UPDATE: If the text contains line breaks, you could try this global multiline regex-matching approach:

$s = @'
<flashplayer_width="640"_height="480"_position="1">file=/wiki/87sj38d/media/ab/any/test/1001_Grundlagen Kennwort ändern.mp4&image=/wiki/87sj38d/media/ab/
any/test/1001_Grundlagen Kennwort ändern.jpg&config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0</flashplayer>
<flashplayer_width="640"_height="480"_position="1">file=/wiki/87sj38f/media/ab/any/test/1001_Grundlagen Kennwort ändern.mp4&image=/wiki/87sj38d/media/ab/any/test/1001_Grundlagen Kennwort ändern.jpg&
config=/wiki/lib/plugins/flashplayer/config_video.xml&start=0</flashplayer>
'@

#Read text as single string
#PS 3.0+
#$s = Get-Content .\test.txt -Raw

#PS 2.0
#$s = Get-Content .\test.txt | Out-String

$s = [regex]::Replace($s, '(?s)(?<=<flashplayer.*?>file=/wiki/87sj38d/media).*?(?=</flashplayer>)', { 
    param([System.Text.RegularExpressions.Match]$m)
    $m.Value -replace 'ö', 'oe' -replace 'ä', 'ae' -replace 'ü', 'ue' -replace ' ', '_'
})

$s    

#Save
#$s | Set-Content .\test.txt

This solution is a bit more complicated because, AFAIK, you can't modify $1 (the captured group) when using -replace 'pattern', '$1' in the current PowerShell version. If someone has a better solution, please share :)
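The match-evaluator idea above can be wrapped in a small function so it is easy to try on single strings before touching any files. This is only a sketch: the function name Convert-FlashplayerBody and its parameter are made up for illustration, and the pattern assumes the tags are written with angle brackets and that the opening tag contains no ">" inside its attributes.

```powershell
# Hypothetical helper (name and parameter are illustrative, not from the answer above).
function Convert-FlashplayerBody {
    param([string]$Text)

    # (?s) lets '.' match line breaks; the lookarounds keep the tags out of the match,
    # so only the text between <flashplayer ...> and </flashplayer> is handed to the evaluator.
    $pattern = '(?s)(?<=<flashplayer[^>]*>).*?(?=</flashplayer>)'

    # The script block is converted to a MatchEvaluator delegate and is called once per match.
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator]{
        param($m)
        $m.Value -replace 'ö', 'oe' -replace 'ä', 'ae' -replace 'ü', 'ue' -replace ' ', '_'
    }

    [regex]::Replace($Text, $pattern, $evaluator)
}

$sample = 'ä <flashplayer width="640">file=/wiki/media/a/ö 2.mp4&start=0</flashplayer> ü'
Convert-FlashplayerBody -Text $sample
# → ä <flashplayer width="640">file=/wiki/media/a/oe_2.mp4&start=0</flashplayer> ü
```

The characters outside the tags are left alone because the evaluator only ever sees the matched span. As far as I know, PowerShell 6.1 and later also accept a script block directly as the replacement operand of -replace (with the match available as $_), which would make the explicit [regex]::Replace call unnecessary there.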

8 Comments

The input string (.*?) can contain a | character. If there is a | between the flashplayer tags, the script doesn't work: it appends the captured string to the rewritten string. PowerShell only treats special characters as literal text when they are preceded by a '\'. Is there a way to treat the input purely as a literal string and ignore the special characters? Here are the input and the wrong output: [[http://a/b/docs/d/e | description ]] [[http://a/b/docs/d/e | description|a/b/docs/d/e | description ]] The other special characters are processed correctly.
Try the updated answer. I've added an escape method to the -replace command to make sure it ignores the special characters. If it doesn't work, could you provide a string that follows the pattern but fails? (flashplayer .....) sajdkaljdlsadkasd (/flashplayer). And please update your question when providing code; it's hard to read in comments. :)
So my update didn't work? Is it a typo in the new sample? You've replaced (flashplay...) with <flashplayer ...>.
First, my "greater than" and "less than" signs weren't accepted by this post, so I changed them to parentheses. The updated sample, at least, shows the string correctly. The search term in $search does the replacement most of the time, but not when the backlinks line (see update) is in a text file. Is my adjusted search term correct?
< and > are accepted when you put them inside code blocks, which you should always do (someone fixed it for you this time). In your $search pattern, you have removed my lookaheads/lookbehinds. This breaks the text replacement. Try $search = '(?<=<flashplayer.*?>file=/wiki/87sj38d/media)(.*?)(?=</flashplayer>)'

Here are the commands you could use to replace the characters mentioned. You will need to change the file path to match the location of the text files. They use Replace-FileString.ps1: http://windowsitpro.com/scripting/replacing-strings-files-using-powershell

./Replace-FileString  -Pattern '(flashplayer)(.*)ä(.*)(\/flashplayer)'  -Replacement '$1$2ae$3$4'  -Path C:\test\*.txt  -Overwrite
./Replace-FileString  -Pattern '(flashplayer)(.*)ö(.*)(\/flashplayer)'  -Replacement '$1$2oe$3$4'  -Path C:\test\*.txt  -Overwrite
./Replace-FileString  -Pattern '(flashplayer)(.*)ü(.*)(\/flashplayer)'  -Replacement '$1$2ue$3$4'  -Path C:\test\*.txt  -Overwrite
./Replace-FileString  -Pattern '(flashplayer)(.*) (.*)(\/flashplayer)'  -Replacement '$1$2_$3$4'  -Path C:\test\*.txt  -Overwrite

It opens and rewrites every text file (even if nothing changes). It only changes the lines where "ä", "ö", "ü" or " " is found between the strings "flashplayer" and "/flashplayer".
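One thing worth checking with patterns of this shape: the first (.*) is greedy, so a single pass rewrites only the last matching character between the two markers. The sketch below illustrates this with the built-in -replace operator on an in-memory string; it does not call Replace-FileString, whose internals I have not verified, and the sample line is made up.

```powershell
# Made-up sample line with two 'ä' characters between the markers.
$line    = '<flashplayer>file=/media/ä b ä.mp4</flashplayer>'
$pattern = '(flashplayer)(.*)ä(.*)(/flashplayer)'

# A single pass: the greedy (.*) swallows the first 'ä', so only the last one is replaced.
$once = $line -replace $pattern, '$1$2ae$3$4'
$once
# → <flashplayer>file=/media/ä b ae.mp4</flashplayer>

# Repeating the same replacement until the string stops changing catches the rest.
$result = $line
do {
    $previous = $result
    $result   = $result -replace $pattern, '$1$2ae$3$4'
} while ($result -cne $previous)
$result
# → <flashplayer>file=/media/ae b ae.mp4</flashplayer>
```

If the same holds for the file-based commands, running each of them several times (until no file changes any more) should have the same effect as the loop.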
