I have the following problem: imagine a UTF-8 file in which every special character has been replaced by the REPLACEMENT CHARACTER "�". Part of the file could look like this:
Das hier r�ckg�ngig ist das zu machen r�ckg�ngig : ist bereits geamcht Weitere W�rter gibt ers zu korrigieren Hier noch ein bl�des Wort zwei in einer Zeile G�hte und Gr��e
I wrote a PowerShell script that replaces each REPLACEMENT CHARACTER with the corresponding special character, for example "ä", "ü" or "ß". The corrected text, also UTF-8, looks like this:
Das hier rückgängig ist das zu machen rückgängig : ist bereits geamcht Weitere Wörter gibt ers zu korrigieren Hier noch ein blödes Wort zwei in einer Zeile Göhte und Größe
The problem is that the program I want to import the text into only accepts files encoded as "Western European DOS (CP850)". That was, by the way, the encoding the program originally exported the file in, and the import would have worked without problems if I hadn't opened the file, edited it and saved it as UTF-8. Here is what happened:
I exported files from a specific program as "Western European DOS (CP850)". [Note: every special character has its own distinct byte value here, so an import would simply work and restore the special characters.]
I opened the file in an editor of my choice, and the editor auto-detected "UTF-8", which is not correct. I did not notice, edited the file and saved it as UTF-8. [Now every special character is the same REPLACEMENT CHARACTER, �.]
I then noticed that something was wrong and wrote a script that replaces every occurrence of � with the right special character in UTF-8. [I don't think it matters how the script does this, but if it does, ask.]
I now have the corrected UTF-8 file but, as mentioned, I have to import "Western European DOS (CP850)" into my program, the same file encoding it exported. In this encoding every special character has its own unique byte value. So how do I get back to it with PowerShell?
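To make the sequence of events above concrete, here is a minimal sketch of the round trip for a single word. It assumes that `[System.Text.Encoding]::GetEncoding(850)` is available in the PowerShell session; the variable names are made up for illustration, and the sample word is built from code points so the script file's own encoding cannot interfere.

```powershell
# Hypothetical sample: "rückgängig" built from code points (ü = U+00FC, ä = U+00E4).
$word  = "r$([char]0xFC)ckg$([char]0xE4)ngig"
$cp850 = [System.Text.Encoding]::GetEncoding(850)
$utf8  = [System.Text.Encoding]::UTF8

# Step 1: the export. In CP850 each umlaut is a single, distinct byte
# (ü is 0x81, ä is 0x84).
$exportedBytes = $cp850.GetBytes($word)

# Step 2: the editor reads those bytes as UTF-8. A lone 0x81 or 0x84 is not
# a valid UTF-8 sequence, so the decoder substitutes U+FFFD for each of them.
$misread = $utf8.GetString($exportedBytes)

# Step 3: saving as UTF-8 writes U+FFFD itself (3 bytes each), so the
# original 0x81 and 0x84 can no longer be told apart.
$savedBytes = $utf8.GetBytes($misread)
```

After step 3 the file no longer contains the information needed to restore the characters automatically, which is why the corrections had to be made word by word.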
Here is some more information. The line in which the script reads the file I want to correct is:
$lines = get-content $file -encoding utf8 | select-string $SearchCharacter
The algorithm goes through every line, asks for a correction for each wrong word containing the character, and skips a word if it comes up again. Once the corrections for all files have been collected, a loop replaces every "key" (wrong word) with its "value" (corrected word) in each file, using this line:
foreach key ...
(Get-Content -encoding utf8 $file) -replace "$key", "$value" | Set-Content -encoding utf8 $file
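For comparison, here is a hedged sketch of how such a replacement loop could look when the file is read and written only once. The function name `Update-WordsInFile` and the `$Corrections` hashtable are assumptions for illustration, not the script's actual names. Unlike `-replace`, which treats the key as a case-insensitive regular expression, `String.Replace` matches the key literally and case-sensitively. Also note that this sketch writes UTF-8 without a byte order mark.

```powershell
function Update-WordsInFile {   # hypothetical name
    param(
        [string]$Path,
        [hashtable]$Corrections   # wrong word -> corrected word
    )
    # .NET file APIs resolve relative paths against the process directory,
    # so convert to a full path first.
    $Path = (Resolve-Path -LiteralPath $Path).ProviderPath
    $utf8 = New-Object System.Text.UTF8Encoding $false   # UTF-8, no BOM

    $text = [System.IO.File]::ReadAllText($Path, $utf8)
    foreach ($key in $Corrections.Keys) {
        # Literal, case-sensitive replacement of every occurrence.
        $text = $text.Replace([string]$key, [string]$Corrections[$key])
    }
    [System.IO.File]::WriteAllText($Path, $text, $utf8)
}
```

A call might look like `Update-WordsInFile -Path $file -Corrections $table`, where `$table` holds the collected key/value pairs.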
I already tried something like this:
foreach key ...
(Get-Content -encoding utf8 $file) -replace "$key", "$value" | Set-Content -encoding OEM $file
But this results in "?" instead of the correct character:
Das hier r?ckg?ngig ist das zu machen r?ckg?ngig : ist bereits geamcht Weitere W?rter gibt ers zu korrigieren Hier noch ein bl?des Wort zwei in einer Zeile G?hte und Gr??e
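One way a "?" can end up in an OEM-encoded file is when the encoder is handed a character that the target code page cannot represent. The sketch below shows this for U+FFFD and CP850; whether that is what happened in the run above is an assumption, not something the output alone proves. The variable names are made up for illustration.

```powershell
$cp850 = [System.Text.Encoding]::GetEncoding(850)

# U+FFFD has no CP850 equivalent, so the encoder falls back to "?" (0x3F).
$bytesForReplacementChar = $cp850.GetBytes([string][char]0xFFFD)

# A real "ü" (U+00FC) does exist in CP850 and becomes the single byte 0x81.
$bytesForUmlaut = $cp850.GetBytes([string][char]0xFC)
```

If the text still contains U+FFFD at the moment it is written, every one of them is stored as a plain question mark, and the file looks like the sample above.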
Any suggestions on how I can build a "Western European DOS (CP850)" file from UTF-8?
EDIT:
This function, derived from http://www.msdynamics.de/viewtopic.php?f=17&t=25726#p138532, solved my problem:
Function ConvertAndReplace_UTF8_OEM850
{
    Param ([String]$path)

    # Resolve to a full path so the .NET file APIs find the file.
    $path = Resolve-Path $path

    # Code page 65001 is UTF-8, code page 850 is Western European DOS.
    $sourceEncoding = [System.Text.Encoding]::GetEncoding(65001)
    $targetEncoding = [System.Text.Encoding]::GetEncoding(850)

    # Read the whole file as UTF-8 and overwrite it in place as CP850.
    $textfile = [System.IO.File]::ReadAllText($path, $sourceEncoding)
    [System.IO.File]::WriteAllText($path, $textfile, $targetEncoding)

    Write-Host "Content in $path converted from UTF-8 to OEM850"
}
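As a possible variation, the sketch below writes the converted text to a separate file instead of overwriting the source, and refuses to convert while any U+FFFD is left in the text. The function name `Convert-Utf8ToCp850` and its parameters are hypothetical; `-Destination` is assumed to be a full path, because the .NET file APIs do not follow PowerShell's current location.

```powershell
function Convert-Utf8ToCp850 {   # hypothetical name
    param(
        [string]$Source,
        [string]$Destination   # assumed to be a full path
    )
    $Source = (Resolve-Path -LiteralPath $Source).ProviderPath
    $text = [System.IO.File]::ReadAllText($Source, [System.Text.Encoding]::UTF8)

    # Any U+FFFD still present would be written as "?", so stop early.
    if ($text.IndexOf([char]0xFFFD) -ge 0) {
        throw "Uncorrected U+FFFD found in $Source."
    }

    $cp850 = [System.Text.Encoding]::GetEncoding(850)
    [System.IO.File]::WriteAllText($Destination, $text, $cp850)
}
```

Keeping the UTF-8 source untouched makes it possible to rerun the conversion after further corrections.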
� (REPLACEMENT CHARACTER, U+FFFD).