
I have strings containing non-ASCII characters such as á, é, í, ó, ú, and I need a function that converts them to acceptable ASCII equivalents such as a, e, i, o, u. I need this because I will be creating IIS web sites from those strings (that is, I will be using them as domain names).

    In general, it's called transliteration. Normalizing to FormD and filtering will work to convert composed Latin letters to Basic Latin letters but not ligatures (dž, ǣ, ij, … ) and such. See this question. Commented Oct 10, 2017 at 16:26
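
The distinction the comment draws can be checked directly. The following is a minimal sketch, not part of the original thread: the variable names are illustrative, and the characters are built from code points so the result does not depend on how the script file is encoded. It compares canonical decomposition (FormD) with compatibility decomposition (FormKD).

```powershell
$formD  = [System.Text.NormalizationForm]::FormD
$formKD = [System.Text.NormalizationForm]::FormKD

# Illustrative sample characters, built from their code points.
$eAcute  = [string][char]0x00E9   # é: has a canonical decomposition
$lStroke = [string][char]0x0142   # ł: has no decomposition at all
$ijLig   = [string][char]0x0133   # ij: has only a compatibility decomposition

$eAcute.Normalize($formD).Length    # 2 (e followed by a combining acute accent)
$lStroke.Normalize($formD).Length   # 1 (unchanged, so filtering marks cannot help)
$ijLig.Normalize($formD).Length     # 1 (unchanged under FormD)
$ijLig.Normalize($formKD)           # ij (two separate ASCII letters)
```

Letters such as ł stay intact under both forms, so they would need an explicit replacement table.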

2 Answers

function Convert-DiacriticCharacters {
    param(
        [string]$inputString
    )
    # Decompose each accented letter into a base letter plus combining marks.
    [string]$formD = $inputString.Normalize(
            [System.Text.NormalizationForm]::FormD
    )
    $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
    $stringBuilder = New-Object System.Text.StringBuilder
    for ($i = 0; $i -lt $formD.Length; $i++) {
        $unicodeCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($formD[$i])
        # Keep every character except the combining marks (the diacritics).
        if ($unicodeCategory -ne $nonSpacingMark) {
            $stringBuilder.Append($formD[$i]) | Out-Null
        }
    }
    # Recompose whatever is left.
    $stringBuilder.ToString().Normalize([System.Text.NormalizationForm]::FormC)
}

The resulting function converts diacritics in the following way:

PS C:\> Convert-DiacriticCharacters "Ångström"
Angstrom
PS C:\> Convert-DiacriticCharacters "Ó señor"
O senor

Copied from: http://cosmoskey.blogspot.nl/2009/09/powershell-function-convert.html


1 Comment

It does not replace the Polish letter ł, though.
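
One possible way to cover letters like ł is to combine the mark filter with a small lookup table. This is only a sketch under stated assumptions: the function name Convert-ToAsciiLetters and the contents of the table are hypothetical, the table is deliberately incomplete, and the right replacement for a given letter can depend on the language.

```powershell
function Convert-ToAsciiLetters {
    param([string]$Text)

    # Hypothetical, incomplete table of letters that FormD does not decompose.
    $map = New-Object 'System.Collections.Generic.Dictionary[char,string]'
    $map.Add([char]0x0142, 'l')    # ł
    $map.Add([char]0x0141, 'L')    # Ł
    $map.Add([char]0x00F8, 'o')    # ø
    $map.Add([char]0x00D8, 'O')    # Ø
    $map.Add([char]0x00DF, 'ss')   # ß

    $nonSpacingMark = [System.Globalization.UnicodeCategory]::NonSpacingMark
    $sb = New-Object System.Text.StringBuilder

    foreach ($c in $Text.Normalize([System.Text.NormalizationForm]::FormD).ToCharArray()) {
        # Drop combining marks, exactly as in the answer above.
        if ([System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($c) -eq $nonSpacingMark) {
            continue
        }
        # Substitute mapped letters; pass everything else through unchanged.
        if ($map.ContainsKey($c)) {
            [void]$sb.Append($map[$c])
        } else {
            [void]$sb.Append($c)
        }
    }
    $sb.ToString().Normalize([System.Text.NormalizationForm]::FormC)
}

Convert-ToAsciiLetters 'Łódź'   # Lodz
```

Any letter that is neither decomposable nor in the table passes through unchanged, so the table has to be extended for the alphabets you expect.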

This is a rough PowerShell port of an answer to a C#/.NET question, and it seems to work:

function Remove-Diacritics
{
    Param([string]$Text)

    $chars = $Text.Normalize([System.Text.NormalizationForm]::FormD).GetEnumerator().Where{
        [System.Char]::GetUnicodeCategory($_) -ne [System.Globalization.UnicodeCategory]::NonSpacingMark
    }

    (-join $chars).Normalize([System.Text.NormalizationForm]::FormC)
}

For example:

PS C:\> Remove-Diacritics 'abcdeéfg'
abcdeefg
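
Since the question is about domain names, the same idea could be extended into a host-label helper. This is a sketch only: the name ConvertTo-HostLabel is hypothetical, it swaps the character loop for the regex class \p{Mn} (non-spacing marks), and it does not enforce length limits or handle letters such as ł that have no decomposition.

```powershell
function ConvertTo-HostLabel {
    param([string]$Text)

    # Decompose, then strip combining marks with a regex instead of a loop.
    $ascii = $Text.Normalize([System.Text.NormalizationForm]::FormD) -replace '\p{Mn}', ''

    # Lowercase, then collapse every run of other characters into one hyphen.
    $label = $ascii.ToLowerInvariant() -replace '[^a-z0-9-]+', '-'

    # A label should not start or end with a hyphen.
    $label.Trim('-')
}

ConvertTo-HostLabel 'Ó señor'   # o-senor
```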
