I wrote a script that reads XML files, edits some specific nodes, and then writes each file back out.

The issue I am having is that the output file has some extra characters added to nodes that I didn't edit.

I am assuming this is an encoding issue.

The relevant code from my script is:

function getAssigneeID ($assigneeName) {
    # Look up the ID for a display name in $nameIDHash (defined further down)
    $assigneeID = $nameIDHash[$assigneeName]
    if ($null -eq $assigneeID -or $assigneeID -eq "") {
        return 'Not Found'
    } else {
        return $assigneeID
    }
}
function ValidateAssigneeField ($assignee, $fileContent, $fileURI, $assignees) {
    # A valid ID is six characters long with 'Z' (case-insensitive) as its second character
    If (($assignee.InnerText.Length -ne 6) -or ($assignee.InnerText[1] -ne 'Z')) {
        write("`tAssignee " + $assignee.InnerText + " is invalid.") >> $Output_Log_File
        #find assignee's ID in nameIDHash 
        $assigneeID = getAssigneeID -assigneeName $assignee.InnerText
        if ($assigneeID -eq 'Not Found' -or $assigneeID -eq $null){
            write("`t`tThe ID for the invalid user " + $assignee.InnerText + " is Not Found.") >> $Output_Log_File
        } else {
            #if the assigneeID is in the list of assignees, remove the name, otherwise replace the name with the ID and save the file.
            write("`t`tThe ID for the invalid user " + $assignee.InnerText + " is " + $assigneeID) >> $Output_Log_File
            $assigneeIdAlreadyInList = $false
            foreach ($user in $assignees){
                If ($user.InnerText -eq $assigneeID){
                    $assigneeIdAlreadyInList = $true
                }
            }
            if ($assigneeIdAlreadyInList){
                write("`t`t" + $assigneeID + " already exists in the assignee list, removing " + $assignee.InnerText) >> $Output_Log_File
                [void]$assignee.ParentNode.RemoveChild($assignee)
            } else {
                write("`t`tReplacing " + $assignee.InnerText + " with " + $assigneeID + ".") >> $Output_Log_File
                $assignee.InnerText = $assigneeID
            }
            write("`t`tSaving the file " + $fileURI + ".") >> $Output_Log_File
            $fileContent.save($fileURI)
            #$file | out-file -Encoding "UTF8" -FilePath $fileURI
            #$MyXML | out-file -Encoding "UTF8" -FilePath $fileURI
            #$fileContent | out-file -Encoding "UTF8" -FilePath $fileURI
        }
    } else {
        write("`tAssignee " + $assignee.InnerText + " is OK.") >> $Output_Log_File
    }
}
$workitemBasePath = "C:\temp\dev\workitems\Dev_ECH\"
$Output_Log_File = "C:\ALM\Reports\Dev_ECH - Correct All Assignees.txt"
$NameIDHash = @{
"Jade West" = "zzzzzz"
"Tonya Killebrew" = "AZCJNZ"}
$today = Get-Date -format s
write($today + " - Running Correct invalid Assignees.ps1.") > $Output_Log_File
$files = Get-ChildItem -Path $workitemBasePath -include workitem.xml -Recurse | % { $_.FullName }
foreach ($file in $files){
    write("Evaluating " + $file) >> $Output_Log_File
    [xml]$MyXML = Get-Content $file
    $assigneeList = $MyXML.SelectNodes('//work-item/field[@id="assignee"]/list/item')
    if ($assigneeList.count -eq 0) {
        $assigneeList  = $MyXML.SelectNodes('//work-item/field[@id="assignee"]')
    }
    foreach ($assignee in $assigneeList) {
        ValidateAssigneeField -assignee $assignee -fileContent $MyXML -fileURI $file -assignees $assigneeList
    }

}

In ValidateAssigneeField I edit the assignee node and then save the file with

    $fileContent.save($fileURI)

In the output XML file I see the following extra characters added to some of the text fields.

  <field id="description" text-type="text/plain">​Navistar has reported that the transmission remains in Drive when the operator selects a fast sequence from Drive to Reverse to Manual mode. When selecting a similar sequence from Reverse to Drive to Manual mode, the transmission drive as expected.</field>

​ and  are added in seemingly random places.

I am assuming I need to find out what encoding the original XML is in and then output my edited XML in the same encoding.
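For the step of finding out which encoding an original file uses, here is a minimal sketch that checks two things: whether the file starts with a UTF-8 byte-order mark (EF BB BF) and which encoding its XML declaration names. Get-XmlEncodingInfo is a hypothetical helper name, not an existing cmdlet. Keep in mind that the declaration only states what the file claims to be: Get-Content in Windows PowerShell does not look at it, and for a file without a BOM it decodes the bytes with the system ANSI code page.

```powershell
# Hypothetical helper: reports whether a file starts with a UTF-8 BOM and
# which encoding its XML declaration names (e.g. "UTF-8").
function Get-XmlEncodingInfo {
    param([string]$Path)

    $bytes = [System.IO.File]::ReadAllBytes($Path)
    $hasUtf8Bom = ($bytes.Length -ge 3) -and ($bytes[0] -eq 0xEF) -and ($bytes[1] -eq 0xBB) -and ($bytes[2] -eq 0xBF)

    # XmlDocument.Load chooses the decoding from the BOM / XML declaration
    # itself, unlike casting the output of a plain Get-Content call to [xml].
    $doc = New-Object System.Xml.XmlDocument
    $doc.Load($Path)

    $declaredEncoding = $null
    $declaration = $doc.FirstChild -as [System.Xml.XmlDeclaration]
    if ($null -ne $declaration) { $declaredEncoding = $declaration.Encoding }

    [pscustomobject]@{
        Path             = $Path
        HasUtf8Bom       = $hasUtf8Bom
        DeclaredEncoding = $declaredEncoding
    }
}
```

Usage with the script's own list of files: `$files | ForEach-Object { Get-XmlEncodingInfo -Path $_ }`. Pass full paths (the `$files` list already holds FullName values), because .NET resolves relative paths against the process directory rather than the current PowerShell location.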

How do I change the encoding that the $fileContent.save($fileURI) call writes?
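XmlDocument.Save(string) takes its output encoding from the document's XML declaration, so one way to choose the encoding explicitly is to save through an XmlWriter configured with XmlWriterSettings. A minimal sketch, assuming UTF-8 without a BOM is the wanted output; Save-XmlUtf8NoBom is a hypothetical name. Note that this only controls how the file is written: the sample files already declare UTF-8, so if the text was mis-decoded when it was read, the extra characters will be written out faithfully in whatever encoding is chosen here.

```powershell
# Hypothetical helper: save an XmlDocument as UTF-8 without a byte-order mark.
function Save-XmlUtf8NoBom {
    param([System.Xml.XmlDocument]$Document, [string]$Path)

    $settings = New-Object System.Xml.XmlWriterSettings
    $settings.Encoding = New-Object System.Text.UTF8Encoding($false)  # $false = no BOM
    $settings.Indent = $true                                          # indent child elements

    $writer = [System.Xml.XmlWriter]::Create($Path, $settings)
    try {
        $Document.Save($writer)
    } finally {
        $writer.Dispose()   # flushes and closes the underlying file
    }
}
```

In the question's function this would stand in for the save call, e.g. `Save-XmlUtf8NoBom -Document $fileContent -Path $fileURI`.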

<?xml version="1.0" encoding="UTF-8"?>
<work-item>
    <field id="assignee">Jade West</field>
    <field id="author">RZPRRK</field>
    <field id="created">2019-08-08 10:41:39.163 -0400</field>
    <field id="description" text-type="text/html">Tst</field>
    <field id="dueDate">2019-08-05</field>
    <field id="nextReviewDate" type="date">2019-08-15</field>
    <field id="osNumber" type="string">23457</field>
    <field id="osOpenDate" type="date">2019-07-30</field>
    <field id="previousStatus">toBeScreened</field>
    <field id="priority">2.0</field>
    <field id="rational" text-type="text/html" type="text/html">Test</field>
    <field id="release" type="enum:release">na</field>
    <field id="resolution">duplicate</field>
    <field id="resolvedOn">2019-08-08 10:42:22.987 -0400</field>
    <field id="severity">normal</field>
    <field id="status">inProcess</field>
    <field id="title">HWCR - Reject</field>
    <field id="type">hardwareChangeRequest</field>
</work-item>

<?xml version="1.0" encoding="UTF-8"?>
<work-item>
  <field id="assignee">
    <list>
      <item>XZM030</item>
    </list>
  </field>
  <field id="author">XZM030</field>
  <field id="automatedTestAffected" type="enum:productDocumentAffected">notRequired</field>
  <field id="created">2019-06-06 13:59:27.726 -0400</field>
  <field id="customerImpact" type="enum:productGenricYesNo">yes</field>
  <field id="customerImpactNotes" text-type="text/plain" type="text/html">See description</field>
  <field id="cyberSecurityAffected" type="enum:productGenricYesNo">no</field>
  <field id="datalinkTechData" type="enum:productDocumentAffected">notRequired</field>
  <field id="description" text-type="text/plain">​Navistar has reported that the transmission remains in Drive when the operator selects a fast sequence from Drive to Reverse to Manual mode. When selecting a similar sequence from Reverse to Drive to Manual mode, the transmission drive as expected.</field>
  <field id="designReviewComments" text-type="text/plain" type="text/html">​3/4/19-accepted with addtions to test plan</field>
  <field id="designReviewRequired" type="enum:productDocumentCompleted">completed</field>
  <field id="designedDate" type="date">2019-03-26</field>
  <field id="diagAffected" type="enum:productDiagAffected">no</field>
  <field id="fmeaRequired" type="enum:productDocumentAffected">notRequired</field>
  <field id="functionalSafetyAffected" type="enum:productGenricYesNo">no</field>
  <field id="linkedWorkItems">
    <list>
      <struct>
        <item id="role">affected_by</item>
        <item id="workItem">COMM-47223</item>
      </struct>
    </list>
  </field>
  <field id="priority">4.0</field>
  <field id="release" type="enum:release">na</field>
  <field id="requirementsAffected" type="enum:productDocumentAffected">notRequired</field>
  <field id="rootCauseDescription" text-type="text/plain" type="text/html">​The TCM logic that controls express preselect for the hold postion looks at if forward is attained but not the currently selected postion.   Therefore with a quick transistion from D-R-H the transmission does not have time to actually make a shift to Reverse and there for the forward attined is still true when hold is recieved. </field>
  <field id="screenedDate" type="date">2019-03-26</field>
  <field id="serviceImpact" type="enum:productGenricYesNo">yes</field>
  <field id="serviceImpactNotes" text-type="text/plain" type="text/html">affects OEMs using the non-ATI standard selector interface only.  OEMs using the non- ATI basic selector interface are not effected.</field>
  <field id="severity">normal</field>
  <field id="sharePointID" type="string">2884</field>
  <field id="simToolAffected" type="enum:productDocumentAffected">notRequired</field>
  <field id="softwareCRIsRequired" type="boolean">true</field>
  <field id="solutionDescription" text-type="text/plain" type="text/html">​The TCM logic that controls express preselect for the hold postion needs to look at the selected position and if forward is attined.</field>
  <field id="status">na</field>
  <field id="synergyCRNumber" type="string">,10516,</field>
  <field id="syscrType" type="string">Incident</field>
  <field id="techData(Regular)" type="enum:productDocumentAffected">notRequired</field>
  <field id="tempStatus" type="string">n/a</field>
  <field id="testPlanAffected" type="enum:productDocumentAffected">notRequired</field>
  <field id="testRunWhereValidated" type="string">BCD 191 PC</field>
  <field id="title">Other: OEM Standard Shift Selector D-to-R-to-Manual Transition Complaint</field>
  <field id="type">other</field>
  <field id="typeForDependencyOnly" type="enum:otherProductDependencyType">other</field>
  <field id="vepsqaAffected" type="enum:vepsAffected">no</field>
</work-item>
  • I would consider using the -Encoding parameter on Get-Content. Beware that using -Encoding UTF8 with the native *-Content cmdlets will result in a BOM (in Windows PowerShell). Commented Dec 19, 2019 at 16:14
  • "​" and " " (without quotes) are UTF-8 encoded characters (respectively) Zero Width Space (U+200B) and No-Break Space (U+00A0)… Commented Sep 25, 2020 at 19:16
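To illustrate how such characters can turn into visible junk, the sketch below encodes a no-break space (U+00A0) as UTF-8 and decodes the two resulting bytes with ISO-8859-1 (code page 28591). Windows-1252, the ANSI code page on English-language Windows, maps these two bytes the same way, so this mirrors reading a BOM-less UTF-8 file with the ANSI code page: one character becomes "Â" followed by the original character. Under Windows-1252 the same mismatch turns a zero-width space (bytes E2 80 8B) into "â€‹". This is an illustration under that assumption, not a diagnosis of the asker's actual files.

```powershell
# Encode a no-break space (U+00A0) as UTF-8, then decode the bytes with a
# single-byte code page to reproduce the classic "Â" artifact.
$nbsp    = [string][char]0x00A0
$bytes   = [System.Text.Encoding]::UTF8.GetBytes($nbsp)     # 0xC2 0xA0
$latin1  = [System.Text.Encoding]::GetEncoding(28591)       # ISO-8859-1
$misread = $latin1.GetString($bytes)                        # 'Â' followed by U+00A0

# List the code points of the mis-decoded text
$codePoints = $misread.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }
'{0} bytes -> {1} chars: {2}' -f $bytes.Length, $misread.Length, ($codePoints -join ' ')
# prints: 2 bytes -> 2 chars: U+00C2 U+00A0
```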

2 Answers

Without an input file to test with, try the change below:

[xml]$MyXML = Get-Content $file -Raw
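-Raw makes Get-Content return the whole file as one string instead of an array of lines, but it does not change how the bytes are decoded. A sketch combining it with the -Encoding parameter suggested in the comments, assuming the files really are UTF-8 as their declarations say; Read-XmlUtf8 is a hypothetical helper name.

```powershell
# Hypothetical helper: read a (possibly BOM-less) UTF-8 XML file into [xml].
function Read-XmlUtf8 {
    param([string]$Path)
    # -Raw returns the file as a single string; -Encoding UTF8 stops
    # Windows PowerShell from falling back to the ANSI code page.
    [xml]$doc = Get-Content -Path $Path -Raw -Encoding UTF8
    # Unary comma: emit the document as one object, never as a sequence of its child nodes
    ,$doc
}
```

In the question's loop this would replace the `[xml]$MyXML = Get-Content $file` line with `$MyXML = Read-XmlUtf8 -Path $file`.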

EDIT: You can also write the output with

$file | Out-File -Encoding "UTF8"

EDIT EDIT:

What if you do the following:

$newfile = ValidateAssigneeField -assignee $assignee -fileContent $MyXML -fileURI $file -assignees $assigneeList 

$newfile | Out-File -Encoding "UTF8" -FilePath "DESTINATION"


I would advise against using ">>" or "Out-File -Append". They can mix different encodings in the same file, especially since in Windows PowerShell Out-File (and therefore > and >>) defaults to Unicode (UTF-16LE). "Add-Content" works better. Bug report: https://github.com/PowerShell/PowerShell/issues/9423
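Applied to the question's script, this would mean replacing the log redirections with cmdlets that take one explicit encoding: Set-Content for the first line (where the script uses ">") and Add-Content for the rest (where it uses ">>"). A minimal sketch; Start-Log and Write-Log are hypothetical helper names, and UTF-8 is an assumed choice of encoding.

```powershell
# Hypothetical helpers: every line of the log is written with the same
# explicit encoding, so the file never ends up with mixed encodings.

# Start a fresh log (replaces: write(...) > $Output_Log_File)
function Start-Log {
    param([string]$Path, [string]$Message)
    Set-Content -Path $Path -Value $Message -Encoding UTF8
}

# Append a line (replaces: write(...) >> $Output_Log_File)
function Write-Log {
    param([string]$Path, [string]$Message)
    Add-Content -Path $Path -Value $Message -Encoding UTF8
}
```

For example, `Write-Log -Path $Output_Log_File -Message ("Evaluating " + $file)`.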
