I'm running a PowerShell script that reads an XML file, updates a few tag values, and saves the result as multiple XML files. All of that works, but the messaging queue the files are passed to does not read them properly. However, the same XML file works in the queue once I open it and click Save without changing any data. I compared the two versions (1: as created by the script, 2: after opening it and clicking Save) and their contents look identical! I cannot for the life of me figure out what is going wrong or how to fix it.

How can I create the output XML files in a format the queue can read? I'm not sure what changes when I click 'Save' on the XML files. Please help.

input CASH.XML:

<?xml version="1.0" encoding="UTF-8"?>
<ns:POSTransaction xmlns:ns="http://schema.xyz.com/Commerce/Customer/Transaction/v1">
<ns:tranHeader>
<ns:transactionId>96846836238236142669</ns:transactionId>
<ns:businessDateTime>2021-12-25T01:10:00</ns:businessDateTime>
<ns:emailId>[email protected]</ns:emailId>
</ns:tranHeader>
</ns:POSTransaction>

PS:

$log="H:\logs.txt"
[xml]$loadXML = Get-Content "H:\Q_This\CASH.XML"

try
{
   $tranID = $loadXML.POSTransaction.tranHeader.transactionId.substring(17,3)
   $tranIntID = [int]$tranID   
   $tranc = $loadXML.POSTransaction.tranHeader.transactionId.substring(0,17)    
   $uname = $loadXML.POSTransaction.tranHeader.emailId.substring(0,11)
   $mailcnt = [int]$loadXML.POSTransaction.tranHeader.emailId.substring(11,3)
   $mailend = $loadXML.POSTransaction.tranHeader.emailId.Split("@")[1]

   for ($mailcnt; $mailcnt -lt 10; $mailcnt++)
   {    
        for ([int]$i =1; $i -le 5; $i++)
        {
        $mailupd = ([string]($mailcnt+1)).PadLeft(3,'0')
        $tranIntID = $tranIntID+1
        $loadXML.POSTransaction.tranHeader.transactionId = $tranc+[string]$tranIntID
        $loadXML.POSTransaction.tranHeader.emailId = $uname+$mailupd+'@'+$mailend
        $fileName = "CASH_"+$tranIntID+"_"+$mailupd+".XML"
        $loadXML.Save("H:\Q_This\"+$fileName)
        }
   }
}
catch
{
    Write-Host $_.Exception.Message
    Add-content $log -value ([string](Get-Date) + ' ' +$_.Exception.Message)    
}

The above code created 40 output XML files: 5 transaction files for each email ID from Performancetest003-010@ymail.com. However, none of them were recognised by the messaging queue until I opened each one and clicked Save (with no data change).

  • What did you open and save the XML file with? Could this be an issue with text encoding? (UTF-8 vs ASCII, etc.) When you say "none of it was recognised by the messaging queue", what was the error message and what was the queue technology? Commented Dec 24, 2021 at 22:16
  • I'm assuming that the linked duplicate answers your question; let us know if it doesn't. Commented Dec 25, 2021 at 3:31
  • That the problem goes away when you re-save the file in an editor may be due to the editor saving the file without a BOM. As an aside: If your UTF-8-encoded XML file does not have a BOM and you're using Get-Content to read it in Windows PowerShell, it may be misinterpreted (PowerShell (Core) 7+ now consistently defaults to UTF-8). Either use -Encoding utf8, or, preferably, use the [xml] type's .Load() method to load the file - see the bottom section of this answer. Commented Dec 25, 2021 at 3:38
  • @mklement0 You're right. Encoding is the issue here. All the files that I save externally are in ANSI format that's getting read successfully by the downstream queue. I tried a couple of different solutions from the thread you'd redirected me to but the output file still saves as UTF-8. Not sure how to convert to XML ANSI format. Here are the things I'd tried (unsuccessfully) 1. ($loadXML = [xml]::new()).Load((Convert-Path "H:\CASH.XML")) instead of Get-Content 2. $loadXML.Save("H:\"+$fileName) | Set-Content -LiteralPath "H:\$fileName" -Encoding Ascii #to change output file format to ANSI Commented Dec 27, 2021 at 8:54
  • Spoke too soon! Used the below line after the output XML is created and I'm able to generate an ANSI xml file. I'm sure there are more elegant ways to go about this but this one works :') Get-Content H:\$fileName | out-file -encoding ASCII H:\new_$fileName Commented Dec 27, 2021 at 9:12

1 Answer

XML APIs have support for character encoding built in, and if a given XML document specifies an encoding explicitly in its XML declaration (e.g. <?xml version="1.0" encoding="utf-8"?> ), that encoding is respected both on reading from and writing to files.

Therefore, the robust way to read and write XML files is to use a dedicated XML API - the [xml] (System.Xml.XmlDocument) type's .Load() and .Save() methods in this case - rather than plain-text processing cmdlets such as Get-Content and Set-Content / Out-File.

Caveat:

  • As of .NET 6.0 / PowerShell 7.2, the .Save() method unexpectedly saves an XML document with an explicit encoding attribute of "utf-8" to a UTF-8 file with a BOM (byte-order mark), which causes problems for some XML consumers (even though it shouldn't). The workaround is to remove the explicit encoding attribute (set it to $null); see this answer for details.
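
To make the workaround from the caveat concrete, here is a minimal sketch. It assumes the XML declaration is the document's first child node; the temp-file name is made up for illustration. It clears the encoding attribute, saves the document, and then checks whether the file starts with the UTF-8 BOM bytes (EF BB BF).

```powershell
# Hypothetical output path for illustration; .NET methods need a full path.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'no-bom-sample.xml'

# Sample document with an explicit encoding attribute of "utf-8".
$xml = [xml]::new()
$xml.LoadXml('<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>')

# Clear the explicit encoding attribute, if an XML declaration is present.
$decl = $xml.FirstChild
if ($decl -is [System.Xml.XmlDeclaration]) { $decl.Encoding = $null }

$xml.Save($outFile)

# Inspect the first three bytes; a UTF-8 BOM would be EF BB BF.
$bytes = [System.IO.File]::ReadAllBytes($outFile)
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"File starts with a UTF-8 BOM: $hasBom"
```

Note that the saved declaration then no longer carries an encoding attribute at all, which XML parsers treat as UTF-8 by default.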

Your later feedback indicates that you're looking for ANSI-encoded output XML files, i.e. that your goal is to transcode the input XML from UTF-8 to ANSI.

The following is a simplified, self-contained example of such transcoding. It assumes that your system's active ANSI code page is Windows-1252.

# In- and output files.
# IMPORTANT:
#   Always use *full, file-system-native paths* when calling .NET methods.
$inFile =   Join-Path $PWD.ProviderPath in.xml
$outFile =  Join-Path $PWD.ProviderPath out.xml

# Create a UTF-8-encoded sample input file,
# for simplicity with plain-text processing.
# Note the non-ASCII character in the element text ('ä')
'<?xml version="1.0" encoding="utf-8"?><foo>bär</foo>' | Set-Content -Encoding utf8 $inFile

# Read the file using the XML-processing API provided via the [xml] type.
$xml = [xml]::new()
$xml.Load($inFile)

# Now change the character-encoding attribute to the desired new encoding.
# An XML declaration - if present - is always the *first child node* 
# of the [xml] instance.
$xml.ChildNodes[0].encoding = 'windows-1252'

# Save the document.
# The .Save() method will automatically respect the specified encoding.
$xml.Save($outFile)

To verify that the output file was correctly Windows-1252-encoded, use the following command:

  • PowerShell (Core) 7+
# PowerShell (Core) defaults to UTF-8 in the absence of a BOM.
Get-Content -Encoding 1252 $outFile
  • Windows PowerShell
# Windows PowerShell *defaults* to the 
# system's active ANSI code page in the absence of a BOM.
Get-Content $outFile

You should see the following output - note the correct rendering of the non-ASCII character, ä:

<?xml version="1.0" encoding="windows-1252"?>
<foo>bär</foo>
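
If you prefer a check that does not depend on how Get-Content decodes the file, you can look at the raw bytes instead. The following is only a sketch under the same assumption as above (Windows-1252 is available as an encoding); the temp-file name is hypothetical. In Windows-1252, ä is the single byte 0xE4, whereas UTF-8 would store it as the two bytes 0xC3 0xA4.

```powershell
# Hypothetical temp-file path, used only for this illustration.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'cp1252-sample.xml'

# Build the sample text with [char]0xE4 so the script itself stays pure ASCII.
$umlaut = [char]0xE4
$xml = [xml]::new()
$xml.LoadXml('<?xml version="1.0" encoding="utf-8"?><foo>b' + $umlaut + 'r</foo>')

# Declare Windows-1252 and let .Save() transcode accordingly.
$xml.FirstChild.Encoding = 'windows-1252'
$xml.Save($outFile)

# Look for the single-byte form (E4) and for the UTF-8 form (C3 A4).
$bytes = [System.IO.File]::ReadAllBytes($outFile)
$hasSingleByteUmlaut = $bytes -contains 0xE4
$hasUtf8Umlaut = $false
for ($i = 0; $i -lt $bytes.Length - 1; $i++) {
    if ($bytes[$i] -eq 0xC3 -and $bytes[$i + 1] -eq 0xA4) { $hasUtf8Umlaut = $true }
}
"Single-byte umlaut: $hasSingleByteUmlaut; UTF-8 umlaut: $hasUtf8Umlaut"
```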

Note:

  • Do not try to perform transcoding via plain-text processing, such as using a combination of Get-Content and Set-Content, because, with an explicit encoding attribute in the input XML, you'll create self-contradictory XML files; that is, the encoding that the document claims to have in its XML declaration then won't match the actual encoding. This may not always matter (if the consumer too performs plain-text processing instead of proper XML parsing), but should be avoided for conceptual clarity alone.
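
To illustrate the note above, here is a small sketch of what plain-text transcoding does to a file with an explicit encoding attribute. The file names are hypothetical. The output still claims to be UTF-8 in its declaration, and with -Encoding ascii the non-ASCII character is additionally replaced with a literal ?.

```powershell
# Hypothetical temp-file paths, used only for this illustration.
$inFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'plain-in.xml'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'plain-out.xml'

# Write a BOM-less UTF-8 input file; [char]0xE4 is the umlaut from the example above.
$text = '<?xml version="1.0" encoding="utf-8"?><foo>b' + [char]0xE4 + 'r</foo>'
[System.IO.File]::WriteAllText($inFile, $text, [System.Text.UTF8Encoding]::new($false))

# Plain-text "transcoding": the XML declaration is copied through unchanged.
Get-Content -Encoding utf8 $inFile | Set-Content -Encoding ascii $outFile

$result = Get-Content -Raw $outFile
$stillClaimsUtf8 = $result -match 'encoding="utf-8"'
$umlautLost      = $result -match '<foo>b\?r</foo>'
"Declaration still says utf-8: $stillClaimsUtf8; umlaut replaced by '?': $umlautLost"
```

A file that contains only ASCII-range characters survives this kind of round trip, which may be why it can appear to work, but the declaration and the actual encoding are no longer guaranteed to agree.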