Why is there such a drastic difference?
Because you're doing something wildly different in PowerShell.
How can I get PowerShell to give the smaller output that the GNU tool gives?
By doing what base64 does :)
Let's have a look at what base64 ... > ... actually does:
base64:
- Opens file handle to input file
- Reads raw byte stream from disk
- Converts every 3-byte group into a 4-byte base64-encoded output fragment
>:
- Writes raw byte stream to disk
Since the 4-byte output fragments only contain byte values that correspond to 64 printable ASCII characters, the command never actually does any "string manipulation". The values it operates on just happen to also be printable as ASCII, so the resulting file is therefore indistinguishable from a "text file".
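To make the 3-bytes-in, 4-characters-out rule concrete, here's a small sketch using .NET's own encoder on a few hand-picked bytes. `Get-Base64Length` is a hypothetical helper for illustration only; it computes the unwrapped base64 size for a given input size (ignoring any line breaks a tool might add):

```powershell
# Three input bytes ("Man" in ASCII) become exactly four base64 characters
[Convert]::ToBase64String([byte[]](0x4D, 0x61, 0x6E))   # -> TWFu

# Input that isn't a multiple of 3 is padded with '='
[Convert]::ToBase64String([byte[]](0x4D, 0x61))         # -> TWE=
[Convert]::ToBase64String([byte[]](0x4D))               # -> TQ==

# Hypothetical helper: unwrapped base64 length for a given number of input bytes
function Get-Base64Length([long]$ByteCount) {
    [long](4 * [math]::Ceiling($ByteCount / 3.0))
}

Get-Base64Length 1MB   # -> 1398104
```

So raw base64 output is always about 4/3 the size of the input, no matter what the input bytes are.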
Your PowerShell script on the other hand does lots of string manipulation:
Get-Content $input:
- Opens file handle to input file
- Reads raw byte stream from disk
- Decodes the byte stream into text according to some encoding scheme (in Windows PowerShell, likely your system's ANSI code page)
[Encoding]::UTF8.GetBytes():
- Re-encodes the resulting string using UTF8
[Convert]::ToBase64String():
- Converts every 3-byte group into a 4-byte base64-encoded output fragment
Out-File:
- Encodes input string as little-endian UTF-16 (the Out-File default in Windows PowerShell)
- Writes to disk
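The three additional string encoding steps above (decoding, re-encoding as UTF-8, encoding as UTF-16) result in a much-inflated byte stream: re-encoding decoded binary data as UTF-8 turns every non-ASCII byte into 2 or 3 bytes, base64 then adds its usual third on top of that, and UTF-16 doubles the size of every character written to disk. That's why you're seeing the output size double or triple.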
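To see where the inflation comes from, here's a rough in-memory simulation of both pipelines on made-up sample data (every byte value once). It assumes ISO-8859-1 as a stand-in for the ANSI code page Get-Content would decode with, and it ignores the line splitting Get-Content does as well as the BOM and trailing newline Out-File writes, so treat the numbers as illustrative:

```powershell
# Hypothetical sample input: every possible byte value once (256 bytes)
$raw = [byte[]](0..255)

# What GNU base64 does: encode the raw bytes directly
$direct = [Convert]::ToBase64String($raw)

# What the original script does (simulated)
$text  = [System.Text.Encoding]::GetEncoding('iso-8859-1').GetString($raw)  # Get-Content: bytes -> text
$utf8  = [System.Text.Encoding]::UTF8.GetBytes($text)                       # text -> UTF-8 bytes
$b64   = [Convert]::ToBase64String($utf8)                                   # UTF-8 bytes -> base64
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($b64)                     # Out-File: base64 -> UTF-16LE

'{0} raw bytes -> {1} base64 chars' -f $raw.Length, $direct.Length
# -> 256 raw bytes -> 344 base64 chars
'{0} UTF-8 bytes -> {1} base64 chars -> {2} UTF-16 bytes' -f $utf8.Length, $b64.Length, $utf16.Length
# -> 384 UTF-8 bytes -> 512 base64 chars -> 1024 UTF-16 bytes
```

The 128 bytes above 0x7F each become 2 bytes in UTF-8, and UTF-16 then doubles the base64 text, so the file on disk ends up roughly three times the size of what base64 would produce.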
How to base64-encode files then?
The trick here is to read the raw bytes from disk and pass those directly to [convert]::ToBase64String().
It is technically possible to just read the entire file into an array at once:
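```powershell
# Windows PowerShell only; in PowerShell 6+ use -AsByteStream instead of -Encoding Byte.
# -Raw returns the whole file as a single byte[] instead of emitting one byte at a time.
$bytes = Get-Content path\to\file.ext -Encoding Byte -Raw
# or
# (Convert-Path resolves the path against PowerShell's current location,
#  which .NET's working directory doesn't necessarily match)
$bytes = [System.IO.File]::ReadAllBytes((Convert-Path path\to\file.ext))

$b64String = [convert]::ToBase64String($bytes)
Set-Content path\to\output.base64 -Value $b64String -Encoding Ascii
```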
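... but I'd strongly recommend against doing so for files larger than a few kilobytes, since the entire file and its base64 string (which, as a UTF-16 .NET string, takes up roughly 8/3 of the file's size in memory) have to fit in memory at once.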
Instead, for file transformation in general you'll want to use streams. In this particular case, you'll want to use a CryptoStream with a ToBase64Transform to re-encode a file stream as base64:
```powershell
function New-Base64File {
    [CmdletBinding(DefaultParameterSetName = 'ByPath')]
    param(
        [Parameter(Mandatory = $true, ParameterSetName = 'ByPath', Position = 0)]
        [string]$Path,

        [Parameter(Mandatory = $true, ParameterSetName = 'ByPSPath')]
        [Alias('PSPath')]
        [string]$LiteralPath,

        [Parameter(Mandatory = $true, Position = 1)]
        [string]$Destination
    )

    # Create destination file if it doesn't exist
    if (-not (Test-Path -LiteralPath $Destination -PathType Leaf)) {
        $outFile = New-Item -Path $Destination -ItemType File
    }
    else {
        $outFile = Get-Item -LiteralPath $Destination
    }

    [void]$PSBoundParameters.Remove('Destination')

    $outStream = $transform = $cryptoStream = $null
    try {
        # Open a writable file stream to the output file, truncating any
        # existing content (FileInfo.OpenWrite() does NOT truncate, which
        # would leave stale bytes behind when overwriting a larger file)
        $outStream = $outFile.Open([System.IO.FileMode]::Create, [System.IO.FileAccess]::Write)

        # Wrap output file stream in a CryptoStream.
        #
        # Anything that we write to the crypto stream is automatically
        # base64-encoded and then written through to the output file stream
        $transform = [System.Security.Cryptography.ToBase64Transform]::new()
        $cryptoStream = [System.Security.Cryptography.CryptoStream]::new($outStream, $transform, 'Write')

        foreach ($file in Get-Item @PSBoundParameters) {
            $inStream = $null
            try {
                # Open readable input file stream
                $inStream = $file.OpenRead()

                # Copy input bytes to crypto stream
                # - which in turn base64-encodes and writes to output file
                $inStream.CopyTo($cryptoStream)
            }
            finally {
                # Clean up the input file stream
                if ($inStream) { $inStream.Dispose() }
            }
        }
    }
    finally {
        # Dispose the crypto stream first: this flushes the final (padded)
        # base64 block and closes the underlying output file stream
        if ($cryptoStream) { $cryptoStream.Dispose() }
        if ($transform) { $transform.Dispose() }
        if ($outStream) { $outStream.Dispose() }
    }
}
```
Now you can do:
```powershell
$inputPath = "C:\Users\my.user\myfile.pdf"
New-Base64File $inputPath -Destination "C:\Users\my.user\myfile.pdf.via_ps.base64"
```
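And expect output of essentially the same size as with base64. (GNU base64 additionally inserts a line break every 76 characters by default, so its file will be slightly larger; run it as base64 -w 0 to get identical output.)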
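If you want to convince yourself that streaming through a CryptoStream produces exactly what [Convert]::ToBase64String() would, here's a minimal in-memory sketch. The sample data is made up, and MemoryStreams stand in for the file streams New-Base64File uses:

```powershell
# Hypothetical sample data (1000 bytes, so the final block needs padding)
$data = [byte[]](1..1000 | ForEach-Object { $_ % 256 })

# Stream-encode into an in-memory buffer, the same way New-Base64File
# streams a file through a CryptoStream
$buffer    = [System.IO.MemoryStream]::new()
$transform = [System.Security.Cryptography.ToBase64Transform]::new()
$crypto    = [System.Security.Cryptography.CryptoStream]::new($buffer, $transform, 'Write')
try {
    $source = [System.IO.MemoryStream]::new($data)
    $source.CopyTo($crypto)
    $crypto.FlushFinalBlock()   # emit the last (padded) 4-character block
    $streamed = [System.Text.Encoding]::ASCII.GetString($buffer.ToArray())
}
finally {
    # Disposing the CryptoStream also closes $buffer
    $crypto.Dispose()
    $transform.Dispose()
}

# Compare with the all-in-memory approach, then round-trip back to bytes
$inMemory = [Convert]::ToBase64String($data)
$decoded  = [Convert]::FromBase64String($streamed)

$streamed -ceq $inMemory                     # -> True
($decoded -join ',') -eq ($data -join ',')   # -> True
```

The same round-trip check ([Convert]::FromBase64String() on the output file's contents) works for small files too, but for large ones it reintroduces the memory problem the streaming approach avoids.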