I have a single text file that contains 60K+ lines. Those lines are actually around 50 or so programs written in Natural. I need to break them apart into individual programs. I have a script that works perfectly except for a single flaw: the naming of the output files.

Every program starts with "Module Name=", followed by the actual name of the program. I need to split the programs and save them using the actual program names.

Using the example below, I would like to create two files called Program1.txt and Program2.txt, each containing the lines that belong to it. I have a script, also below, that separates the files correctly, but I am unable to work out how to capture the program name and use it as the name of the output file.

Example:

Module Name=Program1
....
....
....
END

Module Name=Program2
....
....
....
END

Code:

$InputFile = "C:\Natural.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$a = 1
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "Module Name=") {
        $OutputFile = "MySplittedFileNumber$a.txt"
        $a++
    }    
    Add-Content $OutputFile $Line
}
  • I commend to your attention Microsoft Docs on -Split and -Join. Commented May 26, 2021 at 18:45
  • @JeffZeitlin I am attempting to alter the code using the -Split command. I will post success or failure. -Ron Commented May 26, 2021 at 19:57
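
As a rough illustration of the -split suggestion in the comment above, the program name can be pulled out of a single module line like this. The sample line is hypothetical and simply follows the "Module Name=" convention from the question.

```powershell
# Hypothetical sample line, following the "Module Name=" convention from the question.
$line = 'Module Name=Program1'

# Split on the first '=' only (the 2 caps the result at two substrings),
# then trim any stray whitespace around the name part.
$name = ($line -split '=', 2)[1].Trim()

$name
```

Capping the split at two substrings keeps the name intact even if it happens to contain another '=' character.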

2 Answers

Combine a switch statement, which can read files line by line efficiently with -File and can match each line against regex(es) with -Regex, and use a System.IO.StreamWriter instance to write the output files efficiently:

$outStream = $null

switch -Regex -File C:\Natural.txt {
  '\bModule Name=(\w+)' {   # a module start line
    if ($outStream) { $outStream.Close() }
    $programName = $Matches[1] # Extract the program name.
    # Create a new output file.
    # Important: use a *full* path, because .NET's working directory
    # usually differs from PowerShell's current location.
    $outStream = [System.IO.StreamWriter] "C:\$programName.txt"
    # Write the line at hand.
    $outStream.WriteLine($_)
  }
  default {                 # all other lines
    # Write the line at hand to the current output file.
    $outStream.WriteLine($_)    
  }
}
if ($outStream) { $outStream.Close() }

Note:

  • The code assumes that the very first line in the input file is a Module Name=... line.

  • The regex matching is case-insensitive by default, as PowerShell generally is; add -CaseSensitive, if needed.

  • The automatic $Matches variable is used to extract the program name from the matching result.
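
If the first-line assumption in the notes above might not hold, one possible variation is to skip any lines that appear before the first module line and to close the writer in a finally block. This is only a sketch: the scratch directory name NaturalSplitDemo and the sample input are made up for the demonstration and are not part of the original question.

```powershell
# Assumption: a scratch directory under the temp folder stands in for the real location.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'NaturalSplitDemo'
$null = New-Item -ItemType Directory -Path $dir -Force
$inputFile = Join-Path $dir 'Natural.txt'

# Hypothetical sample input with one stray line before the first module.
Set-Content -Path $inputFile -Value @(
  'stray header line'
  'Module Name=Program1'
  'line A'
  'END'
  'Module Name=Program2'
  'line B'
  'END'
)

$outStream = $null
try {
  switch -Regex -File $inputFile {
    '\bModule Name=(\w+)' {
      if ($outStream) { $outStream.Close() }
      # $Matches[1] holds the text captured by (\w+), i.e. the program name.
      $outStream = [System.IO.StreamWriter] (Join-Path $dir "$($Matches[1]).txt")
      $outStream.WriteLine($_)
    }
    default {
      # Lines before the first module line have no output file yet; skip them.
      if ($outStream) { $outStream.WriteLine($_) }
    }
  }
}
finally {
  # Close the last writer even if an error interrupts the loop.
  if ($outStream) { $outStream.Close() }
}
```

With this sample input, the stray first line is dropped and the two module blocks end up in Program1.txt and Program2.txt inside the scratch directory.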

Thank you Jeff!

Here is my solution using the Split method:

$InputFile = "C:\Temp\EMNCP\Natural.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)

$OPName = @()
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "Module Name=") {
        $OPName = $Line.Split("=")
        $FileName = $OPName[1].Trim()
        Write-Host "Found ... $FileName" -foregroundcolor green
        $OutputFile = "$FileName.txt"

    }    
    Add-Content $OutputFile $Line
}

3 Comments

Nice; a few tips: $OPName = @() initializes $OPName as an array, even though you want it to be a string, but you actually don't need to initialize it at all. (The only way to lock in a type would be to type-constrain the assignment: [string] $OPName = '')
It's better to close / dispose of the stream reader explicitly ($Reader.Close()).
While using Add-Content in a loop works, it is quite slow, because the output file must be opened and closed for every call; hence the use of a [System.IO.StreamWriter] in my solution.
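
Taken together, the tips in these comments might be applied to the Split-based solution roughly as follows. This is a sketch rather than a drop-in replacement: the scratch directory NaturalSplitTips and the sample input are hypothetical and only make the example self-contained.

```powershell
# Assumption: a scratch directory and sample input stand in for the real file.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'NaturalSplitTips'
$null = New-Item -ItemType Directory -Path $dir -Force
$inputFile = Join-Path $dir 'Natural.txt'
Set-Content -Path $inputFile -Value 'Module Name=Program1', '....', 'END', 'Module Name=Program2', '....', 'END'

$reader = New-Object System.IO.StreamReader $inputFile
$writer = $null
try {
  while ($null -ne ($line = $reader.ReadLine())) {
    if ($line -match 'Module Name=') {
      # A new program starts: close the previous output file, if any.
      if ($writer) { $writer.Close() }
      $fileName = ($line -split '=', 2)[1].Trim()
      # Full path, one writer per program, kept open until the next module line.
      $writer = [System.IO.StreamWriter] (Join-Path $dir "$fileName.txt")
    }
    if ($writer) { $writer.WriteLine($line) }
  }
}
finally {
  # Release both file handles explicitly.
  if ($writer) { $writer.Close() }
  $reader.Close()
}
```

Keeping a single writer open per program avoids reopening the output file for every line, which is the cost the Add-Content comment refers to.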
