
I am trying to use PowerShell to split a text file into two files based on several strings. The file sizes range from 5KB to 15KB.

The file data is formatted like the example below:

18600 - ABCD 2204 2020-04-11 00:00:00
18600 - ABCD 2204 2020-04-11 00:00:00
18600 - ABCD 2204 2020-04-11 00:00:00
18113 - ABCD 2204 2020-04-11 00:00:00
18113 - ABCD 2204 2020-04-11 00:00:00
19873 - ABCD 2204 2020-04-11 00:00:00
18764 - ABCD 2204 2020-04-11 00:00:00
19000 - ABCD 2204 2020-04-11 00:00:00

I need to send all rows that begin with 18600, 18113, 19000, etc. (or any specified set of 5-digit numbers) to one file, and all remaining lines that do not begin with those numbers (the "else" case) to a second file.

So the logic is: for each line in the file, if it begins with one of the specified numbers, write it to "file1"; otherwise, write it to "file2".

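$file = Get-Content -Path 'myfile.txt'
ForEach ($line in $file) {
  # ^ anchors at the start of the line; 18600, 18113 and 19000 stand in for the real set of prefixes
  If ($line -match '^(18600|18113|19000)\b') {
    $line | Out-File -Append -FilePath 'file1.txt'
  }
  Else {
    $line | Out-File -Append -FilePath 'file2.txt'
  }
}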

I'm also open to suggestions outside of PowerShell. Thank you so much for your help.

  • so ... you want all lines that start with 18 sent to one file and anything else sent to another? Commented Jul 10, 2020 at 16:53
  • Well, all the lines that begin with the full string of numbers, not just the '18', as some lines starting with 18 will need to go to the second file. Commented Jul 10, 2020 at 17:04
  • so ... how do you determine what lines to send where? you have not specified that completely ... Commented Jul 10, 2020 at 17:07
  • The deciding factor is the first 5 digits. All lines with "this" group of numbers should go in "this" file. All remaining lines (else) that begin with any number outside of those should be written to a separate file. Thank you for your response below. Commented Jul 10, 2020 at 18:44
  • kool! [grin] that means a range would work ... and that is how i set up the Answer i posted. in PoSh you can combine ranges into a non-contiguous set, so (1..88) + (333..400) would be valid. Commented Jul 10, 2020 at 22:35
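A minimal sketch of the non-contiguous idea from that comment, assuming the wanted prefixes can be described as a few ranges (the two ranges below are placeholders, not the asker's real list). Ranges are combined with `+`, which concatenates them into one flat array; a comma-separated list of ranges does not give one flat list, because the comma operator binds tighter than `..`. The `-in` check then works exactly as in the answer below.

```powershell
# Placeholder ranges: two separate blocks combined into one flat array with '+'
$TargetNumbers = (18100..18199) + (18600..18699)

# sample lines taken from the question
$lines = @(
    '18600 - ABCD 2204 2020-04-11 00:00:00'
    '18113 - ABCD 2204 2020-04-11 00:00:00'
    '19873 - ABCD 2204 2020-04-11 00:00:00'
)

$results = foreach ($line in $lines) {
    # the text before the first space is the 5-digit prefix
    $prefix = [int]$line.Split(' ')[0]
    '{0} -> {1}' -f $prefix, ($prefix -in $TargetNumbers)
}
$results
```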

2 Answers


presuming that you want all the lines that start with a number in the 18000..18999 range, this does the job ... [grin]

what it does ...

  • sets the constants
  • creates a file to work with
    when ready to do this with your data, replace the entire #region/#endregion block with a call to Get-Content.
  • loads the input file
  • iterates thru that collection
  • splits the current line to get the part before the 1st space
  • converts that to an [int]
  • checks to see if it is in the desired range
  • if YES, sends it to the 18 file
  • if NO, sends it to the not-18 file

this code ...

  • lacks any significant error handling
  • does not keep track of what was done
  • does not show what is going on

the code ...

$SourceDir = "$env:TEMP\WBCha"
$TargetNumberRange = 18000..18999
$InFile = Join-Path -Path $SourceDir -ChildPath 'InFile.txt'
$18OutFile = Join-Path -Path $SourceDir -ChildPath '18_OutFile.txt'
$Not_18OutFile = Join-Path -Path $SourceDir -ChildPath 'Not_18OutFile.txt'

#region >>> create a file to work with
#    when ready to do this for real, replace the whole "region" block with a Get-Content call
if (-not (Test-Path -LiteralPath $SourceDir))
    {
    $Null = New-Item -Path $SourceDir -ItemType 'Directory' -ErrorAction 'SilentlyContinue'
    }
$HowManyLines = 1e1
$Content = foreach ($Line in 0..$HowManyLines)
    {
    $Prefix = @(18,19)[(Get-Random -InputObject @(0, 1))]
    '{0}{1:d3} - {2}' -f $Prefix, $Line, [datetime]::Now.ToString('yyyyy-MM-dd HH:mm:ss:ffff')
    }
$Content |
    Set-Content -LiteralPath $InFile -ErrorAction 'SilentlyContinue'
#endregion >>> create a file to work with


foreach ($IF_Item in (Get-Content -LiteralPath $InFile))
    {
    if ([int]$IF_Item.Split(' ')[0] -in $TargetNumberRange)
        {
        Add-Content -LiteralPath $18OutFile -Value $IF_Item
        }
        else
        {
        Add-Content -LiteralPath $Not_18OutFile -Value $IF_Item
        }
    }

the 18 file content ...

18000 - 02020-07-10 12:29:45:6736
18001 - 02020-07-10 12:29:45:6736
18004 - 02020-07-10 12:29:45:6746
18005 - 02020-07-10 12:29:45:6756
18006 - 02020-07-10 12:29:45:6756
18008 - 02020-07-10 12:29:45:6766
18010 - 02020-07-10 12:29:45:6766

the not 18 file content ...

19002 - 02020-07-10 12:29:45:6746
19003 - 02020-07-10 12:29:45:6746
19007 - 02020-07-10 12:29:45:6756
19009 - 02020-07-10 12:29:45:6766
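The asker's rule is really a list of specific 5-digit prefixes rather than one range, so here is a hedged sketch of the same split-and-compare loop with a plain array of prefix strings in place of `$TargetNumberRange`. The prefix list, the `SplitDemo` folder under the temp directory (found with `[System.IO.Path]::GetTempPath()` so it also works where `$env:TEMP` is not set) and the output file names are all placeholders. It also gathers each group in memory and writes each output file once with `Set-Content`, so rerunning it replaces the outputs instead of appending duplicate lines the way repeated `Add-Content` calls would.

```powershell
# Placeholder list of the prefixes that belong in the first file
$TargetPrefixes = '18600', '18113', '19000'

# hypothetical working folder and file names
$SourceDir = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'SplitDemo'
$null = New-Item -Path $SourceDir -ItemType Directory -Force
$InFile      = Join-Path -Path $SourceDir -ChildPath 'InFile.txt'
$MatchFile   = Join-Path -Path $SourceDir -ChildPath 'Match_OutFile.txt'
$NoMatchFile = Join-Path -Path $SourceDir -ChildPath 'NoMatch_OutFile.txt'

# sample input taken from the question
@(
    '18600 - ABCD 2204 2020-04-11 00:00:00'
    '18113 - ABCD 2204 2020-04-11 00:00:00'
    '19873 - ABCD 2204 2020-04-11 00:00:00'
    '18764 - ABCD 2204 2020-04-11 00:00:00'
    '19000 - ABCD 2204 2020-04-11 00:00:00'
) | Set-Content -LiteralPath $InFile

$Matched = [System.Collections.Generic.List[string]]::new()
$Others  = [System.Collections.Generic.List[string]]::new()

foreach ($Line in Get-Content -LiteralPath $InFile) {
    # compare the text before the first space as a string, so no [int] cast is needed
    if ($Line.Split(' ')[0] -in $TargetPrefixes) {
        $Matched.Add($Line)
    }
    else {
        $Others.Add($Line)
    }
}

# write each output file once; this replaces any earlier output
$Matched | Set-Content -LiteralPath $MatchFile
$Others  | Set-Content -LiteralPath $NoMatchFile
```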

2 Comments

Thank you very much.
@WBCha - you are most welcome! glad to help a little now and then ... [grin]

Assuming that you want to separate the rows that start with numbers into one file and the rows that do not start with numbers into another file, you can use the -match operator with a regex to scan every row of your text file and separate the ones that start with a digit.

The code snippet goes something like this:

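$fileData = Get-Content -Path 'myfile.txt' -Raw   # read the whole file as one string
# split into lines (CRLF or LF) and drop empty lines, as RemoveEmptyEntries did
$processText = $fileData -split '\r?\n' | Where-Object { $_ }
foreach ($row in $processText)
{
     if ($row -match '^\d') # ^ anchors the regex, so this checks that the FIRST character of $row is a digit
     {
         $row | Out-File -FilePath "D:\DataStartingWithNum.text" -Append
     }
     else
     {
         $row | Out-File -FilePath "D:\DataStartingWithText.text" -Append
     }
}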
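Why the `^` anchor matters for "starts with a digit": without it, `-match '\d'` succeeds whenever a digit appears anywhere in the line, so a line that begins with letters but contains a date would still land in the "numbers" file. A quick check with one made-up text line and one line from the question:

```powershell
$textLine   = 'ABCD 2204 2020-04-11 00:00:00'            # hypothetical line that starts with letters
$numberLine = '18600 - ABCD 2204 2020-04-11 00:00:00'    # line from the question

$unanchored = $textLine -match '\d'      # True: a digit occurs somewhere in the line
$anchored   = $textLine -match '^\d'     # False: the first character is not a digit
$numberOk   = $numberLine -match '^\d'   # True: the line begins with a digit

"unanchored: $unanchored, anchored: $anchored, number line: $numberOk"
```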

If you have any other condition as well (which you might have missed explaining in your question above), you can filter on any other leading pattern in the same way, using a suitable regex with the -match operator.
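Applied to the specific prefixes in the question, a hedged sketch (the prefix list is a placeholder): build one anchored alternation from the list, escaping each entry with `[regex]::Escape`, and let `-match` / `-notmatch` filter the whole array of lines at once. With an array on the left-hand side, these operators return the matching (or non-matching) elements instead of `$true`/`$false`.

```powershell
# Placeholder prefixes that should go to the first file
$prefixes = '18600', '18113', '19000'

# ^ anchors at the start of the line; \b stops 18600 from also matching 186001
$pattern = '^(' + (($prefixes | ForEach-Object { [regex]::Escape($_) }) -join '|') + ')\b'

# sample lines taken from the question
$lines = @(
    '18600 - ABCD 2204 2020-04-11 00:00:00'
    '18113 - ABCD 2204 2020-04-11 00:00:00'
    '19873 - ABCD 2204 2020-04-11 00:00:00'
    '18764 - ABCD 2204 2020-04-11 00:00:00'
    '19000 - ABCD 2204 2020-04-11 00:00:00'
)

# with an array on the left, -match / -notmatch return the elements that do / do not match
$wanted = $lines -match $pattern
$rest   = $lines -notmatch $pattern
```

In a real script, `$lines` would come from `Get-Content`, and `$wanted` and `$rest` could each be written out with a single `Set-Content` call.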

Hope this helps.

2 Comments

Thank you for your response, this was helpful.
@WBCha - did this answer help you solve your issue? If so, please upvote the answer and mark it as accepted.
