I have been struggling to split the contents of a text file and insert them into a .csv file according to the following rules:

  1. Each line starting with '>' should be inserted into .csv column 1
  2. The all-caps lines should be inserted into .csv column 2, with each block of capital letters joined into a single string (its `r or `n removed)
  3. '>' and '*' should be removed wherever they are present

Separately, I can get column 1 to work fairly well using:

$file = (Get-Content 'samplefile.txt')

$data = foreach ($line in $file) {
    if ($line -match '^>') {
        [pscustomobject]@{
            'Part1' = (Select-String '^>' -InputObject $line) -replace '>', ''
        }
    }
}
$data | Out-File 'newfile.csv'

and I have had limited success using similar code for column 2 (I can't seem to get -join to work with `r or `n):

$file = (Get-Content 'samplefile.txt')

$data = foreach ($line in $file) {
    if ($line -match '^[A-Z].*') {
        [pscustomobject]@{
            # '*' is a regex quantifier, so it must be escaped to match a literal asterisk
            'Part2' = (Select-String '^[A-Z].*' -InputObject $line) -replace '\*', ''
        }
    }
}
$data | Out-File 'newfile.csv'

But it escapes me how to get both to work in the same code block to iterate over each section delimited by '>' and/or '*'.

Below is a sample of the data for reference.

>9392290|2983921
FYUOIQWEFYUOIAGSNJJJHKEWAHJKTHJEWUYIYGUIOIOIUYAFUIOWUEYOUYIA
GDFOUYUIOAGHIHUAGSD
>lsm.VI.superconfig_5640.1|lsm.model.superconfig_5640.1
FDASJKLHJKLGAHJKDFGHJKAGJKHUIGAHIULGRUOUHWWUGUIOHZIOJSHIJMAW
DFSANJKLNJLWEQUIOGFDSOIYUBHPOGANUPPUNABNPUNUPAPNUNPUFSAPNUSS
FSADUHHULGWAUNUNWEANNIOEAWNUNIIIINNBSDNJLKNJKLAERGJKLHHJLKGS
DFSAQSAHUSDFAHOUHGROUGRWE*
>jfi.ZJ.superconfig_99.31|jfi.model.superconfig_99.31
ASDFUIOHPOASPNADPUNPNUSADFNUPPUOHZSABUHBAHPUDASPHAWHPOEWGHPI
GWANUEGWUNPNPEANUPUNPEAWUPOGDFPOAGIJJIEOAWIOAGPIOJSGNJHIOWEA
AUHNHIOEANPIASPNIOICBNIOASGIOEGWPIOWEPPPPSAJPOJKGPWEAIOJJPIO
FAWEIOPHGAHNIOPGWEOPPOEAWSPIOOPUIGSUIOGUIOPWAGIEOUIWEAOGUIOP
GEIOJHIOJPWEPJIOWGEIOPHGANIONIOGEWANIOEGWOPIHNNPIOEGWIJOWEAG
GEPUIEWUIOSZBHJENWNBENUEBMIPEWVMIEMUIAZWIPNBWEPEWIOJJKEAWPIA
GWEPHIOEWNPOEWANNNPIOGWREIJUOGUHIOSNJJJJJJJJKVMVIOIPEGIOEAUW
EGWIOJNENIOPIOWINPEAWNPOI*

1 Answer

I suggest using a -split operation:

(Get-Content -Raw samplefile.txt) -split '(?m)^>(.+)' -ne '' |
  ForEach-Object -Begin { $i = 0 } -Process {
    if (++$i % 2) {          # 1st, 3rd, ... result, i.e. the ">"-prefixed lines
      $part1 = $_.TrimEnd()  # Save for later; trims the CR that (.+) captures in CRLF files.
    } else {                 # 2nd, 4th, ... result, i.e. the all-uppercase lines
      [pscustomobject] @{    # Construct and output a custom object.
        Part1 = $part1
        Part2 = $_ -replace '[\r\n*]' # Remove newlines (LF or CRLF) and "*"
      }
    }
  }  # pipe to Export-Csv as needed.

Display output:

Part1                                                  Part2
-----                                                  -----
9392290|2983921                                        FYUOIQWEFYUOIAGSNJJJHKEWAHJKTHJEWUYIYGUIOIOIUYAFUIOWUEYOUYIAGDFOUYUIOAGHIHUAGSD
lsm.VI.superconfig_5640.1|lsm.model.superconfig_5640.1 FDASJKLHJKLGAHJKDFGHJKAGJKHUIGAHIULGRUOUHWWUGUIOHZIOJSHIJMAWDFSANJKLNJLWEQUIOGFDSOIYUBHPOGANUPPUNABNPUNU…
jfi.ZJ.superconfig_99.31|jfi.model.superconfig_99.31   ASDFUIOHPOASPNADPUNPNUSADFNUPPUOHZSABUHBAHPUDASPHAWHPOEWGHPIGWANUEGWUNPNPEANUPUNPEAWUPOGDFPOAGIJJIEOAWIO…
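
To make the "pipe to Export-Csv" step concrete, here is a minimal end-to-end sketch of the same -split approach. The inline `$sample` text, the `$rows` and `$csvPath` variable names, and the temp-folder output location are all hypothetical stand-ins chosen for illustration; substitute `Get-Content -Raw samplefile.txt` and your own output path.

```powershell
# Hypothetical miniature input in the same shape as the sample data.
$sample = @'
>id1|a
ABCDEF
GHI
>id2|b
JKLMNO
PQR*
'@

# Split on the ">"-prefixed header lines; the capture group keeps each
# header in the result, and -ne '' drops the empty leading element.
$rows = $sample -split '(?m)^>(.+)' -ne '' |
  ForEach-Object -Begin { $i = 0 } -Process {
    if (++$i % 2) {
      $part1 = $_.Trim()                 # header line, minus any stray CR
    } else {
      [pscustomobject] @{
        Part1 = $part1
        Part2 = $_ -replace '[\r\n*]'    # join the block and drop "*"
      }
    }
  }

# Export-Csv writes a header row followed by one row per object.
$csvPath = Join-Path ([IO.Path]::GetTempPath()) 'newfile.csv'   # assumed output location
$rows | Export-Csv -Path $csvPath -NoTypeInformation
Get-Content $csvPath   # inspect the resulting file
```

Note that Out-File, as used in the question, writes the formatted console view of the objects rather than comma-separated values, which is why Export-Csv is the better fit for producing a real .csv file.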