introduce parsing loop / refactor ugly code

Question

I am writing a script that reads a binary file, converts it to ASCII, extracts two columns, joins them with a comma delimiter, and writes the result to a .txt file.

I followed this post to implement the binary-to-ASCII step, but as implemented in my script it only processes the first row of the file.

How would I rewrite this to loop through all rows in the file?

My code is below.

# run the command script to extract the file
script.cmd

# Read the entire file to an array of bytes.
$bytes = [System.IO.File]::ReadAllBytes("filePath")

# Decode first 'n' number of bytes to a text assuming ASCII encoding.
$text = [System.Text.Encoding]::ASCII.GetString($bytes, 0, 999999)|

    # only keep columns 0-22; 148-149; separate with comma delimiter
    %{ "$($_[0..22] -join ''),$($_[147..147] -join '')"} |

    # convert the file to .txt
    set-content path\file.txt

Also, what is a more elegant way of writing this part so it just reads the length of the string, instead of pulling in up to 999999 bytes?

$text = [System.Text.Encoding]::ASCII.GetString($bytes, 0, 999999)|

If you're assuming it's ASCII-encoded data, why read it as bytes and then convert to text rather than using ReadAllText to start with?
– mjolinor, Commented Nov 4, 2013 at 15:49
@mjolinor - the file starts as binary, then gets converted to ASCII. I am doing this because I want to prevent data corruption when we pull the file from FTP in the first step of the script.
– sion_corn, Commented Nov 4, 2013 at 15:53
Normally I'd expect to see a newline (13 10) between each row for ASCII data. You'll need to look at your data.
– mjolinor, Commented Nov 4, 2013 at 16:22
13 and 10 are the byte values for CR and LF (newline). You won't see them as visible characters once the data is decoded as ASCII, only in the raw bytes.
– mjolinor, Commented Nov 4, 2013 at 16:40

Frode F. · Accepted Answer · 2013-11-04 16:42:17Z

1

You don't need to specify index and count. Simply use

[System.Text.Encoding]::ASCII.GetString($bytes).Split("`r`n",[System.StringSplitOptions]::RemoveEmptyEntries)

or

[System.Text.Encoding]::ASCII.GetString([System.IO.File]::ReadAllBytes("filePath")).Split("`r`n",[System.StringSplitOptions]::RemoveEmptyEntries)

I'm not sure why you would want to read it as bytes, when you could simply use Get-Content.

edited Nov 4, 2013 at 16:42

answered Nov 4, 2013 at 15:31

Frode F.

6 Comments

sion_corn Over a year ago

Your 2nd suggestion worked (thank you), but I still need to loop through all rows in the file. Currently only the 1st row is being parsed.

Frode F. Over a year ago

Didn't see the first question. Since you're extracting text by character index, you need to loop over the lines using ForEach-Object (%). You could use a regex instead of character indexes, but it won't make much difference unless the file is so big that it takes minutes to process. Looping through lines ain't bad. The alternative is reading all the text and then parsing it line by line, which could be slower.

sion_corn Over a year ago

Thank you for the suggestion. I am very new to PowerShell, so I don't know how to implement the loop. I'm still looking around online for how to do it, but if you could show me, I'd greatly appreciate it.

Frode F. Over a year ago

You're already looping the lines like I would using the % { } which is the same as foreach-object { }.

Frode F. Over a year ago

Oh. That's because GetString() returns a single string. I forgot about that. You need to split it into lines before looping over it. How the line breaks are done depends on how the file was created. Try the method above. :)
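Putting the comments above together, a minimal sketch of the full loop might look like the following. It is not taken from the thread: the function name Convert-FixedWidthRows is hypothetical, the regex split is swapped in for the .Split call so that both CRLF and bare LF line endings are handled, and rows shorter than 148 characters are skipped on the assumption that they are incomplete.

```powershell
# Hypothetical helper (not from the thread): decode bytes as ASCII, split the
# single decoded string into rows, and emit "chars 0-22,char 147" per row.
function Convert-FixedWidthRows {
    param([byte[]]$Bytes)

    # GetString with no index/count decodes the whole buffer into ONE string.
    $text = [System.Text.Encoding]::ASCII.GetString($Bytes)

    # Split into rows first; the regex accepts CRLF as well as bare LF.
    $text -split '\r?\n' |
        # Assumption: rows too short to contain index 147 are skipped.
        Where-Object { $_.Length -ge 148 } |
        ForEach-Object { '{0},{1}' -f $_.Substring(0, 23), $_.Substring(147, 1) }
}

# Usage with the placeholder paths from the question:
# $bytes = [System.IO.File]::ReadAllBytes("filePath")
# Convert-FixedWidthRows -Bytes $bytes | Set-Content path\file.txt
```

Because the split happens before ForEach-Object, the script block runs once per row instead of once for the whole file, which is what the original pipeline was missing.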
