I am writing a script that reads a binary file, converts it to ASCII, extracts two columns separated by a delimiter, and writes the result to a .txt file.

I used this post to implement the binary-to-ASCII step, but as implemented in my script it seems to process only the first row in the file.

How would I rewrite this to loop through all rows in the file?

My code is below.

# run the command script to extract the file
script.cmd

# Read the entire file to an array of bytes.
$bytes = [System.IO.File]::ReadAllBytes("filePath")

# Decode first 'n' number of bytes to a text assuming ASCII encoding.
$text = [System.Text.Encoding]::ASCII.GetString($bytes, 0, 999999)|

    # only keep columns 0-22; 148-149; separate with comma delimiter
    %{ "$($_[0..22] -join ''),$($_[147..147] -join '')"} |

    # convert the file to .txt
    set-content path\file.txt

Also, what is a more elegant way of writing this part so it just reads the length of the string, instead of pulling in up to 999999 bytes?

$text = [System.Text.Encoding]::ASCII.GetString($bytes, 0, 999999)|
  • If you're assuming it's ASCII encoded data, why do you want to read it as bytes, then convert to text rather than doing ReadAllText to start with? Commented Nov 4, 2013 at 15:49
  • @mjolinor - the file starts as binary, then gets converted to ASCII. I am doing this because I want to prevent data corruption when we pull the file from FTP in the first step of the script. Commented Nov 4, 2013 at 15:53
  • Since it's binary data, how are the rows delimited? Commented Nov 4, 2013 at 16:03
  • Normally I'd expect to see a newline (13 10) between each row for ASCII data. You'll need to look at your data. Commented Nov 4, 2013 at 16:22
  • 13 and 10 are the byte values for CR and LF (newline). You won't see them after the data is converted to ASCII, only in the binary. Commented Nov 4, 2013 at 16:40
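
One way to check how the rows are delimited is to count the CR (13) and LF (10) bytes directly. The sketch below is only an illustration: the sample bytes are made up and built in memory so the snippet runs on its own. For the real file, $bytes would instead come from [System.IO.File]::ReadAllBytes("filePath").

```powershell
# Hypothetical sample data: two rows, each terminated by CR LF.
$bytes = [System.Text.Encoding]::ASCII.GetBytes("ROW1`r`nROW2`r`n")

# Count carriage returns (byte value 13) and line feeds (byte value 10).
$crCount = @($bytes | Where-Object { $_ -eq 13 }).Count
$lfCount = @($bytes | Where-Object { $_ -eq 10 }).Count

"CR bytes: $crCount, LF bytes: $lfCount"
```

If both counts are roughly equal to the expected number of rows, the file probably uses CR LF line endings. If both are zero, the rows are likely fixed-width records with no delimiter at all, and splitting on newlines will not help.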

1 Answer

You don't need to specify the index and count. The GetString overload that takes only the byte array decodes the whole array. Simply use

[System.Text.Encoding]::ASCII.GetString($bytes).Split("`r`n",[System.StringSplitOptions]::RemoveEmptyEntries)

or

[System.Text.Encoding]::ASCII.GetString([System.IO.File]::ReadAllBytes("filePath")).Split("`r`n",[System.StringSplitOptions]::RemoveEmptyEntries)

I'm not sure why you would want to read it as bytes, when you could simply use Get-Content.
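
Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes the rows are separated by CR LF (or a bare LF) and keeps the character positions from the question's code (0..22 and 147). The temp-file names and the sample rows are hypothetical and exist only so the snippet runs on its own.

```powershell
# Hypothetical input: three 149-character rows separated by CR LF, written to a temp file.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.bin'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.txt'
$rows = 'A', 'B', 'C' | ForEach-Object { ($_ * 147) + 'X' + 'Y' }
$sample = [System.Text.Encoding]::ASCII.GetBytes(($rows -join "`r`n") + "`r`n")
[System.IO.File]::WriteAllBytes($inPath, $sample)

# Read every byte and decode the whole array; no index or count is needed.
$bytes = [System.IO.File]::ReadAllBytes($inPath)
$text  = [System.Text.Encoding]::ASCII.GetString($bytes)

# Split the single decoded string into rows, dropping empty entries.
$lines = $text -split "`r?`n" | Where-Object { $_ -ne '' }

# Keep characters 0-22 and character 147 of each row, comma-separated.
$lines |
    ForEach-Object { "$($_[0..22] -join ''),$($_[147..147] -join '')" } |
    Set-Content $outPath

Get-Content $outPath
```

Because the string is split before it reaches ForEach-Object, the script block runs once per row rather than once for the whole file.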


6 Comments

Your 2nd suggestion worked (thank you), but I still need to loop through all rows in the file. Currently only the 1st row is being parsed.
Didn't see the first question. Since you're extracting text based on character index, you need to loop using foreach (%). You could use a regex instead of character indexes, but it won't make much difference unless the file is so big that it takes minutes to process. Looping through lines isn't bad. The alternative is reading all the text and then parsing it line by line, which could be slower.
Thank you for the suggestion. I am very new to PowerShell, so I don't know how to implement the loop. I'm still looking around online for how to implement it, but if you could show me, I'd greatly appreciate it.
You're already looping over the lines like I would, using % { }, which is the same as ForEach-Object { }.
Oh. That's because GetString() returns a single string. I forgot about that. You need to split it into lines before looping over it. How the line breaks are done depends on how the file was created. Try the method above. :)
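
A small sketch of why only the first row was processed: a single string sent down the pipeline is one object, so the % { } block runs exactly once, and the character indexes then apply to the start of the whole text. The sample text is made up for illustration.

```powershell
# Hypothetical decoded text: three rows in one string, as GetString() would return it.
$text = "row one`r`nrow two`r`nrow three"

# Piping the single string: the script block runs once.
$single = @($text | ForEach-Object { $_.Length })

# Splitting into rows first: the script block runs once per row.
$perRow = @(
    $text.Split("`r`n", [System.StringSplitOptions]::RemoveEmptyEntries) |
        ForEach-Object { $_.Length }
)

"Without split: $($single.Count) iteration(s); with split: $($perRow.Count) iteration(s)"
```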