I am a complete novice at Python, and at programming in general.

I have a text file to parse into a CSV. I am not able to provide an example of the text file at this time.

  1. The text is several (thousand) lines with no carriage returns.
  2. There are 4 types of records in the file (A, B, C, or I).
  3. Each record type has a specific format based on the size of the data element.
  4. There are no delimiters.
  5. Immediately after the last data element in the record type, the next record type appears.
  6. I have been trying to translate from a different language what this might look like in Python.

Here is an example of what I've written (I know the syntax is not correct):

file=open('TestPython.txt'), 'r' # from current working directory
dataString=file.read()
data=()
i=0
while i < len(dataString):
i = i+2
    curChar = dataString(i)
    # Need some help on the next line var curChar = dataString[i]

    if curChar = "A"
        NPI = dataString(i+1, 16) # Need to verify that is how it is done in python inside ()
            NPI.strip()
        PCN = datastring(i+17, 40)
            PCN.strip()
        seqNo = dataString(i+41, 42)
            seqNo.strip()
        MRN = dataString(i+43, 66)
            MRN.strip()
    if curChar = "B"
        NPI = dataString(i+1, 16) # Need to verify that is how it is done in python inside ()
            NPI.strip()
        PCN = datastring(i+17, 40)
            PCN.strip()
        seqNo = dataString(i+41, 42)
            seqNo.strip()
        RC1 = (i+43, 46)
            RC1.strip()
        RC2 = (i+47, 50)
            RC2.strip() 
        RC3 = (i+51, 54)
            RC3.strip()
    if curChar = "C"
        NPI = dataString(i+1, 16) # Need to verify that is how it is done in python inside ()
            NPI.strip()
        PCN = datastring(i+17, 40)
            PCN.strip()
        seqNo = dataString(i+41, 42)
            seqNo.strip()
        DXVer = (i=43, 43)
            DXVer.strip()
        AdmitDX = (i+44, 50)
            AdmitDX.strip()
        RVisit1 = (i+51, 57)
            RVisit1.strip()

Here's a dummied-up version of a piece of the text file:

A 63489564696474677 9845687 777 67834717467764674 TUANU TINBUNIU 47 ERTYNU TDFGH UU748897764 66762589668777486U6764467467774767 7123609989 9 O
B 79466945684634677 676756787344786474634890 7746.66 7 96 4 7 7 9 7 774666 44969 494 7994 99666 77478 767766
B 098765477 64697666966667 9 99 87966 47798 797499
C 63489564696474677 6747494 7494 7497 4964 4976 N7469 4769 N9784 9677
I 79466944696474677 677769U6 8888 67764674
A 79466945684634677 6767994 777 696789989 6464467464764674 UIIUN UITTI 7747 NUU 9 ATU 4 UANU OSASDF NU67479 66567896667697487U6464467476777967 7699969978 7699969978 9 O

As you can see, there can be several records of each type in the file. The way this example pastes, it looks like the type is the first character on a line. That is not the case in the actual file (I made this sample in Word).

  • You need to provide at least some kind of abstraction of the format or the question becomes unanswerable. Commented Jan 24, 2013 at 16:18
  • If I read this correctly, you start by pumping the entire file into a string. This is a bit wild; you should only read little bits into memory and process them. Commented Jan 24, 2013 at 16:22
  • You could try Python's csv module: docs.python.org/2/library/csv.html, which may allow you to read in the data in one line... Commented Jan 24, 2013 at 16:22
  • I am not so sure this is technically speaking a CSV file. Commented Jan 24, 2013 at 16:25
  • @flup: It's not a CSV file yet. It seems to be a stream of fixed-width datasets that he wants to convert into a new CSV file. Commented Jan 24, 2013 at 16:29

2 Answers

You might take a look at pyparsing.

1 Comment

True, but it is not exactly beginner stuff.

You would be better off processing the file as you read it.

First, do a file.read(1) to determine which type of record is up next.

Then, depending on the type, read the fields, which if I understand you correctly are fixed width. So for type 'A' this would look like this:

def processA (file):
    NPI = file.read(16).strip()  #assuming the NPI is 16 bytes long 
    PCN = file.read(23).strip()  #assuming the PCN is 23 bytes long
    seqNo = file.read(1).strip() #assuming seqNo is 1 byte long
    MRN = file.read(23).strip()  #assuming MRN is 23 bytes long
    return {"NPI":NPI,"PCN":PCN, "seqNo":seqNo, "MRN":MRN}
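
The same idea can be extended to every record type with a table of field widths. What follows is only a sketch: the names LAYOUTS, parse_records and write_csv are made up for illustration, the widths for "A" repeat the guesses in processA above, and the layout for "B" is purely hypothetical. The real widths have to come from the file's specification. It assumes Python 3 and uses an in-memory string in place of the real file.

```python
import csv
import io

# Hypothetical layouts: (field name, width in characters) per record type.
# "A" mirrors the assumed widths in processA; "B" is invented for the demo.
LAYOUTS = {
    "A": [("NPI", 16), ("PCN", 23), ("seqNo", 1), ("MRN", 23)],
    "B": [("NPI", 16), ("PCN", 23), ("seqNo", 1), ("RC1", 4)],
}


def parse_records(stream):
    """Yield one dict per record, reading the stream one field at a time."""
    while True:
        record_type = stream.read(1)
        if record_type == "":  # read() returns "" at end of file
            break
        if record_type not in LAYOUTS:
            raise ValueError("unknown record type: %r" % record_type)
        record = {"type": record_type}
        for name, width in LAYOUTS[record_type]:
            record[name] = stream.read(width).strip()
        yield record


def write_csv(records, out):
    """Write all records to one CSV; fields a record lacks stay empty."""
    fieldnames = ["type"]
    for layout in LAYOUTS.values():
        for name, _width in layout:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)


# In-memory stand-in for the real file: one "A" record followed by one "B".
sample = (
    "A" + "1234567890".ljust(16) + "PCN1".ljust(23) + "1" + "MRN9".ljust(23)
    + "B" + "1234567890".ljust(16) + "PCN1".ljust(23) + "2" + "0450"
)
records = list(parse_records(io.StringIO(sample)))
csv_out = io.StringIO()
write_csv(records, csv_out)
```

With a real file, the io.StringIO objects would be replaced by files opened for reading and writing. Raising an error on an unknown type is a deliberate choice here: if one width is wrong, every later record is misaligned, so it seems better to stop early than to write garbage.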

If the file is not ASCII, there's a bit more work to get the encoding right and read characters instead of bytes.
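
To illustrate the difference between characters and bytes, here is a small sketch assuming Python 3 and a UTF-8 encoded file (both are assumptions, the question does not say what the encoding is). Opened in text mode with an explicit encoding, read(n) counts characters; opened in binary mode, it counts bytes. The file name sample.txt is a throwaway used only for the demonstration.

```python
import os
import tempfile

# Write a tiny throwaway file containing one non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("A\u00e912")  # "\u00e9" (e with acute accent) is two bytes in UTF-8

# Text mode: read(n) returns n characters, however many bytes they occupy.
with open(path, "r", encoding="utf-8") as f:
    record_type = f.read(1)
    field = f.read(3)

# Binary mode: the same file seen as raw bytes.
with open(path, "rb") as f:
    raw = f.read()
```

Here the three-character field takes four bytes on disk, so widths stated in characters only line up if the file is read in text mode with the right encoding.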

3 Comments

thanks. Yes, I was trying to turn the entire file into one string... Wild wasn't what I was going for. The def processA (file): looks like something to try.
When I do the file.read(1) I get an attribute error: 'tuple' object has no attribute 'read'. Is this because the text file is not ASCII? Or do I need to hit the tutorials again (I will anyway)?
No, there's a little typo when you open the file. It ought to read file=open('TestPython.txt', 'r'). Note the closing bracket. Your statement generates a tuple containing the open file and the 'r'.
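
A quick way to see the difference is sketched below, assuming Python 3. A temporary file stands in for TestPython.txt, and the variable names are only for illustration.

```python
import os
import tempfile

# A throwaway file stands in for TestPython.txt.
path = os.path.join(tempfile.mkdtemp(), "TestPython.txt")
with open(path, "w") as f:
    f.write("A123")

# The comma sits outside the call, so this builds a (file, 'r') tuple.
wrong = open(path), 'r'
is_tuple = isinstance(wrong, tuple)
wrong[0].close()  # the file object is the tuple's first element

# With 'r' inside the parentheses, open() returns the file object itself.
with open(path, 'r') as handle:
    first_char = handle.read(1)
```

Calling wrong.read(1) on the tuple is what raises the AttributeError quoted above.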
