3

This is the first line of my .txt file:

0.112296E+02-.121994E-010.158164E-030.158164E-030.000000E+000.340000E+030.328301E-010.000000E+00

There should be 8 columns, which seem to be separated sometimes by '-' and sometimes by '.'. It's very confusing. I didn't generate the file; I just have to work with it.

And a second question: how can I work with the individual columns? There is no header, so maybe:

df.iloc[:,0] .. ?

4
  • 2
    No CSV writer would produce a document like that, so it would have to be typed in. Also, it does not look like the - and . are separators. Couldn't - just be the sign of a negative number and . the decimal point? In that case there is no separator at all, just a fixed field width, in your case 12 characters (1.12E+01, -1.22E-02, 1.58E-04, 1.58E-04, 0.00E+00, 3.40E+02, 3.28E-02, 0.00E+00). Commented Sep 14, 2016 at 8:32
  • It's not generated by a CSV writer. It's from an Ansys code I didn't write. Yes, '-' marks the negative exponent after E and '.' is the decimal point. How can I read it anyway? Commented Sep 14, 2016 at 8:39
  • That line seems to be bad enough for a human to parse! The whole line is data; there are no definite boundaries or delimiters defined. Commented Sep 14, 2016 at 8:44
  • Yes I know, but I need to work with it and have no idea how? Commented Sep 14, 2016 at 8:47

3 Answers

4

As stated in the comments, this is most likely a list of numbers in scientific notation that aren't separated by anything, just glued together. It could be read as:

0.112296E+02
-.121994E-010
.158164E-030
.158164E-030
.000000E+000
.340000E+030
.328301E-010
.000000E+00

or as

0.112296E+02
-.121994E-01
0.158164E-03
0.158164E-03
0.000000E+00
0.340000E+03
0.328301E-01
0.000000E+00

Assuming the second interpretation is better, the trick is to split evenly every 12 characters.

data = [line[i:i+12] for i in range(0, len(line), 12)]
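
A minimal sketch of the same split with the conversion to float added. The helper name `split_fixed_width` is made up for this illustration; the width of 12 is taken from the sample line and is an assumption about the rest of the file.

```python
def split_fixed_width(line, width=12):
    """Cut a line into width-sized fields and convert each field to float."""
    line = line.rstrip("\r\n")  # a line read from a file still ends with a newline
    if len(line) % width != 0:
        raise ValueError(f"line length {len(line)} is not a multiple of {width}")
    return [float(line[i:i + width]) for i in range(0, len(line), width)]


line = "0.112296E+02-.121994E-010.158164E-030.158164E-030.000000E+000.340000E+030.328301E-010.000000E+00"
values = split_fixed_width(line)
print(values)
```

The length check is there so that a line that doesn't fit the fixed-width assumption fails loudly instead of producing a truncated last field.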

If the first interpretation really is the right one, I'd use a regex:

import re
line = '0.112296E+02-.121994E-010.158164E-030.158164E-030.000000E+000.340000E+030.328301E-010.000000E+00'
pattern = r'[+-]?\d??\.\d+E[+-]\d+'
data = re.findall(pattern, line)
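
A variant of the pattern, sketched under the assumption that the exponent always has exactly two digits (as in the sample line), returns the fixed-width reading instead. Only the tail of the pattern changes, from `\d+` to `\d{2}`.

```python
import re

line = "0.112296E+02-.121994E-010.158164E-030.158164E-030.000000E+000.340000E+030.328301E-010.000000E+00"

# Optional sign, optional leading digit, mantissa, then E, a sign and
# exactly two exponent digits (assumption based on the sample line).
pattern = re.compile(r"[+-]?\d?\.\d+E[+-]\d{2}")

fields = pattern.findall(line)
values = [float(f) for f in fields]  # findall returns strings
print(fields)
```

Because the exponent can no longer swallow the leading 0 of the next number, each match ends where the next field begins.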

Edit

Obviously, you'd need to iterate over each line in the file and add it to your DataFrame, which is a rather inefficient thing to do in pandas. Therefore, if your preferred interpretation is the fixed-width one, I'd go with @Ev. Kounis's answer: df = pd.read_fwf(myfile, widths=[12]*8, header=None) (header=None because the file has no header row).

Otherwise, the inefficient way is:

df = pd.DataFrame(columns=range(8))
with open(myfile, 'r') as f_in:
    for i, line in enumerate(f_in):
        data = re.findall(pattern, line)
        df.loc[i] = [float(d) for d in data]

There are two things to notice here: the DataFrame must be initialized with column names (here [0, 1, 2, 3..7], but perhaps you know of better identifiers), and the regex gives us strings that must be cast to floats.


4 Comments

After the E comes the sign, and after that only 2 digits. This is important since '-.121994E-010' and '-.121994E-01' don't convert to the same float. So yes, the second interpretation is better, but you are printing the first, right?
You can use \d{2} instead of \d+ at the end to make the regex capture the second form.
The first solution seems to work pretty great. But how can I do it if my file contains more than one row? I need the solution column-wise
I extended my answer, but this starts to be outside the scope of the initial question :-) (search for other QA)
3

As I said in the comments, this is not a case of multiple separators; it is simply a fixed-width format. Pandas has a function for reading such files. Try this:

df = pd.read_fwf(myfile, widths=[12]*8, header=None)
print(df)

For widths you have to provide the width of each cell, which looks like it's 12, once per column, and as you say there must be 8 columns.

Without header=None, pandas takes the first line of the file as column names, so the values come back as labels, and repeated labels get a .1 suffix (0.158164E-03.1 and 0.000000E+00.1 in the 4th and last positions). Since your file has no header, pass header=None so that the first line is read as data.
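
For the second part of the question, here is a short sketch of column access on a frame read this way. The text is held in an `io.StringIO` only to keep the example self-contained, the second row is invented, and the column names `c0`..`c7` are placeholders. `header=None` is passed so that the first data line is not consumed as column names.

```python
import io

import pandas as pd

# Two sample rows; the second one is made up for this example.
text = (
    "0.112296E+02-.121994E-010.158164E-030.158164E-030.000000E+000.340000E+030.328301E-010.000000E+00\n"
    "0.224592E+02-.243988E-010.316328E-030.316328E-030.000000E+000.340000E+030.656602E-010.000000E+00\n"
)

df = pd.read_fwf(io.StringIO(text), widths=[12] * 8, header=None)

first_col = df.iloc[:, 0]  # select a column by position
same_col = df[0]           # or by label: without a header the labels are 0..7

# Hypothetical names; replace them with whatever the columns actually mean.
df.columns = [f"c{i}" for i in range(8)]
print(df["c1"].mean())
```

So `df.iloc[:, 0]` from the question does work; naming the columns once after reading just makes later code easier to follow.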


Alternatively, you can do it "manually" like so:

myfile = r'C:\Users\user\Desktop\PythonScripts\a_file.csv'
width = 12
my_content = []
with open(myfile, 'r') as f_in:
    for lines in f_in:
        data = [float(lines[i * width:(i + 1) * width]) for i in range(len(lines) // width)]
        my_content.append(data)
print(my_content)  # prints -> [[11.2296, -0.0121994, 0.000158164, 0.000158164, 0.0, 340.0, 0.0328301, 0.0]]

Every row of the file then becomes one inner list of the nested list.
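
If the data is needed column-wise without pandas, the nested list can be transposed with `zip`. A small sketch, using rows shaped like the output of the loop above; the second row is invented for the example.

```python
# Rows as produced by the loop above; the second row is made up.
my_content = [
    [11.2296, -0.0121994, 0.000158164, 0.000158164, 0.0, 340.0, 0.0328301, 0.0],
    [22.4592, -0.0243988, 0.000316328, 0.000316328, 0.0, 340.0, 0.0656602, 0.0],
]

# zip(*rows) transposes the nested list: one tuple per column.
columns = list(zip(*my_content))
first_column = columns[0]
print(first_column)
```

Note that `zip` stops at the shortest row, so this assumes every line of the file produced the same number of fields.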


1

A possible solution is the following:

row = '0.112296E+02-.121994E-010.158164E-030.158164E-030.000000E+000.340000E+030.328301E-010.000000E+00'
chunk_len = 12
for i in range(0, len(row), chunk_len):
    print(row[i:i + chunk_len])

You can easily extend the code to handle more general cases.

