Reading data from a text file with a variable number of columns

Question

I am reading data from a text file in Python using pandas. There are no header values (column names) in the text file. I want to reshape the data into a readable form. The problem I am facing is that the rows have different numbers of columns. For example, my text file contains:

    1,2,3,4,5,Hello,7,8
    1,2,3,4,5,7,8,
    1,2,3,4,5,7,8,
    1,2,3,4,5,Hello,7,8,

When I create the data frame, I want "NaN" written in that column for the rows where the Hello value is not present (the second and third rows). After giving column names and rearranging, the data frame should look like this:

    1,2,3,4,5,Hello,7,8
    1,2,3,4,5,"NA",7,8,
    1,2,3,4,5,"NA",7,8,
    1,2,3,4,5,Hello,7,8,
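For reference, one way to load rows of unequal length like these is to pass `names` to `pd.read_csv`, so that pandas pads shorter rows with NaN instead of failing on the row that has an extra (trailing-comma) field. This is a minimal sketch: it assumes no row has more than 9 comma-separated fields (`max_fields` is a hypothetical parameter) and uses `io.StringIO` in place of the real file.

```python
import io

import pandas as pd

# Sample rows copied from the question; the last row ends with a comma,
# so it has one more (empty) field than the first row.
raw = """1,2,3,4,5,Hello,7,8
1,2,3,4,5,7,8,
1,2,3,4,5,7,8,
1,2,3,4,5,Hello,7,8,
"""

# Assumption: no row has more than 9 fields. Supplying column names up front
# lets pandas pad shorter rows with NaN.
max_fields = 9
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(max_fields)))

# Trailing commas only produce empty columns; drop columns that are all NaN.
df = df.dropna(axis=1, how="all")
print(df)
```

After this step the short rows still have their values shifted left (7 and 8 sit in columns 5 and 6), which is what the answer below corrects.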

How are these missing values represented in your text file? Is there whitespace, a tab...? Otherwise, how would you know which column a value originally belongs to, if it is written as in your example? – Flob, commented Dec 20, 2018 at 11:05

meW · Accepted Answer · 2018-12-20 12:24:21Z


Answer to the updated question, which is also a generalized solution for this kind of case.

    import numpy as np
    import pandas as pd

    # df: the text file read with header=None (no column names)
    focus_col_idx = 5   # The column where you want NaN in the expected output
    last_idx = df.shape[1] - 1

    # Fetching the index of rows which have NaN in the last column
    # (assumes the default 0..n-1 index, since iloc is positional)
    idx = df[df[last_idx].isnull()].index

    # Shifting the column values one place right for the rows in idx
    df.iloc[idx, focus_col_idx+1:] = df.iloc[idx, focus_col_idx:last_idx].values

    # Putting NaN in the focus column for the rows in idx
    df.iloc[idx, focus_col_idx] = np.nan

    df


+---+----+---+---+---+---+-------+---+-----+
|   |  0 | 1 | 2 | 3 | 4 |   5   | 6 |  7  |
+---+----+---+---+---+---+-------+---+-----+
| 0 |  1 | 2 | 3 | 4 | 5 | Hello | 7 | 8.0 |
| 1 |  1 | 2 | 3 | 4 | 5 | NaN   | 7 | 8.0 |
| 2 |  1 | 2 | 3 | 4 | 5 | NaN   | 7 | 8.0 |
| 3 |  1 | 2 | 3 | 4 | 5 | Hello | 7 | 8.0 |
+---+----+---+---+---+---+-------+---+-----+
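The code above assumes `df` already exists. Below is a self-contained sketch of the same idea (shift the short rows one place right from the focus column, then write NaN into it), wrapped in a function; `shift_in_missing` is a hypothetical name. Like the answer, it assumes a short row is recognized by NaN in its last column. It additionally reads every field as a string (`dtype=str`) so that moving values between columns does not mix `'Hello'` strings into numeric columns; convert with `pd.to_numeric` afterwards if numbers are needed.

```python
import io

import numpy as np
import pandas as pd

raw = """1,2,3,4,5,Hello,7,8
1,2,3,4,5,7,8,
1,2,3,4,5,7,8,
1,2,3,4,5,Hello,7,8,
"""


def shift_in_missing(df: pd.DataFrame, focus_col: int) -> pd.DataFrame:
    """Return a copy in which rows whose last column is NaN are shifted one
    place right from position `focus_col`, with NaN written at `focus_col`."""
    out = df.copy()
    last = out.shape[1] - 1
    # Boolean mask, so iloc stays positional whatever the index labels are.
    short = out.iloc[:, last].isna().to_numpy()
    # Move positions focus_col..last-1 into focus_col+1..last for short rows.
    out.iloc[short, focus_col + 1:] = out.iloc[short, focus_col:last].to_numpy()
    out.iloc[short, focus_col] = np.nan
    return out


# Assumption: at most 9 fields per row; all values are kept as strings.
df = pd.read_csv(io.StringIO(raw), header=None, names=list(range(9)), dtype=str)
df = df.dropna(axis=1, how="all")  # drop the empty trailing-comma column
result = shift_in_missing(df, focus_col=5)
print(result)
```

Using a boolean mask instead of `.index` means the sketch also works if the frame does not have the default 0..n-1 index.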

Answer to the previous data

Assuming only one column has a missing value (say the 2nd column, as in your previous data), here's a quick solution:

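    import numpy as np
    import pandas as pd

    # r'\,' is a regex separator, which pandas handles with the python engine
    df = pd.read_table('SO.txt', sep=r'\,', header=None, engine='python')
    df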

+---+---+---+---+---+------+
|   | 0 | 1 | 2 | 3 |  4   |
+---+---+---+---+---+------+
| 0 | A | B | C | D | E    |
| 1 | A | C | D | E | None |
+---+---+---+---+---+------+


    # Fetching the index of rows which have None in the last column
    idx = df[df[4].isnull()].index
    idx
    # Int64Index([1], dtype='int64')   (pandas 2.0+ shows Index([1], dtype='int64'))

    # Shifting the column values one place right for the rows in idx
    df.iloc[idx, 2:] = df.iloc[idx, 1:4].values
    df

+---+---+---+---+---+---+
|   | 0 | 1 | 2 | 3 | 4 |
+---+---+---+---+---+---+
| 0 | A | B | C | D | E |
| 1 | A | C | C | D | E |        # <- Notice the shifting.
+---+---+---+---+---+---+


    # Putting NaN in the second column for the rows in idx
    df.iloc[idx, 1] = np.nan

    # Final output
    df
+---+---+-----+---+---+---+
|   | 0 |  1  | 2 | 3 | 4 |
+---+---+-----+---+---+---+
| 0 | A | B   | C | D | E |
| 1 | A | NaN | C | D | E |
+---+---+-----+---+---+---+
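A different technique from the answer's (fixing the rows while reading, before pandas sees them) avoids shifting columns afterwards. This sketch uses Python's `csv` module and assumes, like the answer, that only one column (position 1, `MISSING_POS`) can be absent and that complete rows have exactly 5 fields (`N_COLS`); both constants are hypothetical and must match your file.

```python
import csv
import io

import pandas as pd

# The "previous data" from the answer: the second row lacks its 2nd value.
raw = "A,B,C,D,E\nA,C,D,E\n"

MISSING_POS = 1  # assumption: the only column that can be absent
N_COLS = 5       # assumption: complete rows have exactly 5 fields

rows = []
for fields in csv.reader(io.StringIO(raw)):
    if len(fields) == N_COLS - 1:
        # Re-insert a placeholder at the missing position; pandas treats
        # None as a missing value.
        fields.insert(MISSING_POS, None)
    rows.append(fields)

df = pd.DataFrame(rows)
print(df)
```

Rows with any other length are passed through unchanged, so it may be worth checking `len(fields)` more strictly for real data.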

edited Dec 20, 2018 at 12:24

answered Dec 20, 2018 at 11:46

meW


5 Comments

meW Over a year ago

Let me provide comments.

meW Over a year ago

Check now. If anything is unclear, let me know. Or tell me if you want an answer for the updated question!

meW Over a year ago

Sure. I'll give you a generalized solution by that time.

Jeff Over a year ago

It's giving me an error: "Must have equal len keys and value when setting with an ndarray"

meW Over a year ago

Let us continue this discussion in chat.
