I am reading data from a text file in Python using pandas. The file has no header row (column names). I want to reshape the data into a readable form. The problem I am facing is that the rows have a variable number of fields. For example, my text file contains:

    1,2,3,4,5,Hello,7,8
    1,2,3,4,5,7,8,
    1,2,3,4,5,7,8,
    1,2,3,4,5,Hello,7,8,

When I create the data frame, I want rows in which the "Hello" value is absent (such as the second row) to get "NaN" in that column instead. After assigning column names and rearranging, the data frame should look like this:

    1,2,3,4,5,Hello,7,8
    1,2,3,4,5,"NA",7,8,
    1,2,3,4,5,"NA",7,8,
    1,2,3,4,5,Hello,7,8,
  • Are missing values present only in the second column? Commented Dec 20, 2018 at 10:54
  • Which column should C in the 2nd row be written to? Commented Dec 20, 2018 at 10:56
  • How are these missing values represented in your text file? Is there a whitespace, a tab...? Otherwise, how would you know which column a value originally belongs to if it is written as in your example? Commented Dec 20, 2018 at 11:05
  • @meW Yes, only in the second column. Commented Dec 20, 2018 at 11:35
  • @MayankPorwal In this case, the third column. Commented Dec 20, 2018 at 11:35

1 Answer

Answer to the updated question, which also works as a generalized solution for such cases.

import numpy as np

# df is assumed to have been read with header=None, so its columns are labelled 0..n-1
focus_col_idx = 5   # The column where NaN should appear in the expected output
last_idx = df.shape[1] - 1

# Fetch the index of the rows that have a missing value in the last column
idx = df[df[last_idx].isnull()].index

# Shift the values of those rows one column to the right, starting at focus_col_idx
df.iloc[idx, focus_col_idx+1:] = df.iloc[idx, focus_col_idx:last_idx].values

# Put NaN in the focus column for those rows
df.iloc[idx, focus_col_idx] = np.nan

df


+---+----+---+---+---+---+-------+---+-----+
|   |  0 | 1 | 2 | 3 | 4 |   5   | 6 |  7  |
+---+----+---+---+---+---+-------+---+-----+
| 0 |  1 | 2 | 3 | 4 | 5 | Hello | 7 | 8.0 |
| 1 |  1 | 2 | 3 | 4 | 5 | NaN   | 7 | 8.0 |
| 2 |  1 | 2 | 3 | 4 | 5 | NaN   | 7 | 8.0 |
| 3 |  1 | 2 | 3 | 4 | 5 | Hello | 7 | 8.0 |
+---+----+---+---+---+---+-------+---+-----+
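
For anyone who wants to run the generalized steps end to end, here is a minimal sketch. It assumes the sample data from the question with the trailing commas removed, and it reads the data from an in-memory string instead of a file. The helper name `shift_short_rows` is hypothetical, not a pandas function. Casting to `object` first is my own addition, so that a string such as '7' can be moved into a column that pandas inferred as numeric.

```python
import io

import numpy as np
import pandas as pd

# Sample text mirroring the question, but without trailing commas
# (assumption: every line has either 7 or 8 fields).
raw = """1,2,3,4,5,Hello,7,8
1,2,3,4,5,7,8
1,2,3,4,5,7,8
1,2,3,4,5,Hello,7,8
"""


def shift_short_rows(df, focus_col_idx):
    """Shift values right by one, from focus_col_idx on, in rows whose last column is missing."""
    df = df.astype(object)  # let strings and numbers share a column
    last_idx = df.shape[1] - 1
    # Boolean mask of the rows that came up one field short
    short = df.iloc[:, last_idx].isna().to_numpy()
    # Copy the block [focus_col_idx, last_idx) one column to the right
    df.iloc[short, focus_col_idx + 1:] = df.iloc[short, focus_col_idx:last_idx].to_numpy()
    # The focus column is the one that was actually absent
    df.iloc[short, focus_col_idx] = np.nan
    return df


df = pd.read_csv(io.StringIO(raw), header=None)
fixed = shift_short_rows(df, focus_col_idx=5)
print(fixed)
```

The function returns a new frame and leaves the input untouched, so it can be re-run with a different `focus_col_idx` if the guess about the missing column was wrong.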

Answer to previous data

Assuming only one column has missing values (say the 2nd column, as in your previous data), here is a quick solution:

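import pandas as pd

# A plain ',' separator avoids the invalid '\,' escape sequence.
# Short rows are padded with a missing value (displayed as None or NaN depending on the parser engine).
df = pd.read_csv('SO.txt', sep=',', header=None)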
df

+---+---+---+---+---+------+
|   | 0 | 1 | 2 | 3 |  4   |
+---+---+---+---+---+------+
| 0 | A | B | C | D | E    |
| 1 | A | C | D | E | None |
+---+---+---+---+---+------+


# Fetching the index of rows which have None in last column 
idx = df[df[4].isnull()].index
idx
# Int64Index([1], dtype='int64')

# Shifting the column values for those rows with index idx
df.iloc[idx,2:] = df.iloc[idx,1:4].values
df

+---+---+---+---+---+---+
|   | 0 | 1 | 2 | 3 | 4 |
+---+---+---+---+---+---+
| 0 | A | B | C | D | E |
| 1 | A | C | C | D | E |        # <- Notice the shifting.
+---+---+---+---+---+---+


# Put NaN in the second column for the rows in idx
df.iloc[idx,1] = np.nan

# Final output
df
+---+---+-----+---+---+---+
|   | 0 |  1  | 2 | 3 | 4 |
+---+---+-----+---+---+---+
| 0 | A | B   | C | D | E |
| 1 | A | NaN | C | D | E |
+---+---+-----+---+---+---+
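
If the file really has trailing commas on some lines, as in the updated question, a different route may be easier than shifting values afterwards: parse each line yourself and insert the missing marker before building the frame. This is a sketch of that alternative, not the method used above. It assumes that exactly one known column can be absent and that an empty last field always comes from a trailing comma. The helper name `read_with_optional_column` and its parameters are hypothetical.

```python
import csv
import io

import numpy as np
import pandas as pd

# The sample lines from the updated question, trailing commas included
raw = """1,2,3,4,5,Hello,7,8
1,2,3,4,5,7,8,
1,2,3,4,5,7,8,
1,2,3,4,5,Hello,7,8,
"""


def read_with_optional_column(text, n_cols, optional_idx):
    """Build a DataFrame from comma-separated text in which one column may be absent."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        if fields[-1] == "":
            fields = fields[:-1]  # drop the empty field left by a trailing comma
        if len(fields) == n_cols - 1:
            fields.insert(optional_idx, np.nan)  # mark the absent value
        elif len(fields) != n_cols:
            raise ValueError(f"unexpected number of fields: {fields!r}")
        rows.append(fields)
    return pd.DataFrame(rows)


parsed = read_with_optional_column(raw, n_cols=8, optional_idx=5)
print(parsed)
```

All values come back as strings here; `pd.to_numeric` can convert the numeric columns afterwards if needed.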

5 Comments

Let me provide comments.
Check now. If anything is unclear, let me know, or tell me if you want an answer for the updated question!
Sure. I'll give you a generalized solution by that time.
It's giving me an error: "Must have equal len keys and value when setting with an ndarray"
