Python/Pandas - split text into columns by delimiter ; and create a csv file

Question

I have a long text where I have inserted a delimiter ";" exactly where I would like to split the text into different columns. So far, whenever I try to split the text into 'ID' and 'ADText', I only get the first line. However, there should be 1439 rows in two columns.

My text looks like this: 1234; text in written form with multiple sentences going over multiple lines until at some point the next ID is written down 2345; then the new ad text begins until the next ID 3456; and so on

I want to use the ";" to split my text into two columns, one with the ID and one with the ad text.

import pandas as pd

# read the text file into Python
jobads = pd.read_csv("jobads.txt", header=None)
print(jobads)

# create DataFrame (read_csv already returns one, so this is only a copy)
df = pd.DataFrame(jobads, index=None, columns=None)
print(type(df))
print(df)
# name the column so it can be targeted for the split
df = df.rename(columns={0: "Job"})
print(df)

# split it into two columns. Problem: I only get the first row.
print(pd.DataFrame(df.Job.str.split(';', n=1).tolist(),
                   columns=['ID', 'AD']))

Unfortunately that only works for the first entry and then it stops. The output looks like this:

               ID                                                 AD
0            1234                                   text in written from with ...

Where am I going wrong? I would appreciate any advice =) Thank you!
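A small reproduction of the symptom, assuming the file is one long physical line (as the asker confirms later in the comments): read_csv treats each line as a row, so a single-line file yields a single row no matter how many ";" it contains. The sample string is a made-up stand-in for jobads.txt.

```python
import io

import pandas as pd

# Hypothetical stand-in for jobads.txt: everything on ONE physical line
raw = "1234; first ad text 2345; second ad text 3456; third ad text\n"

# Default sep="," and this sample has no commas, so the whole line is one cell
jobads = pd.read_csv(io.StringIO(raw), header=None)
print(jobads.shape)  # (1, 1): one line in the file means one row

# Splitting that cell on the first ";" (n=1) can only ever give one row back
parts = jobads[0].str.split(";", n=1, expand=True)
print(parts)
```

The split itself works; the data simply never had more than one row to split.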

Why don't you use the "sep" parameter of "pd.read_csv"?

– Amuoeba, commented Sep 4, 2020 at 14:22

Amuoeba · Accepted Answer · 2020-09-14 10:29:01Z

Sample text:

FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
Asparagine;Asn;N;132.12
Aspartic_Acid;Asp;D;133.10
Cysteine;Cys;C;121.16

Create columns based on the ";" separator:

import pandas as pd
f = "aminoacids"
df = pd.read_csv(f,sep=";")
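A self-contained version of the snippet above, with the sample text embedded through io.StringIO instead of a file named "aminoacids" (the in-memory buffer is only there so the example runs without a file on disk):

```python
import io

import pandas as pd

sample = """FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
Asparagine;Asn;N;132.12
Aspartic_Acid;Asp;D;133.10
Cysteine;Cys;C;121.16
"""

# sep=";" makes every ";" a column boundary; each line of input becomes a row,
# and the first line is used as the header
df = pd.read_csv(io.StringIO(sample), sep=";")
print(df)
```

This works because the data is line-oriented: read_csv still needs one record per line, which is why it does not help with a single-line text.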

EDIT: Given the comments, I assume the text looks more like this:

t = """1234; text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon 2345; then the new Ad-Text begins until the next ID 3456; and so on1234; text in written from with multiple """

In this case, a regular expression like the following splits your string into IDs and texts, which you can then use to build a pandas DataFrame.

import re

# The capture group ( ) makes re.split keep the IDs in the result list
r = re.compile(r"([0-9]+);")
a = re.split(r, t)
print(a)

Output:

['',
 '1234',
 ' text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon ',
 '2345',
 ' then the new Ad-Text begins until the next ID ',
 '3456',
 ' and so on',
 '1234',
 ' text in written from with multiple ']
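If you would rather get (ID, text) pairs directly instead of an alternating list, re.findall with two groups and a lookahead is one option. This is a swapped-in variant of the re.split approach, not part of the original answer, and it shares the same assumption: any run of digits followed by ";" is treated as an ID, so a number such as "2020;" inside an ad would be misread.

```python
import re

t = ("1234; text in written form over multiple sentences 2345; "
     "then the new ad text begins 3456; and so on")

# Group 1: the digits before ";". Group 2: everything (lazily) up to the next
# "<digits>;" or the end of the string. re.S lets "." also match newlines.
pair_pattern = re.compile(r"(\d+);(.*?)(?=\d+;|$)", re.S)

# strip() removes the spaces left around each ad text
pairs = [(ad_id, text.strip()) for ad_id, text in pair_pattern.findall(t)]
print(pairs)
```

The resulting list of tuples can be passed straight to pd.DataFrame(pairs, columns=["IDs", "Texts"]).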

EDIT 2: This responds to the questioner's follow-up in the comments: how to convert this list into a pandas DataFrame with two columns, IDs and Texts.

import pandas as pd
# a is the output list from the previous part of this answer
# List of texts: a[::2] takes every other item starting with the FIRST one (index 0),
# and [1:] then drops a[0], the empty string in front of the first ID
texts = a[::2][1:]
print(texts)
# List of IDs: a[1::2] takes every other item starting with the SECOND one (index 1)
ids = a[1::2]
print(ids)
df = pd.DataFrame({"IDs": ids, "Texts": texts})
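The question title also asks for a CSV file. A minimal end-to-end sketch, assuming the whole text fits in memory and that IDs are purely numeric; the function name split_ads and the output name "jobads.csv" are hypothetical, and .strip() is added to drop the spaces left around each text:

```python
import re

import pandas as pd

# Same pattern as above: the capture group makes re.split keep the IDs
ID_PATTERN = re.compile(r"([0-9]+);")


def split_ads(text: str) -> pd.DataFrame:
    """Turn one long 'ID; text ID; text ...' string into an IDs/Texts table."""
    a = ID_PATTERN.split(text)
    # a[0] is whatever precedes the first ID ('' if the text starts with one)
    ids = a[1::2]                         # odd positions: the IDs
    texts = [s.strip() for s in a[2::2]]  # even positions from 2 on: the texts
    return pd.DataFrame({"IDs": ids, "Texts": texts})


df = split_ads("1234; first ad 2345; second ad 3456; third ad")
print(df)

# For the real data (file name taken from the question):
# from pathlib import Path
# df = split_ads(Path("jobads.txt").read_text(encoding="utf-8"))

# With no path, to_csv returns the CSV as a string; pass a path such as
# "jobads.csv" (hypothetical name) to write a file instead
csv_text = df.to_csv(index=False)
print(csv_text)
```

a[2::2] is equivalent to a[::2][1:] from the answer; slicing once just makes the pairing with a[1::2] easier to see.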

edited Sep 14, 2020 at 10:29

answered Sep 4, 2020 at 14:31

Amuoeba



Nina Over a year ago

Thank you so much for your answer. I have tried this, and it gives me 0 rows and 19090 columns, because my text isn't laid out like your example: the IDs aren't written neatly at the start of each line but appear in free-flowing text.

Amuoeba Over a year ago

Ah I see so you don't even have new lines? It is just a long single line string?

Nina Over a year ago

Yes, it is one single-line string, sadly.

Amuoeba Over a year ago

@Nina Have you checked the edit to my answer? Would it help, or are there any other scenarios it doesn't capture?

Nina Over a year ago

Thank you so much for your time and answer! It helps a lot and it worked! Awesome really, thanks!
