2

I have a DataFrame df and I want to append '/' characters to the values in the cast and genres columns so that each cell contains exactly three '/'.

id  movie      cast      genres  runtime
1   Furious    a/b/c/d   a/b        23
2   Minions    a/b/c     a/b/c      55
3   Mission    a/b       a          67
4   Kingsman   a/b/c/d   a/b/c/d    23
5   Star Wars  a         a/b/c      45

The output should look like this:

id  movie      cast      genres  runtime
1   Furious    a/b/c/d   a/b//      23
2   Minions    a/b/c/    a/b/c/     55
3   Mission    a/b//     a///       67
4   Kingsman   a/b/c/d   a/b/c/d    23
5   Star Wars  a///      a/b/c/     45
2
  • Share the code you've written and explain what's wrong with it. That shows your effort. Commented Jul 8, 2019 at 13:07
  • This looks like an assignment/homework question. You should try it yourself first, then ask when you get stuck. Commented Jul 8, 2019 at 13:09

6 Answers

1

Here's one approach defining a custom function:

def add_values(df, *cols):
    for col in cols:
        # amount of "/" to add at each row
        c = df[col].str.count('/').rsub(3)
        # translate the above to as many "/" as required
        ap = [i * '/' for i in c.tolist()]
        # Add the above to the corresponding column
        df[col] = [i + j for i,j in zip(df[col], ap)]
    return df

add_values(df, 'cast', 'genres')

   id     movie     cast   genres  runtime
0   1   Furious  a/b/c/d    a/b//       23
1   2   Minions   a/b/c/   a/b/c/       55
2   3   Mission    a/b//     a///       67
3   4  Kingsman  a/b/c/d  a/b/c/d       23
4   5  StarWars     a///   a/b/c/       45
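
To make the intermediate step in add_values easier to follow, here is a minimal sketch of what `str.count('/').rsub(3)` computes on its own. The `genres` Series below is a stand-in built from the question's data, not part of the original answer.

```python
import pandas as pd

# Stand-in sample column, copied from the question's "genres" data
genres = pd.Series(["a/b", "a/b/c", "a", "a/b/c/d", "a/b/c"])

# rsub(3) computes 3 - count, i.e. how many slashes each row is missing
missing = genres.str.count("/").rsub(3).tolist()
print(missing)

# Repeat "/" that many times and append it to the original value
padded = [value + "/" * n for value, n in zip(genres, missing)]
print(padded)
```

A row that already has three slashes gets a count of 0 and is left unchanged.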

1 Comment

Thanks, this is exactly the solution I wanted.
0

You can split each value on /, pad the resulting list with empty strings until it has 4 elements, and then join it with / again.

Use .apply to change the values in the entire column.

Try this:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""id  movie      cast      genres  runtime
1   Furious    a/b/c/d   a/b        23
2   Minions    a/b/c     a/b/c      55
3   Mission    a/b       a          67
4   Kingsman   a/b/c/d   a/b/c/d    23
5   Star Wars  a         a/b/c      45"""), sep=r"\s\s+", engine="python")


def pad_cells(value):
    parts = value.split("/")
    parts += [""] * (4 - len(parts))
    return "/".join(parts)


df["cast"] = df["cast"].apply(pad_cells)
df["genres"] = df["genres"].apply(pad_cells)

print(df)


0

Use this function on each element in each column to update them.

def update_string(string):
    total_occ = 3  # total number of '/' characters each string should contain
    for element in string:  # for each character,
        if element == "/":  # if it is a '/', decrease 'total_occ'
            total_occ -= 1
    for _ in range(total_occ):  # append the remaining '/' characters at the end
        string += "/"
    return string

x = "a/b"    
print(update_string(x))

Output is:

a/b//
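
The answer shows the function on a single string only. As a rough sketch, it could be applied to the two columns with `Series.apply`. The DataFrame below is rebuilt from the question's data, and `update_string` is repeated in a condensed form (an assumption of this sketch, with a hypothetical `total` parameter) so that the block runs on its own.

```python
import pandas as pd


def update_string(string, total=3):
    # Condensed equivalent of the loop version: append the missing slashes.
    # max(..., 0) leaves strings that already have enough slashes unchanged.
    return string + "/" * max(total - string.count("/"), 0)


# Sample data copied from the question (only the two relevant columns)
df = pd.DataFrame({
    "cast": ["a/b/c/d", "a/b/c", "a/b", "a/b/c/d", "a"],
    "genres": ["a/b", "a/b/c", "a", "a/b/c/d", "a/b/c"],
})

# Apply the function element-wise to each column that needs padding
for col in ["cast", "genres"]:
    df[col] = df[col].apply(update_string)

print(df)
```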


0

Here you go:

=^..^=

import pandas as pd
from io import StringIO

# create raw data
raw_data = StringIO("""
id movie cast genres runtime
1 Furious a/b/c/d a/b 23
2 Minions a/b/c a/b/c 55
3 Mission a/b a 67
4 Kingsman a/b/c/d a/b/c/d 23
5 Star_Wars a a/b/c 45
""")

# load data into data frame
df = pd.read_csv(raw_data, sep=' ')

# iterate over rows and append the missing '/' characters
# (df.at replaces DataFrame.set_value, which was removed in pandas 1.0)
for index, row in df.iterrows():
    count_character_cast = row['cast'].count('/')
    if count_character_cast < 3:
        df.at[index, 'cast'] = row['cast'] + '/' * (3 - count_character_cast)

    count_character_genres = row['genres'].count('/')
    if count_character_genres < 3:
        df.at[index, 'genres'] = row['genres'] + '/' * (3 - count_character_genres)

Output:

   id      movie     cast   genres  runtime
0   1    Furious  a/b/c/d    a/b//       23
1   2    Minions   a/b/c/   a/b/c/       55
2   3    Mission    a/b//     a///       67
3   4   Kingsman  a/b/c/d  a/b/c/d       23
4   5  Star_Wars     a///   a/b/c/       45


0

A short solution using itertools and the DataFrame.applymap function:

In [217]: df
Out[217]: 
   id      movie     cast   genres  runtime
0   1    Furious  a/b/c/d      a/b       23
1   2    Minions    a/b/c    a/b/c       55
2   3    Mission      a/b        a       67
3   4   Kingsman  a/b/c/d  a/b/c/d       23
4   5  Star Wars        a    a/b/c       45

In [218]: from itertools import chain, zip_longest

In [219]: def ensure_slashes(x):
     ...:     return ''.join(chain.from_iterable(zip_longest(x.split('/'), '///', fillvalue='')))
     ...: 
     ...: 

In [220]: df[['cast','genres']] = df[['cast','genres']].applymap(ensure_slashes)

In [221]: df
Out[221]: 
   id      movie     cast   genres  runtime
0   1    Furious  a/b/c/d    a/b//       23
1   2    Minions   a/b/c/   a/b/c/       55
2   3    Mission    a/b//     a///       67
3   4   Kingsman  a/b/c/d  a/b/c/d       23
4   5  Star Wars     a///   a/b/c/       45

The crucial function to apply is:

def ensure_slashes(x):
    return ''.join(chain.from_iterable(zip_longest(x.split('/'), '///', fillvalue='')))
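
To show why this one-liner works, here is a small step-by-step sketch in plain Python, with no pandas required. The variable names `parts`, `pairs` and `result` are only illustrative.

```python
from itertools import chain, zip_longest

# Split the cell into its items
parts = "a/b".split("/")

# Pair each item with one of the three slashes; the shorter side is
# padded with '' so neither items nor slashes are dropped
pairs = list(zip_longest(parts, "///", fillvalue=""))
print(pairs)

# Flatten the pairs and join them back into a single string
result = "".join(chain.from_iterable(pairs))
print(result)

# A cell with four items keeps all of them: the fourth item is paired with ''
full = "".join(chain.from_iterable(zip_longest("a/b/c/d".split("/"), "///", fillvalue="")))
print(full)
```

Because zip_longest pads the shorter side instead of truncating, cells that already have three slashes pass through unchanged.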


0

OK, so the idea is to create a function that does the necessary work and apply it to the desired columns.

The function removes the existing slashes, then zips the remaining characters of the cell with a constant list of exactly 3 slashes.

The result is the concatenation of the elements of this zip. Note that the zip stops after three pairs, so a cell that already holds four items loses its last one (a/b/c/d becomes a/b/c/, as the output shows), and each item must be a single character.

import pandas as pd
import re 
df = pd.DataFrame({
                    'id': [1, 2, 3, 4, 5], 
                    'movie': ['furious', 'Mininons', 'mission', 'Kingsman', 'star Wars'], 
                    'cast': ['a/b/c/d', 'a/b/c', 'a/b', 'a/b/c/d', 'a'], 
                    'genres': ['a/b', 'a/b/c', 'a', 'a/b/c/d', 'a/b/c'],
                    'runtime': [23, 55, 67, 23, 45], 
                    })

def slash_func(x):
    slash_list = ['/'] * 3
    x = re.sub('/', '', str(x))
    list_ = list(x)

    for i in range(3 - len(list_)): 
        list_.append('')
    output_list = [v[0]+v[1] for v in list(zip(list_, slash_list))]

    return ''.join(output_list) 


df['cast'] = df['cast'].apply(lambda x: slash_func(x))
df['genres'] = df['genres'].apply(lambda x: slash_func(x))

Output :

id  movie       cast    genres  runtime
1   furious     a/b/c/  a/b//   23
2   Mininons    a/b/c/  a/b/c/  55
3   mission     a/b//   a///    67
4   Kingsman    a/b/c/  a/b/c/  23
5   star Wars   a///    a/b/c/  45

