Pandas: Dynamically add a row and columns and input values to it

Question

I am working with a large dataset: for each parent URL, my crawler iteratively fetches its n child URLs.

I initially used Excel to record the data (really just to test that my code actually worked), but later found that approach wasn't worth it because the output data was huge.

For example, I have two sets of data:

amazon.com: ['a','b','c','d','e']
a         : ['k','j','e','f']

Here, in the first case, amazon.com is the parent URL and the list values are its child URLs.
In the next case, a becomes the parent URL and the list values are its child URLs.

What I actually need is a DataFrame like this:

               a    b    c    d    e    k    j    f
 amazon.com    1    1    1    1    1
     a                             1    1    1    1

where a 1 marks that the column URL is a child of the row URL (for example, a is a child of amazon.com).
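Because the target is essentially a 0/1 parent-by-child matrix, pandas can build it from a dict like the one above without explicit loops. A minimal sketch, assuming all the data is already in a dict (the name `links` is hypothetical) and that the blank cells of the desired table may be shown as 0:

```python
import pandas as pd

# Parent URL -> child URLs, taken from the example in the question
links = {
    'amazon.com': ['a', 'b', 'c', 'd', 'e'],
    'a': ['k', 'j', 'e', 'f'],
}

# One entry per (parent, child) pair; the Series index holds the parent.
# dropna() discards the NaN that explode() produces for an empty child list.
pairs = pd.Series(links).explode().dropna()

# Count each (parent, child) pair; pairs that never occur become 0
matrix = pd.crosstab(pairs.index, pairs.values,
                     rownames=['parent'], colnames=['child'])

# A page may link to the same child twice; keep a plain 1/0 flag
matrix = matrix.clip(upper=1)

# crosstab sorts both axes; restore the order in which URLs were first seen
matrix = matrix.reindex(index=list(links),
                        columns=list(dict.fromkeys(pairs)),
                        fill_value=0)
print(matrix)
```

`pd.crosstab` sorts rows and columns, so the final `reindex` is only there to reproduce the header order from the question; drop it if sorted headers are acceptable.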

The problem is that I won't have all the data up front as shown above; it is obtained dynamically as I crawl the website.

So the flow would be:

1. Open a website URL.
2. Record that URL as the parent URL.
3. Record all the URLs present on the page as its child URLs; these populate the list/dictionary and hence the DataFrame.
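The three steps above map naturally onto a breadth-first crawl loop. This is only a sketch of how the crawler might be structured: `fetch_child_urls`, `crawl`, `FAKE_SITE` and `max_pages` are hypothetical names, and a dictionary stands in for real HTTP requests and HTML parsing. It records each parent's children in a plain dict and builds the DataFrame once at the end, since growing a DataFrame one row at a time tends to be slow.

```python
from collections import deque

import pandas as pd

# Hypothetical stand-in for the real site: maps a page to the links on it.
# Replace fetch_child_urls() with code that actually downloads and parses a page.
FAKE_SITE = {
    'amazon.com': ['a', 'b', 'c', 'd', 'e'],
    'a': ['k', 'j', 'e', 'f'],
}

def fetch_child_urls(url):
    return FAKE_SITE.get(url, [])

def crawl(start_url, max_pages=100):
    """Breadth-first crawl; returns {parent: [children in first-seen order]}."""
    links = {}
    queue = deque([start_url])
    while queue and len(links) < max_pages:
        parent = queue.popleft()                # 1. open a URL
        if parent in links:                     # already recorded, skip it
            continue
        children = list(dict.fromkeys(fetch_child_urls(parent)))
        links[parent] = children                # 2. + 3. record parent and children
        queue.extend(c for c in children if c not in links)
    return links

links = crawl('amazon.com')

# Unique child URLs in first-seen order become the column headers
columns = list(dict.fromkeys(c for kids in links.values() for c in kids))
df = pd.DataFrame(
    [[int(c in kids) for c in columns] for kids in links.values()],
    index=list(links), columns=columns)
print(df)
```

Pages that have no links of their own (b, c, ... in this fake site) still get a row, filled with zeros; filter `links` before building the DataFrame if only pages with children should appear.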

Note that the column headers contain no duplicates: e is a child of both amazon.com and a, but it gets a single column.
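Keeping the headers unique is a set-union on the column Index. A small sketch with the example URLs (the variable names are illustrative), showing two pandas ways to merge one page's links into the existing columns:

```python
import pandas as pd

existing = pd.Index(['a', 'b', 'c', 'd', 'e'])   # columns after the first page
children = ['k', 'j', 'e', 'f']                  # links found on the next page

# append + drop_duplicates keeps the order in which URLs were first seen
in_order = existing.append(pd.Index(children)).drop_duplicates()

# union() also removes duplicates, but sorts the result by default
in_sorted = existing.union(children)

print(list(in_order))    # ['a', 'b', 'c', 'd', 'e', 'k', 'j', 'f']
print(list(in_sorted))   # ['a', 'b', 'c', 'd', 'e', 'f', 'j', 'k']
```

The first form reproduces the header order shown in the question; the second matches what `sorted(set(...))` gives.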

Can someone help me out on this one?

Provide a Minimal, Complete, and Verifiable example

– kingmakerking, Commented Nov 1, 2017 at 7:55

kingbase · Accepted Answer · 2017-11-01 10:26:08Z


Hope this helps:

import pandas as pd

xx = {
    'amazon.com': ['a','b','c','d','e'],
    'a'         : ['k','j','e','f']
}
# Every distinct child URL, once each (sorted), becomes a column
all_vals = [item for key,items in xx.items() for item in items]
all_vals = sorted(set(all_vals))
# One row per parent URL
df = pd.DataFrame(index=list(xx), columns=all_vals)

def is_exist(idx,col):
    # 1 if URL `col` is a child of URL `idx`, else 0
    ret = col in xx[idx]
    return int(ret)

for idx in df.index:
    for col in df.columns:
        df.loc[idx, col] = is_exist(idx, col)

print(df)


2 Comments

pythonic_autometeor Over a year ago

Thanks a lot! That helped me a lot. But what I precisely require is this: during each iteration, a row and the columns associated with it should be populated dynamically. Each row must be added as a new index entry, and its child URLs should be added as new column headers, without duplicating any header already in the DataFrame.

kingbase Over a year ago

Sorry for the inconvenience, but IMHO what you described is very similar to the code I provided; you can modify it a bit to fit your needs.
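One way to adapt the answer to what the comment asks (add a row per crawled page, and only the new column headers) is to reindex the columns and then set the row with `.loc`. `add_page` is a hypothetical helper, not a pandas API, and this is a sketch rather than a tuned implementation: for a large crawl, collecting the links in a plain dict and building the DataFrame once at the end is usually faster, because each enlargement creates a new copy of the frame.

```python
import pandas as pd

def add_page(df, parent, children):
    """Hypothetical helper: add (or overwrite) the row for `parent`,
    creating a column only for child URLs not seen before."""
    children = list(dict.fromkeys(children))   # drop repeated links on one page
    # Existing headers first, then new ones in first-seen order (no duplicates)
    new_cols = df.columns.append(pd.Index(children)).drop_duplicates()
    df = df.reindex(columns=new_cols, fill_value=0)  # new columns start at 0
    df.loc[parent] = 0                 # add the row (or reset it) as all zeros
    if children:
        df.loc[parent, children] = 1   # mark this page's child URLs
    return df

# Start from an empty frame whose column labels are strings (object dtype)
df = pd.DataFrame(columns=pd.Index([], dtype=object))
for parent, children in [('amazon.com', ['a', 'b', 'c', 'd', 'e']),
                         ('a', ['k', 'j', 'e', 'f'])]:
    df = add_page(df, parent, children)   # one call per crawled page
print(df)
```

In a real crawler the `for` loop would be the crawl loop itself, calling `add_page` right after the child URLs of each page have been collected.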
