I have a pandas DataFrame like this:

api                             region   base_path
https://apis.us/image/          us          /image
https://apis.emea/video/        emea        /video
https://apis.asia/docs/         asia        /docs
https://apis.emea/image/        emea        /image
https://apis.us/video/          us          /video
https://apis.us/docs/           us          /docs
https://apis.asia/location/     asia        /location

Some APIs in the list are common to more than one region. For example, /image is common to both us and emea. I want the output DataFrame to look like this:

api_us_emea                 api_asia_us                 api_asia_emea           api_us_emea_asia        api_usa     api_emea    api_asia

https://apis.us/image/      https://apis.us/docs/       No Common api           No Common api           N/A         N/A         https://apis.asia/location/ 
https://apis.us/video/

Here, for common APIs, I always want the us API to be the column value whenever us is one of the regions. For example, the api_us_emea column holds only the us API, api_asia_emea holds the asia API, and api_us_emea_asia holds the us API as well. How can I achieve this?

2 Answers


I think this code snippet should give you what you want, or at least a reasonable direction for solving your problem. The idea is to iterate through the possible subsets of regions, from largest to smallest, and collect every base_path that is available in all regions of that subset. Any base_path already assigned to a larger subset that contains the current one is then removed from the current subset. Hope this helps.

from collections import defaultdict
import pandas as pd
from itertools import chain, combinations

data = [['https://apis.us/image/',          'us',          '/image'],
        ['https://apis.emea/video/',        'emea',        '/video'],
        ['https://apis.asia/docs/',         'asia',        '/docs'],
        ['https://apis.emea/image/',        'emea',        '/image'],
        ['https://apis.us/video/',          'us',          '/video'],
        ['https://apis.us/docs/',           'us',          '/docs'],
        ['https://apis.asia/location/',     'asia',        '/location']]

df = pd.DataFrame(data, columns=['api', 'region', 'base_path'])


def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))


def flatten(t):
    return [item for sublist in t for item in sublist]


new_dict = defaultdict(list)
for subset in reversed(list(powerset(pd.unique(df['region'])))):
    if len(subset) > 0:
        for api_path in pd.unique(df['base_path']):
            df_path = df[df['base_path'] == api_path]
            if set(subset).issubset(set(pd.unique(df_path['region']))):
                new_dict[subset].append(api_path)
        curr_keys = list(new_dict.keys())
        for key in curr_keys:
            if set(subset).issubset(set(key)) and len(key) > len(subset):
                for remove_path in [x for x in new_dict[subset] if x in new_dict[key]]:
                    new_dict[subset].remove(remove_path)
        if subset not in new_dict:
            new_dict[subset] = []

new_df = pd.DataFrame({k: pd.Series(v) for k, v in new_dict.items()})
new_df.columns = ['_'.join([s for s in x if not pd.isna(s)]) for x in new_df.columns]

new_df looks like this: [screenshot of the resulting DataFrame, not reproduced here]


4 Comments

Great! One question: will the column sequence of new_df be the same irrespective of the input data? Below is the output of print(new_df.columns): MultiIndex([('apjc', 'emea', 'asia'), ('emea', 'asia', nan), ('apjc', 'asia', nan), ('apjc', 'emea', nan), ('asia', nan, nan), ('emea', nan, nan), ('apjc', nan, nan), (nan, nan, nan)])
Well, it will contain all of the region combinations which include relevant data, or did I miss the point of your question?
You are right. I am just asking whether the sequence of combinations remains the same irrespective of the input data.
I made a minor edit and now the column sequence will be the same for all inputs that have the same set of values in the region column. Also formatted the column names a bit nicer.
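
On the column-order question, one possible way to make the sequence independent of row order is to sort the region names before building the combinations. This is only a sketch, not the code from the answer above; the helper name ordered_subsets is hypothetical, and alphabetical sorting is an assumption about what a "stable" order should mean.

```python
from itertools import combinations


def ordered_subsets(regions):
    """Return every non-empty subset of the distinct regions, largest first.

    Sorting the names first means the result depends only on which
    regions are present, not on the order in which they appear.
    """
    names = sorted(set(regions))
    return [c for r in range(len(names), 0, -1) for c in combinations(names, r)]


# Two inputs with the same regions in a different row order.
cols_a = ["_".join(s) for s in ordered_subsets(["us", "emea", "asia", "us"])]
cols_b = ["_".join(s) for s in ordered_subsets(["asia", "us", "emea"])]
print(cols_a)
print(cols_a == cols_b)
```

With this ordering, two inputs that contain the same set of regions produce the same column sequence.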

Try this:

import functools
import itertools
import operator

import pandas as pd


def find_common_elements(p):
    # Elements that appear in every list of p.
    return list(set.intersection(*[set(li) for li in p]))


def find_unique_elements(p, l):
    # Elements of l that occur exactly once across all lists of p.
    merged_p = functools.reduce(operator.iconcat, p, [])
    return [x for x in l if merged_p.count(x) == 1]


# df is the DataFrame from the question.
# Extract a (region, path) pair from each URL, e.g. ("us", "image").
strings_array = df["api"].str[:-1].str.split("/").str[-2:].apply(lambda x: (x[0][5:], x[1])).values

# Map each region to the list of paths it serves.
d = dict()
for region, path in strings_array:
    d.setdefault(region, []).append(path)

# Note: set iteration order is not guaranteed, so row order may vary between runs.
se = set(x[0] for x in strings_array)

# combs[0] holds the single regions, combs[1] the pairs, and so on.
combs = [list(itertools.combinations(se, i)) for i in range(1, len(se) + 1)]

col1, col2 = [], []
# Single regions: keep only the paths that no other region serves.
for item in combs[0]:
    col1.append("_".join(["api"] + list(item)))
    col2.append(["https://apis." + item[0] + "/" + s
                 for s in find_unique_elements([d[c] for c in d.keys()], d[item[0]])])

# Region combinations: keep the paths shared by every region in the combination.
for i in range(1, len(combs)):
    for item in combs[i]:
        common = find_common_elements([d[c] for c in item])
        col1.append("_".join(["api"] + list(item)))
        if len(common) > 0:
            col2.append(["https://apis." + item[0] + "/" + s for s in common])
        else:
            col2.append("No Common api")

output_df = pd.DataFrame({"col1": col1, "col2": col2})
output_df

Output:

               col1                                            col2
0            api_us                                              []
1          api_asia                    [https://apis.asia/location]
2          api_emea                                              []
3       api_us_asia                          [https://apis.us/docs]
4       api_us_emea  [https://apis.us/image, https://apis.us/video]
5     api_asia_emea                                   No Common api
6  api_us_asia_emea                                   No Common api
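
Note that item[0] comes from iterating over a set, so the region used to build each URL is not guaranteed to be us. If the preference matters, one option is to group by base_path and pick the URL explicitly. The following is a minimal sketch, not the method from either answer: the priority order us > asia > emea is an assumption inferred from the question's examples, and the names PRIORITY and common_apis are hypothetical. Unlike the output above, it only creates columns for region combinations that actually share a path.

```python
import pandas as pd

data = [
    ["https://apis.us/image/", "us", "/image"],
    ["https://apis.emea/video/", "emea", "/video"],
    ["https://apis.asia/docs/", "asia", "/docs"],
    ["https://apis.emea/image/", "emea", "/image"],
    ["https://apis.us/video/", "us", "/video"],
    ["https://apis.us/docs/", "us", "/docs"],
    ["https://apis.asia/location/", "asia", "/location"],
]
df = pd.DataFrame(data, columns=["api", "region", "base_path"])

# Assumed preference order (inferred from the question, not stated outright).
PRIORITY = ["us", "asia", "emea"]


def common_apis(df, priority=PRIORITY):
    """Group each base_path under the exact set of regions that serve it,
    keeping the URL from the highest-priority region."""
    rank = {r: i for i, r in enumerate(priority)}
    result = {}
    for _, group in df.groupby("base_path", sort=False):
        # Regions serving this path, ordered by priority (unknown regions last).
        regions = sorted(group["region"].unique(),
                         key=lambda r: rank.get(r, len(rank)))
        # URL belonging to the preferred region.
        best = group.loc[group["region"] == regions[0], "api"].iloc[0]
        result.setdefault("api_" + "_".join(regions), []).append(best)
    return result


result = common_apis(df)
# Columns of unequal length are padded with NaN.
out = pd.DataFrame({k: pd.Series(v) for k, v in result.items()})
print(out)
```

Combinations with no shared path (for example asia and emea here) simply do not appear; a "No Common api" placeholder could be filled in afterwards if the fixed set of columns is needed.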
