Each row in my dataframe has a string containing some URL query parameters, e.g. flt=promotionflag%3A1%3Borganicfilter%3AOrganic&sortBy=MOST_POPULAR. The flt section could contain multiple parameters as in this example.

I want to parse this string into multiple columns like:

flt_promotionflag  flt_organicfilter  sortBy
1                  Organic            MOST_POPULAR

There could be lots of different filters, so I don't want to hardcode them as column names. If there is already a column with that filter name, I want to put the value in the existing column; if there isn't, I want to create it.

I've written some code that creates a dictionary in the structure I want in a new column but I think that's probably an unnecessary step.

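def createDict(string):
    """Parse one query string into a dict, nesting the flt filters."""
    try:
        # split on the first "=" only, in case a value contains "="
        d = dict(x.split("=", 1) for x in string.strip("&").split("&"))
        if 'flt' in d:
            if '%3B' in d['flt']:
                # several filters: "%3B" (;) separates them, "%3A" (:) splits key from value
                d['flt'] = dict(x.split("%3A") for x in d['flt'].split("%3B"))
            else:
                # a single filter is stored with a placeholder value of 1
                d['flt'] = {d['flt'].split("%3A")[0]: 1}
        return d
    except Exception:
        # malformed or non-string input yields None
        return None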

df['Parsed params'] = df['URL Query Parameters'].apply(createDict)

How do I get the data I want in the right columns?

2
  • urllib.parse.parse_qs() Commented Feb 9, 2021 at 10:02
  • Thanks @RobRaymond I had a go with urllib but it seemed to not be recognising this as a bit of URL because it doesn't start with http/s. Is that right or do I need to persevere with it? Also, my issue isn't parsing the string, it's getting the sections into the right columns. Commented Feb 9, 2021 at 10:07
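On the http/s point raised in the comments, here is a minimal standard-library sketch, using the query string from the question. It suggests that parse_qs() works on a bare query string with no scheme, whereas urlparse() on the same string finds no "?" and so leaves the query part empty, which may be what looked like the string "not being recognised".

```python
import urllib.parse

# Query string taken from the question; it has no scheme, host or "?"
query = "flt=promotionflag%3A1%3Borganicfilter%3AOrganic&sortBy=MOST_POPULAR"

# urlparse() sees no "?" so the whole string lands in .path and .query is empty
parts = urllib.parse.urlparse(query)

# parse_qs() takes the query string directly and percent-decodes the values
parsed = urllib.parse.parse_qs(query)

print(repr(parts.query))
print(parsed)
```

Note that parse_qs() returns a list for every key, because a parameter may be repeated in a query string.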

1 Answer

All the utilities you need are already in place:

  • urllib.parse.parse_qs() generates a dict from a URL query string
  • the DataFrame() constructor turns that dict into a dataframe
  • the flt value is not part of standard URL parsing, so it is expanded further with pandas string methods

import urllib.parse
import pandas as pd

# parse_qs() percent-decodes the values, so %3A becomes ":" and %3B becomes ";"
df = pd.DataFrame(urllib.parse.parse_qs("flt=promotionflag%3A1%3Borganicfilter%3AOrganic&sortBy=MOST_POPULAR"))

# expand the semicolon-delimited filters into one row each
df = (df
 .assign(flt=df.flt.str.split(";"))
 .explode("flt")
 .reset_index(drop=True)
)
# split each colon-delimited filter into key/value columns
df = df.join(df.flt.str.split(":", n=1, expand=True).rename(columns={0: "key", 1: "value"}))

                     flt        sortBy            key    value
0        promotionflag:1  MOST_POPULAR  promotionflag        1
1  organicfilter:Organic  MOST_POPULAR  organicfilter  Organic

1 Comment

Any way of parsing out the different filters to separate columns?
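
One possible way to get a separate column per filter is sketched below. It is an illustration under stated assumptions, not a tested answer from the thread: the helper name parse_params and the sample rows are hypothetical, the column name "URL Query Parameters" and the flt_ prefix come from the question, and each filter is assumed to have the form key:value once decoded. Each row's query string is flattened into a plain dict, and the DataFrame constructor then builds one column per distinct key, leaving NaN where a row lacks that key.

```python
import urllib.parse

import pandas as pd


def parse_params(query):
    """Flatten one query string into {column_name: value} (hypothetical helper)."""
    flat = {}
    if not isinstance(query, str):
        return flat  # missing values (NaN/None) produce no columns
    for name, values in urllib.parse.parse_qs(query).items():
        for value in values:
            if name != "flt":
                flat[name] = value
                continue
            # after decoding, flt is assumed to look like "key:value;key:value"
            for part in value.split(";"):
                key, _, val = part.partition(":")
                flat[f"flt_{key}"] = val
    return flat


# Hypothetical sample rows; only the first one is taken from the question
df = pd.DataFrame({"URL Query Parameters": [
    "flt=promotionflag%3A1%3Borganicfilter%3AOrganic&sortBy=MOST_POPULAR",
    "flt=organicfilter%3AOrganic",
    "sortBy=PRICE",
]})

# One dict per row becomes one wide row; columns are the union of all keys
params = pd.DataFrame(df["URL Query Parameters"].map(parse_params).tolist(), index=df.index)
result = df.join(params)
print(result)
```

Because the columns come from whatever keys appear in the data, no filter names are hardcoded. If df already contains some of these columns, join() raises on the overlapping names; in that case something like df.combine_first(params) may be a better fit, though that is untested here.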
