Extract a specific pattern from a string in Python

Question

I have the data below in a column of a DataFrame (approximately 100 rows).

I need to extract the CK string (e.g. CK-36799-1523333) from each row.

Note: the key is not always receipt_id. The CK value may appear under a different key.

Data:

{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}

{"currency":"INR","Cost":429,"receipt_id":"CK-33711-15293046","af_customer_user_id":"33738414"}

{"currency":"US","Cost":229,"receipt_id":"CK-36798-1523333","af_customer_user_id":"33738423"}

{"currency":"INR","Cost":829,"receipt_id":"CK-33716-152930456","af_customer_user_id":"33738214"}

{"currency":"INR","Cost":829,"order_id":"CK-33716-152930456","af_customer_user_id":"33738214"}

{"currency":"INR","Cost":829,"suborder_id":"CK-33716-152930456","af_customer_user_id":"33738214"}

Result

CK-36799-1523333
CK-33711-15293046
CK-36798-1523333
CK-33716-152930456

I tried the str.find('CK-') function but did not get the expected result. Any suggestions?

check edited answer.

– jezrael, Commented Mar 6, 2019 at 12:38

G. B. · Accepted Answer · 2019-03-06 12:18:53Z

1

Try using regular expressions

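import re

...
# data is assumed to be an iterable of strings, one per row
for line in data:
    # Find every CK-<digits>-<digits> occurrence in the row
    res = re.findall(r"CK-[0-9]+-[0-9]+", line)
    if res:
        print(res[0])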
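
For a self-contained version of the loop above, here is a minimal sketch. The `data` list stands in for the rows of the column, `extract_ck` is a hypothetical helper name (not part of the original answer), and the third sample row is made up to show the no-match case.

```python
import re

# Matches "CK-", digits, "-", digits (e.g. CK-36799-1523333).
CK_PATTERN = re.compile(r"CK-\d+-\d+")


def extract_ck(line, default=None):
    """Return the first CK identifier found in line, or default if there is none."""
    match = CK_PATTERN.search(line)
    return match.group(0) if match else default


# The first two rows are copied from the question; the third is a
# hypothetical row without a CK value, added only for illustration.
data = [
    '{"currency":"US","Cost":129,"receipt_id":"CK-36799-1523333","af_customer_user_id":"33738413"}',
    '{"currency":"INR","Cost":829,"order_id":"CK-33716-152930456","af_customer_user_id":"33738214"}',
    '{"currency":"INR","Cost":829,"af_customer_user_id":"33738214"}',
]

results = [extract_ck(line, default="no match") for line in data]
print(results)
```

Because the pattern does not mention any key name, it works whether the value sits under receipt_id, order_id, or suborder_id.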



jezrael · Accepted Answer · 2019-03-06 12:39:29Z

Use Series.str.extract:

df['new'] = df['col'].str.extract(r"(CK\-\d+\-\d+)", expand=False).fillna('no match')
print(df)
                                                 col                 new
0  {"currency":"US","Cost":129,"receipt_id":"CK-3...    CK-36799-1523333
1  {"currency":"INR","Cost":429,"receipt_id":"CK-...   CK-33711-15293046
2  {"currency":"US","Cost":229,"receipt_id":"CK-3...    CK-36798-1523333
3  {"currency":"INR","Cost":829,"receipt_id":"CK-...  CK-33716-152930456
4    {"currency":"INR","Cost":829,"order_id":"CK-...  CK-33716-152930456
5    {"currency":"INR","Cost":829,"suborder_id":"...  CK-33716-152930456

Another solution is to parse each row as a dictionary, loop over its values, and select the first match; if none exists, return a default value:

import ast

f = lambda x: next(iter(v for v in ast.literal_eval(x).values() 
                        if str(v).startswith('CK-')), 'no match')
df['new'] = df['col'].apply(f)
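
Since the rows in the question look like JSON, the same idea could also be sketched with `json.loads` instead of `ast.literal_eval`. This is an alternative under that assumption, not the answer's own method; `first_ck_value` is a hypothetical helper name.

```python
import json


def first_ck_value(text, default="no match"):
    """Parse text as JSON and return the first string value starting with 'CK-'."""
    try:
        record = json.loads(text)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON (json.JSONDecodeError is a ValueError).
        return default
    if not isinstance(record, dict):
        return default
    for value in record.values():
        if isinstance(value, str) and value.startswith("CK-"):
            return value
    return default


# Row copied from the question, including its leading whitespace.
row = '  {"currency":"INR","Cost":829,"suborder_id":"CK-33716-152930456","af_customer_user_id":"33738214"}'
print(first_ck_value(row))
```

With pandas this could presumably be applied as df['col'].apply(first_ck_value), using the same column name as in the answer above. Unlike the lambda, it falls back to the default for rows that fail to parse rather than raising.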

mahdi nezhadasad · Accepted Answer · 2019-03-06 13:37:02Z

0

Suppose the data is in a CSV file; then the CK strings can be found with code like this:

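import re

# Match any CK identifier (CK-<digits>-<digits>) rather than one hard-coded value
pattern = re.compile(r'CK-\d+-\d+')
ck_list = []

with open('ck.csv', 'r') as f:  # where ck.csv is the file you shared above
    for line in f:
        match = pattern.search(line)
        if match:
            # Keep the matched CK string itself, not the first comma-separated field
            ck_list.append(match.group())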

