0

I need to extract a matrix from a string that looks like this (it can be a bigger matrix):

[[13,2,99][-2,3,13][1,3,0][7,77,777]]

I wanted to match all the list-like substrings with a regular expression. I tested it on regexr.com, where it gave me the matches I wanted, but it does not give the same matches on pythex.org or in my script.

Here is sample code that uses the regex:

import numpy as np
import re
matrix = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"
l = []
regex = re.compile(r"\[(-?[0-9]+,)+-?[0-9]+]")
for el in re.findall(regex, matrix):
    l.append(np.fromstring(el[1:len(el)-1], dtype=int, sep=",").tolist())
a = np.array(l)

4 Answers

2

You can just jam some commas in there and json.loads it:

json.loads(matrix.replace('][', '],['))
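
As a self-contained sketch of that one-liner (the function name `parse_matrix` is only illustrative, and it assumes rows are always separated by a bare `][` with no whitespace in between):

```python
import json


def parse_matrix(text):
    """Parse '[[1,2][3,4]]' into a list of lists of ints."""
    # Insert the missing commas between rows so the string becomes valid JSON.
    return json.loads(text.replace("][", "],["))


matrix = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"
print(parse_matrix(matrix))
```

The result is a plain nested list, so it can be passed to `np.array` afterwards if an array is needed.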

1

The capturing parentheses in your regex cause re.findall to return only the parenthesized submatch (for a repeated group, its last repetition) instead of the whole match. Switching to a non-capturing group, (?:...), fixes it.

Python 3.8.2+ (heads/3.8:686d508, Mar 26 2020, 09:32:57) 
[Clang 11.0.3 (clang-1103.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> matrix = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"
>>> regex = re.compile(r"\[(-?[0-9]+,)+-?[0-9]+]")
>>> re.findall(regex, matrix)
['2,', '3,', '3,', '77,']
>>> regex = re.compile(r"\[(?:-?[0-9]+,)+-?[0-9]+]")
>>> re.findall(regex, matrix)
['[13,2,99]', '[-2,3,13]', '[1,3,0]', '[7,77,777]']
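
To go from those matches to a list of integer rows, something like the following should work. The names `ROW_RE` and `rows_from_string` are made up for this sketch. It also shows `re.finditer`, which is another way around the problem, because `group(0)` of a match object is always the full match, even with the original capturing group.

```python
import re

matrix = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"

# Non-capturing group, so findall returns each whole match.
ROW_RE = re.compile(r"\[(?:-?[0-9]+,)+-?[0-9]+\]")


def rows_from_string(text):
    """Return one list of ints per bracketed row found in text."""
    return [[int(n) for n in row[1:-1].split(",")] for row in ROW_RE.findall(text)]


# Alternative: keep the original capturing regex and read group(0) instead.
original = re.compile(r"\[(-?[0-9]+,)+-?[0-9]+]")
full_matches = [m.group(0) for m in original.finditer(matrix)]

print(rows_from_string(matrix))
print(full_matches)
```

Note that this pattern only matches rows with at least two numbers, since it requires at least one comma.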

0

Try a non-greedy match and strip the brackets from each result:

import re
import numpy as np

s = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"
l = []
for i in re.findall(r"(\[.*?\])", s):    # Non-greedy: each match runs from a "[" to the next "]".
    l.append(np.fromstring(i.strip("[]"), dtype=int, sep=","))
    
print(l)

Output:

[array([13,  2, 99]), array([-2,  3, 13]), array([1, 3, 0]), array([  7,  77, 777])]

Without numpy

import re

s = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"
l = []
for i in re.findall(r"(\[.*?\])", s):
    l.append(list(map(int, i.strip("[]").split(","))))
    
print(l)

Output:

[[13, 2, 99], [-2, 3, 13], [1, 3, 0], [7, 77, 777]]
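
One thing to be aware of: with `\[.*?\]` the first match is `[[13,2,99]`, and it is the `strip("[]")` call that removes the extra bracket. A possible stricter variant, sketched here with a hypothetical helper name `parse_rows`, excludes brackets from the row body and captures only the text inside:

```python
import re


def parse_rows(text):
    """Return one list of ints per innermost [...] group in text."""
    # One capturing group, so findall returns just the text between the brackets.
    bodies = re.findall(r"\[([^\[\]]+)\]", text)
    return [[int(n) for n in body.split(",")] for body in bodies]


s = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"
print(parse_rows(s))
```

This version needs no `strip`, because the brackets never end up in the captured text.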

0

That looks a lot like a list of lists, except for the missing commas. There are various ways of massaging and splitting the string.

In [72]: astr = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"                                              

get rid of the outer brackets:

In [75]: astr.strip('[]')                                                                            
Out[75]: '13,2,99][-2,3,13][1,3,0][7,77,777'

replace the inner ones, and immediately split:

In [76]: astr.strip('[]').replace('][',';').split(';')                                               
Out[76]: ['13,2,99', '-2,3,13', '1,3,0', '7,77,777']

split those inner strings:

In [77]: [sub.split(',') for sub in _]                                                               
Out[77]: [['13', '2', '99'], ['-2', '3', '13'], ['1', '3', '0'], ['7', '77', '777']]

If those sublists are all the same length and contain numeric strings, then we can easily make an array from them (with dtype conversion to integer):

In [78]: np.array(_, int)                                                                            
Out[78]: 
array([[ 13,   2,  99],
       [ -2,   3,  13],
       [  1,   3,   0],
       [  7,  77, 777]])
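
Put together as a plain function, the steps above might look like this. It is only a sketch: `parse_by_splitting` is a name invented here, it splits on `][` directly rather than going through `;`, and it assumes well-formed input with integer entries.

```python
def parse_by_splitting(text):
    """Parse '[[1,2][3,4]]' into a list of int lists using only str methods."""
    inner = text.strip("[]")          # drop the brackets at both ends
    rows = inner.split("][")          # split on the boundary between rows
    parsed = [[int(n) for n in row.split(",")] for row in rows]
    # An array needs rectangular data, so reject ragged input early.
    if len({len(row) for row in parsed}) != 1:
        raise ValueError("rows have different lengths")
    return parsed


astr = "[[13,2,99][-2,3,13][1,3,0][7,77,777]]"
print(parse_by_splitting(astr))
```

The returned nested list can then be handed to `np.array` as in the last step above.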
