how do you put text into an array in python

Question

I have a variable like this:

metricName = '(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'

I need to write a for loop that goes through metricName one entry at a time. For example, first (WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage), then (WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount), then (GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used), and so forth. Delimiter is | but not this \|

I tried creating an array:

data = []

data.append(metricName.split('|'))

but it gives me an array like this:

[['(WebSpherePMI\\', 'jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\', 'threadPoolModule\\', 'WebContainer:ActiveCount)', '(GC Monitor\\', 'Memory Pools\\', 'Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Invocations Per Interval Count)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']]

Any ideas how I could put this in an array?

Hard case :) Maybe first use str.replace() to replace every \| with some special string, then split on '|', then restore the \| sequences by replacing the special string back. It's an ugly and fragile workaround, so I'm not posting it as an answer, but it should work most of the time if your special string is really special.
– SomethingSomething, Commented Feb 17, 2015 at 15:56
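A minimal sketch of the placeholder workaround described in this comment. The function name `split_with_placeholder`, the choice of `"\x00"` as the "special string", and the shortened sample string are all assumptions made for illustration; the approach only holds if the placeholder never occurs in the real data, which the sketch checks explicitly.

```python
def split_with_placeholder(text, sep="|", escaped="\\|", placeholder="\x00"):
    """Split on sep while leaving escaped occurrences (\\|) intact."""
    if placeholder in text:
        # The workaround is only safe when the placeholder is truly unique.
        raise ValueError("placeholder already occurs in the text")
    protected = text.replace(escaped, placeholder)  # hide every \|
    parts = protected.split(sep)                    # split on the bare |
    # Put the escaped sequence back into each piece.
    return [part.replace(placeholder, escaped) for part in parts]


# Shortened stand-in for metricName (hypothetical values).
sample = r"(A\|b:Cpu)|(C\|d\|e:Count)|(F:Pct)"
parts = split_with_placeholder(sample)
for part in parts:
    print(part)
```

Raising an error when the placeholder is already present makes the fragility this comment warns about visible instead of silently corrupting the data.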

Kasravnd · Accepted Answer · 2015-02-17 15:55:00Z

10

You can split your string with a regex:

>>> import re
>>> re.split(r'(?<=\))\|(?=\()',metricName)
['(WebSpherePMI\\|jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\|threadPoolModule\\|WebContainer:ActiveCount)', '(GC Monitor\\|Memory Pools\\|Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Invocations Per Interval Count)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']

In this case r'(?<=\))\|(?=\()' splits your string on the pipe characters that sit between ) and (. It uses positive lookaround assertions (a lookbehind and a lookahead) to match.
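To make the lookaround behaviour concrete, here is a small sketch on a shortened, made-up string (the sample values are assumptions, not the original metrics). Because lookarounds are zero-width, the surrounding parentheses are checked but not consumed, so they stay in the resulting pieces.

```python
import re

# (?<=\)) requires a ")" just before the pipe, (?=\() requires a "(" just after.
# Neither assertion consumes characters, so only the "|" itself is removed.
pattern = re.compile(r"(?<=\))\|(?=\()")

sample = r"(A\|b:Cpu)|(C\|d:Count)|(E:Pct)"  # hypothetical, shortened input
pieces = pattern.split(sample)
for piece in pieces:
    print(piece)
```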

answered Feb 17, 2015 at 15:55

Kasravnd


7 Comments

Eric Over a year ago

Note the splitting rule in the question concerns ignoring \|

Kasravnd Over a year ago

@Eric Yes, and I did that too!

Adam Smith Over a year ago

This will split on all | directly adjacent to parens, but won't split r"(token\|number)one|token number two". No idea if OP's pattern could include splits like that, but it's worth mentioning. At this point there seem to be three ways of looking at the problem: "Split on all | not preceded by \", "Split on all bars of the pattern )|(", and "Split on all bars not contained by parentheses."
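The three readings listed in this comment can be compared side by side. The sketch below is only an illustration: the function names are made up, and the second sample string `"(a|b)|c"` is an added assumption chosen to separate the first reading from the third.

```python
import re


def split_unescaped(s):
    """Split on every | that is not preceded by a backslash."""
    return re.split(r"(?<!\\)\|", s)


def split_between_parens(s):
    """Split only on a | that sits directly between ) and (."""
    return re.split(r"(?<=\))\|(?=\()", s)


def split_outside_parens(s):
    """Split on every | that is not nested inside parentheses."""
    tokens, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)  # tolerate an unmatched ")"
        elif ch == "|" and depth == 0:
            tokens.append(s[start:i])
            start = i + 1
    tokens.append(s[start:])  # the piece after the last separator
    return tokens


for sample in (r"(token\|number)one|token number two", "(a|b)|c"):
    print(sample)
    print("  not preceded by \\:", split_unescaped(sample))
    print("  pattern )|(       :", split_between_parens(sample))
    print("  outside parens    :", split_outside_parens(sample))
```

On the string from the comment, the `)|(` reading leaves the input in one piece while the other two split it; on `"(a|b)|c"` only the parenthesis-depth reading keeps `(a|b)` together.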

Kasravnd Over a year ago

Yes, it could be done with "split on all | not preceded by \", but I just gave a more general answer!

Adam Smith Over a year ago

@KasraAD I'd actually consider yours a less-general answer! :)


Adam Smith · Accepted Answer · 2015-02-17 15:57:51Z

You can't do a naive str.split because you're looking for context-sensitive splitting: i.e.

Split on any vertical bar that is not contained in parentheses

You should probably use regex for this, but my regex is failing me at the moment so let's do something wonky.

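s = metricName  # the string to split (metricName from the question)
stack = 0       # current parenthesis nesting depth
tokens = []
last_start = 0
for i, ch in enumerate(s):  # iterate through the characters of s with their indexes
    if ch == "(":
        stack += 1
    elif ch == ")":
        stack = max(0, stack - 1)
        # clamping at 0 keeps an unmatched ")" from breaking later
        # parentheses if you have ugly text like "A) this, B) that."
    elif ch == "|" and stack == 0:
        tokens.append(s[last_start:i])
        last_start = i + 1
tokens.append(s[last_start:])  # keep the final token after the last "|"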

That said, if EVERY SINGLE CASE of your parenthetical vertical bars is preceded by a backslash (like in your example) you can simply do:

import re
re.split(r"(?<!\\)\|", s)

Daniel Roseman · Accepted Answer · 2015-02-17 15:50:41Z

0

You don't want to append to an existing empty list, you just want to create a list. So:

data = metricName.split('|')
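As a small illustration of why the attempt in the question produced a list inside a list, compare `append`, `extend`, and plain assignment on a made-up string (the sample value and variable names are assumptions). Note that this only addresses the nesting; a plain `split('|')` still splits on escaped `\|` as well.

```python
text = "a|b|c"  # hypothetical sample input

nested = []
nested.append(text.split("|"))  # appends the whole list as ONE element

flat = []
flat.extend(text.split("|"))    # adds each piece as its own element

direct = text.split("|")        # split() already returns a list

print(nested)
print(flat)
print(direct)
```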

answered Feb 17, 2015 at 15:50

Daniel Roseman


1 Comment

Adam Smith Over a year ago

he can't do a naive split -- he only wants to split on | not enclosed in parens

Matthias Lloyd · Accepted Answer · 2015-02-18 09:01:03Z

Delimiter is | but not this \|

From what you are saying, you want a negative lookbehind assertion.

Try this:

import re
# raw string, so that \| and \( are kept literally without invalid-escape warnings
metricName = r'(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'
data = re.split(r"(?<!\\)\|", metricName)

This returns:

['(WebSpherePMI\\|jvmRuntimeModule:ProcessCpuUsage)',
 '(WebSpherePMI\\|threadPoolModule\\|WebContainer:ActiveCount)',
 '(GC Monitor\\|Memory Pools\\|Java heap:Percentage of Maximum Capacity Currently Used)',
 '(GC Monitor\\|Garbage Collectors\\|(.*):GC Invocations Per Interval Count)',
 '(GC Monitor\\|Garbage Collectors\\|(.*):GC Time Per Interval \\(ms\\))',
 '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']

For more on Python's regex functions, and on the negative lookbehind assertion (?<!...) in particular, see:

https://docs.python.org/2/library/re.html

If you really only want to split on | when it sits between ) and (, then the lookaround answer above is the better fit.
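To tie this back to the loop asked for in the question, here is a hedged sketch that walks through the pieces one at a time. The name `metric_name` and its shortened value are stand-ins for the real metricName, and the loop body just prints; replace it with whatever processing each entry needs.

```python
import re

# Shortened, hypothetical stand-in for the metricName string in the question.
metric_name = r"(A\|b:Cpu)|(C\|d\|e:Count)|(F:Pct)"

# Split on every | that is not preceded by a backslash.
metrics = re.split(r"(?<!\\)\|", metric_name)

for index, metric in enumerate(metrics, start=1):
    print(f"{index}: {metric}")
```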
