1

I have a variable like this:

metricName = '(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'

I need to create a for loop and go through this metricName one pattern at a time. For example, first (WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage), then (WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount), then (GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used), and so forth. Delimiter is | but not this \|

I tried creating an array:

data = []

data.append(metricName.split('|'))

but it gives me array like this:

[['(WebSpherePMI\\', 'jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\', 'threadPoolModule\\', 'WebContainer:ActiveCount)', '(GC Monitor\\', 'Memory Pools\\', 'Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Invocations Per Interval Count)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']]

Any ideas how I could put this in an array?

1
  • Hard case :) Maybe first use str.replace() to replace every \| with some special string, then split on '|', then restore the \| occurrences by replacing the special string back. It is not a beautiful workaround and it can be buggy, so I'm not posting it as an answer, but it should work most of the time if your special string is really special. Commented Feb 17, 2015 at 15:56
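A minimal sketch of the replace, split, restore idea from the comment above. The placeholder "\x00" and the names split_with_placeholder and sample are hypothetical choices for illustration, and the approach only works under the assumption that the placeholder never appears in the input.

```python
# Assumption: this placeholder never occurs in the input string.
PLACEHOLDER = "\x00"

def split_with_placeholder(text):
    # Hide every escaped bar so that only real delimiters remain.
    hidden = text.replace("\\|", PLACEHOLDER)
    # Split on the remaining bars, then put the escaped bars back.
    return [part.replace(PLACEHOLDER, "\\|") for part in hidden.split("|")]

# Shortened, made-up sample with the same shape as metricName.
sample = r"(A\|b:c)|(D\|e\|f:g)|(H:i)"
parts = split_with_placeholder(sample)
for part in parts:
    print(part)
```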

4 Answers 4

10

You can split your string with a regex:

>>> import re
>>> re.split(r'(?<=\))\|(?=\()',metricName)
['(WebSpherePMI\\|jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\|threadPoolModule\\|WebContainer:ActiveCount)', '(GC Monitor\\|Memory Pools\\|Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Invocations Per Interval Count)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']

In this case, r'(?<=\))\|(?=\()' splits your string on the pipe signs that sit between ) and (. It uses positive look-arounds (a lookbehind for ) and a lookahead for () to find the match.
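To tie this back to the loop the question asks for, here is a small sketch that splits and then iterates. The metric_name value is a shortened stand-in for the question's string (an assumption for brevity), written as a raw string so the backslashes are kept literally.

```python
import re

# Shortened stand-in for metricName (hypothetical sample, same shape).
metric_name = (
    r"(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)"
    r"|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))"
    r"|(GC Monitor:Percentage)"
)

# Split only on a bar that has ")" right before it and "(" right after it.
patterns = re.split(r"(?<=\))\|(?=\()", metric_name)

# Go through the patterns one at a time.
for pattern in patterns:
    print(pattern)
```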


7 Comments

Note the splitting rule in the question concerns ignoring \|
@Eric yes and i did it too!
This will split on all | directly adjacent to parens, but won't split r"(token\|number)one|token number two". No idea if OP's pattern could include splits like that, but it's worth mentioning. At this point there seem to be three ways of looking at the problem: "Split on all | not preceded by \", "Split on all bars of the pattern )|(", and "Split on all bars not contained by parentheses."
Yes, it could be done with "split on all | not preceded by `\`", but I just gave a more general answer!
@KasraAD I'd actually consider yours a less-general answer! :)
1

You can't do a naive str.split because you're looking for context-sensitive splitting: i.e.

Split on any vertical bar that is not contained in parentheses

You should probably use regex for this, but my regex is failing me at the moment so let's do something wonky.

stack = 0
tokens = []
last_start = 0
for i in range(len(s)): # iterate through indexes of string s
    if s[i] == "(":
        stack += 1
    if s[i] == ")":
        stack = max(0, stack-1)
        # this will prevent breaking nested parentheses if you have
        # ugly parenthetical text like "A) this, B) that."
    if s[i] == "|" and stack == 0:
        tokens.append(s[last_start:i])
        last_start = i+1
tokens.append(s[last_start:])  # don't drop the text after the last top-level bar

That said, if EVERY SINGLE CASE of your parenthetical vertical bars is preceded by a backslash (like in your example) you can simply do:

re.split(r"(?<!\\)\|", s)
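As a rough sketch, the depth-counting idea can be wrapped in a function and checked against the regex on a sample. The function name split_top_level and the sample string are made up for illustration. Note that the counter treats escaped parentheses such as \( like ordinary ones, which only works here because they come in balanced pairs.

```python
import re

def split_top_level(s):
    """Split s on every "|" that sits outside all parentheses."""
    depth = 0
    tokens = []
    last_start = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)  # tolerate a stray ")"
        elif ch == "|" and depth == 0:
            tokens.append(s[last_start:i])
            last_start = i + 1
    tokens.append(s[last_start:])  # the text after the last bar
    return tokens

# Made-up sample with nested and escaped parentheses.
sample = r"(A\|b:c)|(D\|(.*):e \(ms\))|(F:g)"
by_depth = split_top_level(sample)
by_regex = re.split(r"(?<!\\)\|", sample)
print(by_depth == by_regex)
```

On this sample both approaches give the same three pieces; they would diverge for input with an unescaped bar inside parentheses.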

Comments

0

You don't want to append to an existing empty list, you just want to create a list. So:

data = metricName.split('|')

1 Comment

he can't do a naive split -- he only wants to split on | not enclosed in parens
0

Delimiter is | but not this \|

From what you are saying, you want a negative lookbehind assertion.

try this:

import re
metricName = r'(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'
data = re.split(r"(?<!\\)\|", metricName)

This returns

['(WebSpherePMI\\|jvmRuntimeModule:ProcessCpuUsage)',
 '(WebSpherePMI\\|threadPoolModule\\|WebContainer:ActiveCount)',
 '(GC Monitor\\|Memory Pools\\|Java heap:Percentage of Maximum Capacity Currently Used)',
 '(GC Monitor\\|Garbage Collectors\\|(.*):GC Invocations Per Interval Count)',
 '(GC Monitor\\|Garbage Collectors\\|(.*):GC Time Per Interval \\(ms\\))',
 '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']

The Python documentation has more on the regex functions and, in particular, on the negative lookbehind assertion:

(?<!...)

https://docs.python.org/2/library/re.html

If you really only want to split on | when it sits between ) and (, then the answer above is the better fit.
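To see where the two regexes differ, here is a small sketch using the example string from the comment thread on the other answer. The variable names are made up for illustration.

```python
import re

# Example string from the comments: it contains one unescaped bar,
# but that bar is not sitting between ")" and "(".
s = r"(token\|number)one|token number two"

# Lookaround version: needs ")" before the bar and "(" after it.
between_parens = re.split(r"(?<=\))\|(?=\()", s)

# Negative lookbehind version: any bar not preceded by a backslash.
not_escaped = re.split(r"(?<!\\)\|", s)

print(between_parens)
print(not_escaped)
```

The lookaround version leaves this string in one piece, while the negative lookbehind version cuts it at the unescaped bar, so which one is right depends on what the real input can look like.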
