5

I need help matching two strings and replacing them with an empty string (''). I would appreciate your help, as I am still new to Python and coding. The first is:

crypto pki certificate chain TP-self-signed-1357590403
  +30820330 30820218 A0030201 02020101 300D0609 2A864886 F70D0101 05050030
  +31312F30 2D060355 04031326 494F532D 53656C66 2D536967 6E65642D 43657274
  +69666963 6174652D 31333537 35393034 3033301E 170D3139 30313234 31353436
  +34345A17 0D323030 31303130 30303030 305A3031 312F302D 06035504 03132649
  +4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 33353735
  +39303430 33308201 22300D06 092A8648 86F70D01 01010500 0382010F 00308201
  +0A028201 0100E69D C133454E 401E763A 7686E453 5D58020D 0E6E122F A0F19E15
  +E0975148 666110BD C1F09B86 CB701C20 EF85E024 F759A921 D11DA10C A13BA3BD
  +20006387 917287CE EA0CFDDC 2FA5DD07 E5B200F4 108CACA1 DCEF0E4E EEE908ED
  +2ACD693B FC90A24F 9F865CB9 859FEFB0 EB8904D4 8FA83D29 E93B892F 32F3EC7D
  +EAA2850E 1793BBCE 86EA47B2 15645634 D81EA89C 1C2BC092 766DF58F 0B289A82
  +0C92E551 7AA9588E F5B41A41 6DB4C785 101E674D BBBCFB42 9F4F9A25 70389515
  +D1C07E2F 18C0557D 95283E90 3CCD2966 5EBF5668 A6B0B847 0B278906 E5BFA668
  +EFBE938A BE70C4C0 1A8D7218 71463EA5 49540A45 DF307B4C 459E657D C039BB68
  +F047B0B2 2F250203 010001A3 53305130 0F060355 1D130101 FF040530 030101FF
  +301F0603 551D2304 18301680 141FADF3 CC2C2293 810EDAA8 9E55327C D2B7D88A
  +88301D06 03551D0E 04160414 1FADF3CC 2C229381 0EDAA89E 55327CD2 B7D88A88
  +300D0609 2A864886 F70D0101 05050003 82010100 91E63F44 376F91C1 C50C08E4
  +B29B902B B1BC7831 C5607897 030835A6 108FC1F2 6F3DEE23 EF3E8FFF 81A121B5
  +26596004 F8F61DFD 1B603C5D 42D850E6 439C7CAE BFC285AE 3FD83870 125594C0
  +51EAAC09 BF42446F C6399B90 D0E10ACA B208819B 645BECE5 DBDDA9AD EBA1FCD9
  +2B14D0DE AB2AC1BF FF064076 ADBB4540 17AB77A4 C6B0DA3B 1BC0F5B8 44030E7B
  +27318CEE 14C90739 DD8684A8 9346EEC1 3F4958EF 835BA822 F58523C9 E9F83105
  +D3E68700 20DAFC5E B1B8CF5B BAC5CEB3 00321088 43125173 51FC8006 270731E6
  +0E0C6183 68BABA99 BD9F4F28 1EDA82D4 F00F1359 F30B6501 BC468C89 49111AB2
  +CBDE5A9D DB8DB33A 45FE6C96 7D49A70F 4C299618

It will always have 27 lines, counting the first line.

The second is:

crypto pki certificate chain TP-self-signed-1357590403
 -certificate self-signed 01 nvram:IOS-Self-Sig#1.cer
  • For the input string, what is your expected output string? Do you want to erase the whole thing? Commented Apr 2, 2019 at 7:59
  • Yes, I just want to erase this part, actually. Commented Apr 2, 2019 at 8:46
  • Check my answer! It should work ;-) Commented Apr 2, 2019 at 8:51

4 Answers

2

If you want to match from the first crypto line up to and including the next crypto line, you could match all the lines in between and use a negative lookahead to assert that each of them does not start with crypto.

Then match a newline and crypto until the end of the line:

^crypto pki certificate chain TP-self-signed-.*(?:\n(?!crypto).*)*\ncrypto.*

Regex demo
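As a minimal sketch of how this pattern behaves in Python, the snippet below runs it over a shortened, made-up sample (the hostname and interface lines and the two hex lines are assumptions standing in for the real configuration). It uses re.MULTILINE so that ^ can match at the start of any line.

```python
import re

# Hypothetical sample: a short stand-in for the real 27-line block.
sample = (
    "hostname R1\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "  +CBDE5A9D DB8DB33A\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "interface Gi0/0\n"
)

pattern = re.compile(
    r"^crypto pki certificate chain TP-self-signed-.*"  # opening crypto line
    r"(?:\n(?!crypto).*)*"  # following lines that do not start with crypto
    r"\ncrypto.*",  # the next crypto line
    re.MULTILINE,
)

cleaned = pattern.sub("", sample)
print(repr(cleaned))
```

The newline that followed the closing crypto line is not part of the match, so an empty line is left where the block used to be.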

If the starting line should be the same as the line at the end you could use a capturing group for the first line with a backreference:

^(crypto pki certificate chain TP-self-signed-.*)(?:\n(?!\1).*)*\n\1

Regex demo

Your code could look like this, where file holds the contents of the file as a single string:

import re
pattern = r'^(crypto pki certificate chain TP-self-signed-.*)(?:\n(?!\1).*)*\n\1'
df = re.sub(pattern, '', file, count=0, flags=re.MULTILINE)

9 Comments

You do not have to test for \n twice, use \n(?!crypto).* instead of (?!\ncrypto)\n.*
@WiktorStribiżew You are right, I have updated it. Thanks, much appreciated!
I wonder why it is downvoted. This unroll-the-loop technique is best when it comes to regex matching performance. +1 definitely.
Thanks guys! I am testing it, but it seems it is removing only the first line:
The regex works! Thanks. The problem was a line in my code before that!
1

You can use the following code:

import re

inputStr = """crypto pki certificate chain TP-self-signed-1357590403
  +30820330 30820218 A0030201 02020101 300D0609 2A864886 F70D0101 05050030
  +31312F30 2D060355 04031326 494F532D 53656C66 2D536967 6E65642D 43657274
  +69666963 6174652D 31333537 35393034 3033301E 170D3139 30313234 31353436
  +34345A17 0D323030 31303130 30303030 305A3031 312F302D 06035504 03132649
  +4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 33353735
  +39303430 33308201 22300D06 092A8648 86F70D01 01010500 0382010F 00308201
  +0A028201 0100E69D C133454E 401E763A 7686E453 5D58020D 0E6E122F A0F19E15
  +E0975148 666110BD C1F09B86 CB701C20 EF85E024 F759A921 D11DA10C A13BA3BD
  +20006387 917287CE EA0CFDDC 2FA5DD07 E5B200F4 108CACA1 DCEF0E4E EEE908ED
  +2ACD693B FC90A24F 9F865CB9 859FEFB0 EB8904D4 8FA83D29 E93B892F 32F3EC7D
  +EAA2850E 1793BBCE 86EA47B2 15645634 D81EA89C 1C2BC092 766DF58F 0B289A82
  +0C92E551 7AA9588E F5B41A41 6DB4C785 101E674D BBBCFB42 9F4F9A25 70389515
  +D1C07E2F 18C0557D 95283E90 3CCD2966 5EBF5668 A6B0B847 0B278906 E5BFA668
  +EFBE938A BE70C4C0 1A8D7218 71463EA5 49540A45 DF307B4C 459E657D C039BB68
  +F047B0B2 2F250203 010001A3 53305130 0F060355 1D130101 FF040530 030101FF
  +301F0603 551D2304 18301680 141FADF3 CC2C2293 810EDAA8 9E55327C D2B7D88A
  +88301D06 03551D0E 04160414 1FADF3CC 2C229381 0EDAA89E 55327CD2 B7D88A88
  +300D0609 2A864886 F70D0101 05050003 82010100 91E63F44 376F91C1 C50C08E4
  +B29B902B B1BC7831 C5607897 030835A6 108FC1F2 6F3DEE23 EF3E8FFF 81A121B5
  +26596004 F8F61DFD 1B603C5D 42D850E6 439C7CAE BFC285AE 3FD83870 125594C0
  +51EAAC09 BF42446F C6399B90 D0E10ACA B208819B 645BECE5 DBDDA9AD EBA1FCD9
  +2B14D0DE AB2AC1BF FF064076 ADBB4540 17AB77A4 C6B0DA3B 1BC0F5B8 44030E7B
  +27318CEE 14C90739 DD8684A8 9346EEC1 3F4958EF 835BA822 F58523C9 E9F83105
  +D3E68700 20DAFC5E B1B8CF5B BAC5CEB3 00321088 43125173 51FC8006 270731E6
  +0E0C6183 68BABA99 BD9F4F28 1EDA82D4 F00F1359 F30B6501 BC468C89 49111AB2
  +CBDE5A9D DB8DB33A 45FE6C96 7D49A70F 4C299618
crypto pki certificate chain TP-self-signed-1357590403"""

print(re.sub(r'crypto pki certificate chain TP-self-signed-\d+\s*[0-9a-fA-F+\s]+\s*crypto pki certificate chain TP-self-signed-\d+', '' , inputStr))

output: empty

Regex demo: https://regex101.com/r/G9XciA/2/

Regex explanations:

  • crypto pki certificate chain TP-self-signed-\d+\s* matches the first line, whose ID is assumed to consist only of digits, followed by any whitespace characters
  • [0-9a-fA-F+\s]+ matches the hexadecimal characters, +, and whitespace characters
  • crypto pki certificate chain TP-self-signed-\d+ matches the last line and ends the match

If the ID is the same on the first and last lines, use the regex:

crypto pki certificate chain TP-self-signed-(\d+)\s*[0-9a-fA-F+\s]+\s*crypto pki certificate chain TP-self-signed-\1

where \1 is a backreference to the first capturing group.

demo: https://regex101.com/r/G9XciA/3
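A small sketch of the backreference variant in Python follows. Both sample strings are shortened, made-up blocks, and the second closing ID (2468013579) is invented purely to show a non-matching case.

```python
import re

pattern = re.compile(
    r"crypto pki certificate chain TP-self-signed-(\d+)\s*"
    r"[0-9a-fA-F+\s]+\s*"
    r"crypto pki certificate chain TP-self-signed-\1"
)

# Hypothetical block whose opening and closing IDs are identical.
same_id = (
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "crypto pki certificate chain TP-self-signed-1357590403"
)

# Hypothetical block whose closing ID is completely different.
other_id = (
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "crypto pki certificate chain TP-self-signed-2468013579"
)

removed = pattern.sub("", same_id)
print(repr(removed))
print(pattern.search(other_id) is None)
```

One caveat: \1 is not anchored at its end and \d+ can backtrack, so a closing ID that merely shares leading digits with the opening one could still produce a match. If that matters for your data, adding \b after \1 may be worth trying.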

7 Comments

Thanks Alan! I am almost there. It seems that it is erasing only the first line. I am testing the string as a .txt file.
@IvanMadolev: you should load the whole file into a string and remove the literal \n sequences if they are present
Indeed, it works in the Python shell, but when I try to implement it in my code it does not work for some reason.
@IvanMadolev: if you show us your code, we can help you fix this!
Your regex works for sure! I know what was going wrong and I am trying to fix it myself. While opening the file I was removing the first line before the regex ran; my mistake! If there are still issues I will post the code.
1

Why not just use this regex,

(crypto pki certificate chain TP-self-signed-\d+)[\w\W]+?\1

and replace the match with an empty string?

Am I missing something? The other answers seem to suggest somewhat complex solutions involving newline characters.

Demo
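To illustrate why the quantifier here is lazy (+?), the sketch below compares it with a greedy version on a made-up input containing two blocks. The before/middle/after lines and the shortened hex lines are assumptions for the example.

```python
import re

# Hypothetical input with two separate blocks sharing the same ID.
text = (
    "before\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "middle\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +CBDE5A9D\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "after\n"
)

lazy = r"(crypto pki certificate chain TP-self-signed-\d+)[\w\W]+?\1"
greedy = r"(crypto pki certificate chain TP-self-signed-\d+)[\w\W]+\1"

lazy_result = re.sub(lazy, "", text)      # stops at the nearest closing line
greedy_result = re.sub(greedy, "", text)  # runs on to the last closing line

print(repr(lazy_result))
print(repr(greedy_result))
```

With the lazy form each block is removed on its own and the line between them survives; the greedy form swallows everything from the first opening line to the last closing line.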

Edit: As per your comment "Actually what i need is to remove :crypto pki certificate chain TP-self-signed-1357590403 plus the next 26 lines starting with +"

You can use this regex, which matches the crypto pki certificate chain TP-self-signed-1357590403 line plus exactly the 26 following lines that start with +.

crypto pki certificate chain TP-self-signed-\d+(?:\n\s*\+[^\n]*){26}

Demo

As you can see in the demo, it selects the crypto line and exactly the 26 lines starting with +, which are then replaced with an empty string. Let me know if you face any issues.
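Here is a rough Python sketch of the {26} pattern in use. The 26 "+" lines are generated placeholder data rather than a real certificate, and the hostname and interface lines are assumptions added to show what is left behind.

```python
import re

header = "crypto pki certificate chain TP-self-signed-1357590403"

# Hypothetical stand-in data: 26 indented lines that start with "+".
plus_lines = [f"  +{i:08X} {i + 1:08X}" for i in range(26)]

config = "\n".join(["hostname R1", header, *plus_lines, "interface Gi0/0"]) + "\n"

pattern = r"crypto pki certificate chain TP-self-signed-\d+(?:\n\s*\+[^\n]*){26}"

cleaned = re.sub(pattern, "", config)
print(repr(cleaned))

# With only 25 "+" lines the quantifier cannot be satisfied, so nothing matches.
too_short = "\n".join([header, *plus_lines[:25], "interface Gi0/0"])
print(re.search(pattern, too_short) is None)
```

Because the count is fixed, a block with fewer than 26 "+" lines is left untouched, which may or may not be what you want if the certificate length ever changes.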

4 Comments

Actually what i need is to remove :crypto pki certificate chain TP-self-signed-1357590403 plus the next 26 lines starting with +
Ok, now that is something precise to be done. Let me update my answer.
Works! Thank you! I will try to adapt it to my script now.
Can you please advise how to implement it with pandas and convert the result to a CSV file? I am using: df=pd.read_csv(file, skiprows = 1, error_bad_lines=False).dropna() df=re.sub(r'crypto pki certificate chain TP-self-signed-\d+(?:\n\s*\+[^\n]*){26}' , '' , file) df.to_csv(file, index=False) and I am getting this error: AttributeError: 'str' object has no attribute 'to_csv'
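On the AttributeError in the last comment: re.sub returns a plain str, so assigning its result to df replaces the DataFrame with a string, and a string has no to_csv method. One possible approach, sketched here with the standard library only, is to clean the file's text first and write it back; the cleaned text could then be parsed by pandas (for example via pd.read_csv on an io.StringIO wrapper). The helper name strip_cert_block and the demo file contents are hypothetical.

```python
import re
import tempfile
from pathlib import Path

PATTERN = r"crypto pki certificate chain TP-self-signed-\d+(?:\n\s*\+[^\n]*){26}"


def strip_cert_block(path):
    """Remove the certificate block from the file at `path`, in place."""
    text = Path(path).read_text(encoding="utf8")  # whole file as one str
    cleaned = re.sub(PATTERN, "", text)  # re.sub returns a str, not a DataFrame
    Path(path).write_text(cleaned, encoding="utf8")
    return cleaned


# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "output.txt"
    block = "\n".join(
        ["crypto pki certificate chain TP-self-signed-1357590403"]
        + ["  +30820330 30820218"] * 26
    )
    demo.write_text("line before\n" + block + "\nline after\n", encoding="utf8")
    result = strip_cert_block(demo)

print(repr(result))
```

Note that the regex has to run on the text itself, not on the file name, which is what passing file to re.sub in the comment above would do if file is a path.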
1

I can't know exactly what you are after, as you didn't give information on your desired result, so we can only guess.

If you simply want to replace it all, you can use something such as

import re

# Read the whole file into one string so the regex can span several lines
with open('text.txt', encoding="utf8") as f:
    document_x = f.read()

regex_test = re.sub(r".*\n*( +.*)*", "", document_x)

print(regex_test)

To remove everything between the crypto lines, use

regex_test = re.sub(r"(?:\n(?!crypto).*)*", "" , document_x)

Or, to remove the crypto lines themselves, you can instead use

regex_test = re.sub(r"crypto pki certificate chain TP-self-signed-[0-9]+\n", "",
                    document_x, flags=re.MULTILINE)
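One detail worth checking with re.sub: its fourth positional parameter is count, not flags, so a flag passed positionally is interpreted as a maximum number of replacements. The sketch below shows the effect without relying on the positional form itself; the ten repeated lines are made-up test data.

```python
import re

# Hypothetical input: ten identical crypto lines.
text = "crypto pki certificate chain TP-self-signed-1357590403\n" * 10
pattern = r"crypto pki certificate chain TP-self-signed-[0-9]+\n"

# re.sub(pattern, "", text, re.MULTILINE) would bind re.MULTILINE to `count`.
# The flag's integer value is 8, so that call behaves like count=8:
as_count = re.sub(pattern, "", text, count=int(re.MULTILINE))

# Passing the flag by keyword leaves count at 0, which replaces every match:
as_flag = re.sub(pattern, "", text, flags=re.MULTILINE)

print(int(re.MULTILINE))
print(as_count.count("crypto"))
print(as_flag.count("crypto"))
```

With fewer than nine matches the two calls give the same result, which is probably why this kind of slip tends to go unnoticed in small tests.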

I have run these through a Python 3.6.1 shell to confirm they work. Online regex testers, although helpful, do not always return the same results as Python itself.

A possible example answer is

import re

# Read the whole file into one string so the regex can span several lines
with open('text.csv', encoding="utf8") as f:
    document_x = f.read()

regex_test = re.sub(r"(crypto[\s\S]*1357590403)", "", document_x)

print(regex_test)

You should modify it to suit your needs; this is just an example. It assumes you want to remove the entire block but nothing before or after it, e.g.

Placeholder 1
crypto pki certificate chain TP-self-signed-1357590403
  +30820330 30820218 A0030201 02020101 300D0609 2A864886 F70D0101 05050030
  +31312F30 2D060355 04031326 494F532D 53656C66 2D536967 6E65642D 43657274
  +69666963 6174652D 31333537 35393034 3033301E 170D3139 30313234 31353436
  +34345A17 0D323030 31303130 30303030 305A3031 312F302D 06035504 03132649
  +4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 33353735
  +39303430 33308201 22300D06 092A8648 86F70D01 01010500 0382010F 00308201
  +0A028201 0100E69D C133454E 401E763A 7686E453 5D58020D 0E6E122F A0F19E15
  +E0975148 666110BD C1F09B86 CB701C20 EF85E024 F759A921 D11DA10C A13BA3BD
  +20006387 917287CE EA0CFDDC 2FA5DD07 E5B200F4 108CACA1 DCEF0E4E EEE908ED
  +2ACD693B FC90A24F 9F865CB9 859FEFB0 EB8904D4 8FA83D29 E93B892F 32F3EC7D
  +EAA2850E 1793BBCE 86EA47B2 15645634 D81EA89C 1C2BC092 766DF58F 0B289A82
  +0C92E551 7AA9588E F5B41A41 6DB4C785 101E674D BBBCFB42 9F4F9A25 70389515
  +D1C07E2F 18C0557D 95283E90 3CCD2966 5EBF5668 A6B0B847 0B278906 E5BFA668
  +EFBE938A BE70C4C0 1A8D7218 71463EA5 49540A45 DF307B4C 459E657D C039BB68
  +F047B0B2 2F250203 010001A3 53305130 0F060355 1D130101 FF040530 030101FF
  +301F0603 551D2304 18301680 141FADF3 CC2C2293 810EDAA8 9E55327C D2B7D88A
  +88301D06 03551D0E 04160414 1FADF3CC 2C229381 0EDAA89E 55327CD2 B7D88A88
  +300D0609 2A864886 F70D0101 05050003 82010100 91E63F44 376F91C1 C50C08E4
  +B29B902B B1BC7831 C5607897 030835A6 108FC1F2 6F3DEE23 EF3E8FFF 81A121B5
  +26596004 F8F61DFD 1B603C5D 42D850E6 439C7CAE BFC285AE 3FD83870 125594C0
  +51EAAC09 BF42446F C6399B90 D0E10ACA B208819B 645BECE5 DBDDA9AD EBA1FCD9
  +2B14D0DE AB2AC1BF FF064076 ADBB4540 17AB77A4 C6B0DA3B 1BC0F5B8 44030E7B
  +27318CEE 14C90739 DD8684A8 9346EEC1 3F4958EF 835BA822 F58523C9 E9F83105
  +D3E68700 20DAFC5E B1B8CF5B BAC5CEB3 00321088 43125173 51FC8006 270731E6
  +0E0C6183 68BABA99 BD9F4F28 1EDA82D4 F00F1359 F30B6501 BC468C89 49111AB2
  +CBDE5A9D DB8DB33A 45FE6C96 7D49A70F 4C299618
crypto pki certificate chain TP-self-signed-1357590403
Placeholder 2

Running the above example removes the block and leaves what was around it, i.e.

Placeholder 1

Placeholder 2

5 Comments

Thanks! Actually, I am trying to completely remove this string, starting and ending with "crypto pki certificate chain TP-self-signed-1357590403". The number will always be the same on both lines. It can be either a txt or a csv file.
Updated it with a possible fix for what you are after. It will likely need some modification for exactly what you want, as this is only a rough example. I have done it this way because I am not sure what source you are working from, so in a very specific case where there is a collision, you may have to change to arbitrary values.
Works perfectly for a txt file. Now I will try to adapt it to csv, which was the original requirement!
Remember to pick an answer to mark the subject as solved so it isn't forcefully closed later. I also tried the same thing on a spreadsheet (csv), and it worked the same as it did with the txt file: spanning 30 lines, removing all but the 2 outside the block. However, as I do not actually know what you are doing, you might need something different.
What I am trying to do now is complex: I was running code where the script logs in to Cisco routers to catch unsaved changes in the configurations. If there were no changes, the command output was an empty line, but now it shows those lines that I want to remove. I am now having another issue, but my regex questions have been answered, so I will mark this as solved.
