Create a DataFrame from a string with pandas in Python

Question

I am writing a program that lists the IP addresses and users connected to the LAN. I get the data by running nmap. Next, I want to convert the result into a DataFrame, using pandas or any other approach. How can I do this?

Here's the code:

import pandas as pd
import subprocess
from subprocess import Popen, PIPE
import re

def ipget():
    i = 'nmap -sP 192.168.1.*'
    output = subprocess.getoutput(i)
    a = str(output).replace("Nmap","").replace("Starting  7.01 ( https://nmap.org ) at","").replace("scan report for","").replace("Host is up","").replace("latency","").replace("done: 256 IP addresses ","")
    data = re.sub(r"(\(.*?\)\.)", "", a)
    print(data)
#df = pd.DataFrame(data, columns = ['User', 'IP_Address']) 

#print (df) 
ipget()

The output is stored in data as a string:

2019-05-21 18:19 IST 
android-eb20919729f10e96 (192.168.1.8)

smackcoders (192.168.1.9)

princes-mbp (192.168.1.10)

shiv-mbp (192.168.1.15)

(4 hosts up) scanned in 18.35 seconds

Required output as a DataFrame:

User                            IP_Address
android-eb20919729f10e96        192.168.1.8
smackcoders                     192.168.1.9
princes-mbp                     192.168.1.10
shiv-mbp                        192.168.1.15

knh190 · Accepted Answer · 2019-05-21 14:11:16Z

4

Say you have the text:

2019-05-21 18:19 IST 
android-eb20919729f10e96 (192.168.1.8)

smackcoders (192.168.1.9)

princes-mbp (192.168.1.10)

shiv-mbp (192.168.1.15)

(4 hosts up) scanned in 18.35 seconds

Use regex to find the data you need:

>>> ms = re.findall(r'\n([^\s]*)\s+\((\d+\.\d+\.\d+\.\d+)\)', text)
>>> ms

[('android-eb20919729f10e96', '192.168.1.8'),
 ('smackcoders', '192.168.1.9'),
 ('princes-mbp', '192.168.1.10'),
 ('shiv-mbp', '192.168.1.15')]

>>> df = pd.DataFrame(ms, columns=['User', 'IP_Address'])
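
For reference, here is a minimal standalone sketch of the same idea, using the sample text from the question. The names HOST_RE and hosts_to_frame are made up for illustration. It also anchors the pattern with ^ and re.MULTILINE instead of a leading \n, which is a small variation on the pattern above so that a host on the very first line would still match.

```python
import re
import pandas as pd

# Sample text copied from the question (the cleaned nmap output).
text = """2019-05-21 18:19 IST 
android-eb20919729f10e96 (192.168.1.8)

smackcoders (192.168.1.9)

princes-mbp (192.168.1.10)

shiv-mbp (192.168.1.15)

(4 hosts up) scanned in 18.35 seconds
"""

# One non-whitespace token at the start of a line, then a dotted quad in parentheses.
# HOST_RE is a hypothetical name, not part of the original answer.
HOST_RE = re.compile(r"^(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)", re.MULTILINE)


def hosts_to_frame(text):
    """Return a DataFrame with one row per 'name (ip)' line found in text."""
    return pd.DataFrame(HOST_RE.findall(text), columns=["User", "IP_Address"])


df = hosts_to_frame(text)
print(df)
```

The timestamp and summary lines are skipped because neither has a parenthesised IP address right after its first token.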

Comparison to other answers:

The regex is short.
The regex scans your text only once.

Each str.replace call scans the whole text again, so the regex solution can be considerably more efficient for long logs.
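
Whether the regex is actually faster depends on the input and the Python version, so it is worth measuring. The following is a rough sketch using timeit on synthetic data. The input is made up (it is not real nmap output), and with_regex and with_replace are hypothetical helper names. No timings are quoted here because they vary by machine.

```python
import re
import timeit

# Synthetic input (an assumption, not real nmap output): 2000 'name (ip)' lines
# separated by blank lines, with one header line and one summary line.
lines = ["host-%d (192.168.%d.%d)" % (i, i // 250, i % 250 + 1) for i in range(2000)]
text = "header line\n" + "\n\n".join(lines) + "\n\n(2000 hosts up) scanned\n"

HOST_RE = re.compile(r"\n(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")


def with_regex(s):
    # Single regex pass over the whole text.
    return [list(m) for m in HOST_RE.findall(s)]


def with_replace(s):
    # Line-by-line approach: two str.replace calls and a split per line,
    # then drop the header and summary rows.
    rows = [ln.replace("(", "").replace(")", "").split(" ") for ln in s.split("\n") if ln != ""]
    return rows[1:-1]


# Both approaches should produce the same rows before their speed is compared.
assert with_regex(text) == with_replace(text)

print("regex  :", timeit.timeit(lambda: with_regex(text), number=20))
print("replace:", timeit.timeit(lambda: with_replace(text), number=20))
```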

edited May 21, 2019 at 14:11

answered May 21, 2019 at 13:57


mad_ · Accepted Answer · 2019-05-21 14:09:14Z

3

Use StringIO

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd
a="""
android-eb20919729f10e96 (192.168.1.8)

smackcoders (192.168.1.9)

princes-mbp (192.168.1.10)

shiv-mbp (192.168.1.15)"""

TESTDATA = StringIO(a)

df = pd.read_csv(TESTDATA, sep=" ",names=['User','IP_Address'])

Add the lines below to remove the parentheses:

import re
df.IP_Address = df.IP_Address.map(lambda x: re.sub(r'[()]', "", x))
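
The snippet above assumes the timestamp and summary lines have already been removed. Here is a sketch of one way to handle the full string from the question: it filters the host lines first, under the assumption that only host lines end with ")". It then uses the pandas string method str.strip in place of the regex. The names raw, host_lines and frame are illustrative.

```python
from io import StringIO
import pandas as pd

# Cleaned text from the question, including the timestamp and summary lines.
raw = """2019-05-21 18:19 IST 
android-eb20919729f10e96 (192.168.1.8)

smackcoders (192.168.1.9)

princes-mbp (192.168.1.10)

shiv-mbp (192.168.1.15)

(4 hosts up) scanned in 18.35 seconds
"""

# Assumption: every host line ends with ")" and no other line does.
host_lines = [ln.strip() for ln in raw.splitlines() if ln.strip().endswith(")")]

# Parse the remaining 'name (ip)' lines as space-separated values.
frame = pd.read_csv(StringIO("\n".join(host_lines)), sep=" ", names=["User", "IP_Address"])

# Drop the surrounding parentheses without a regex.
frame["IP_Address"] = frame["IP_Address"].str.strip("()")
print(frame)
```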

edited May 21, 2019 at 14:09

answered May 21, 2019 at 14:03


2 Comments

knh190 Over a year ago

However, this still includes the parentheses in the IP address.

mad_ Over a year ago

@knh190 I was still editing the post. Thanks for the comment.

Florian H · Accepted Answer · 2019-05-21 14:06:02Z

2

Assuming your string is named s, the following code does what you want:

line_list = []

# iterate over each line
for line in s.split("\n"):
    # skip empty lines
    if line == '':
        continue

    # strip the parentheses around the IP address
    line = line.replace("(", "").replace(")", "")

    line_list.append(line)

# drop the first line (timestamp) and the last line (scan summary)
line_list = line_list[1:-1]

array = []
# split each line on " " into [user, ip]
for line in line_list:
    array.append(line.split(" "))

# create the dataframe
df = pd.DataFrame(array, columns=["User", "IP_Address"])

Using a list comprehension, you can do the same in one line:

pd.DataFrame([line.replace("(", "").replace(")", "").split(" ") for line in s.split("\n") if line != ""][1:-1], columns=["User", "IP_Address"])

edited May 21, 2019 at 14:06

answered May 21, 2019 at 13:57


3 Comments

knh190 Over a year ago

There's definitely no need for multiple lines. One line regex is enough and more efficient.

Florian H Over a year ago

I got it as a oneliner ;). Still your solution is way more elegant! +1 for your answer.

knh190 Over a year ago

Upvoted, but I had to point out that str.replace runs through the whole text once per call, while the regex handles everything in one pass.
