I'm trying to extract the first ISS TLE (Two Line Element set) from this website.

I need the first three lines that follow the text

 TWO LINE MEAN ELEMENT SET

that is, the ISS name line followed by TLE lines 1 and 2.

I can get the text that contains what I want using Beautiful Soup, but I don't know how to extract those three lines from it. I can't use split() because I need to preserve the whitespace in those three lines exactly. How can this be done?

import urllib2
from bs4 import BeautifulSoup
import ephem
import datetime

nasaissurl = 'http://spaceflight.nasa.gov/realdata/sightings/SSapplications/Post/JavaSSOP/orbit/ISS/SVPOST.html'
soup = BeautifulSoup(urllib2.urlopen(nasaissurl), 'html.parser')
body = soup.find_all("pre")
index = 0
firstTLE = False
for tag in body:
    if "ISS" in tag.text:
        print tag.text

2 Answers

If you split the text into lines and process them one at a time, you can rejoin the three lines you need once you find the marker:

Code:

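def process_tag_text(tag_text):
    marker = 'TWO LINE MEAN ELEMENT SET'
    results = []  # collect matches locally instead of relying on a global
    text = iter(tag_text.split('\n'))
    for line in text:
        if marker in line:
            next(text)  # skip the line directly after the marker
            # the next three lines are the name line, line 1 and line 2
            results.append('\n'.join(
                (next(text), next(text), next(text))))
    return results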

Test Code:

import urllib2
from bs4 import BeautifulSoup

nasaissurl = 'http://spaceflight.nasa.gov/realdata/sightings/' \
             'SSapplications/Post/JavaSSOP/orbit/ISS/SVPOST.html'
soup = BeautifulSoup(urllib2.urlopen(nasaissurl), 'html.parser')
body = soup.find_all("pre")
results = []
for tag in body:
    if "ISS" in tag.text:
        results.extend(process_tag_text(tag.text))

print('\n'.join(results))

Results:

ISS
1 25544U 98067A   18054.51611082  .00016717  00000-0  10270-3 0  9009
2 25544  51.6368 225.3935 0003190 125.8429 234.3021 15.54140528 20837
ISS
1 25544U 98067A   18055.54493747  .00016717  00000-0  10270-3 0  9010
2 25544  51.6354 220.2641 0003197 130.5210 229.6221 15.54104949 20991
ISS
1 25544U 98067A   18056.50945749  .00016717  00000-0  10270-3 0  9022
2 25544  51.6372 215.4558 0003149 134.4837 225.6573 15.54146916 21143
ISS
1 25544U 98067A   18057.34537198  .00016717  00000-0  10270-3 0  9031
2 25544  51.6399 211.2932 0002593 130.2258 229.9121 15.54133048 21277
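
The question asks for only the first TLE, while the code above collects every one. A minimal sketch of stopping at the first match is shown below. It runs on a hard-coded sample rather than the live page; the function name `first_tle` and the leading indentation in the sample are assumptions for illustration. Because splitlines() removes only the line breaks, the spacing inside each line is left untouched.

```python
MARKER = 'TWO LINE MEAN ELEMENT SET'


def first_tle(text, marker=MARKER):
    """Return the first three non-blank lines after the marker, or None."""
    lines = iter(text.splitlines())
    for line in lines:
        if marker in line:
            found = []
            # Keep consuming the same iterator, skipping blank lines.
            for candidate in lines:
                if candidate.strip():
                    found.append(candidate)
                if len(found) == 3:
                    return found
            return None  # marker seen, but fewer than three lines followed
    return None  # marker never seen


# Hypothetical sample imitating the page layout (indentation is assumed).
sample = "\n".join([
    "    TWO LINE MEAN ELEMENT SET",
    "",
    "    ISS",
    "    1 25544U 98067A   18054.51611082  .00016717  00000-0  10270-3 0  9009",
    "    2 25544  51.6368 225.3935 0003190 125.8429 234.3021 15.54140528 20837",
    "",
    "    TWO LINE MEAN ELEMENT SET",
    "",
    "    ISS",
    "    1 25544U 98067A   18055.54493747  .00016717  00000-0  10270-3 0  9010",
    "    2 25544  51.6354 220.2641 0003197 130.5210 229.6221 15.54104949 20991",
])

tle = first_tle(sample)
if tle is not None:
    print("\n".join(tle))
```

With the real page, the same function could be called on tag.text for each pre tag, stopping at the first call that does not return None.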

You can achieve the same in several ways. Here is another approach:

from bs4 import BeautifulSoup
import requests

URL = "https://spaceflight.nasa.gov/realdata/sightings/SSapplications/Post/JavaSSOP/orbit/ISS/SVPOST.html"
soup = BeautifulSoup(requests.get(URL).text,"lxml")

for item in soup.select("pre"):
    lines = item.text.splitlines()  # split once instead of on every access
    for i, line in enumerate(lines):
        if "25544U" in line:
            # name line, TLE line 1, TLE line 2; strip() trims only the ends
            doc = lines[i-1].strip()
            doc1 = line.strip()
            doc2 = lines[i+1].strip()
            print("{}\n{}\n{}\n".format(doc,doc1,doc2))

Partial output:

ISS
1 25544U 98067A   18054.51611082  .00016717  00000-0  10270-3 0  9009
2 25544  51.6368 225.3935 0003190 125.8429 234.3021 15.54140528 20837

ISS
1 25544U 98067A   18055.54493747  .00016717  00000-0  10270-3 0  9010
2 25544  51.6354 220.2641 0003197 130.5210 229.6221 15.54104949 20991

ISS
1 25544U 98067A   18056.50945749  .00016717  00000-0  10270-3 0  9022
2 25544  51.6372 215.4558 0003149 134.4837 225.6573 15.54146916 21143
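
Since the question stresses that the whitespace must survive exactly, one way to sanity-check the extracted lines is to test their width and checksum. This is a minimal sketch, assuming the standard TLE layout: lines 1 and 2 are each 69 characters long, and the last character is a modulo-10 checksum in which digits count at face value and each minus sign counts as 1. The function name `tle_line_looks_intact` is hypothetical, and the line must already have its leading and trailing whitespace removed, as strip() does above.

```python
def tle_line_looks_intact(line):
    """Check the fixed width and modulo-10 checksum of TLE line 1 or 2."""
    digits = '0123456789'
    if len(line) != 69 or line[-1] not in digits:
        return False
    total = 0
    for ch in line[:68]:
        if ch in digits:
            total += int(ch)
        elif ch == '-':
            total += 1  # a minus sign counts as 1; everything else as 0
    return total % 10 == int(line[-1])


line1 = "1 25544U 98067A   18054.51611082  .00016717  00000-0  10270-3 0  9009"
line2 = "2 25544  51.6368 225.3935 0003190 125.8429 234.3021 15.54140528 20837"

print(tle_line_looks_intact(line1))
print(tle_line_looks_intact(line2))
# Collapsing runs of spaces changes the width, so the check fails.
print(tle_line_looks_intact(" ".join(line1.split())))
```

The checksum alone does not detect lost spaces, because spaces contribute nothing to the sum; it is the 69-character width test that catches collapsed whitespace.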
