Can't get text without tag using Selenium Python

Question

first of all, I'll show the code that I'm having problem to in order to better explain myself.

<div class="archivos"> ... </div>
<br>
<br>
<br>
<br>
THIS IS THE TEXT THAT I WANT TO CHECK
<div class="archivos"> ... </div>
...

I'm using Selenium in Python.

So, this is a piece of the html that I'm working with. My objective is, inside the div with "class=archivos", there's a link that i want to click, but for that, I need to first analyze the text that's over it to know if I want to click or not the link.

The problem is that there's no tag on the text, and I can't seem to find a way to copy it so I can search it for the information I want. The text changes every time so I need to locate the possible texts previous to every "class=archivos".

So far I've tried a lot of ways to find it using XPath mainly, trying to get to the previous element of the div. I haven't come with anything that works yet, as I'm not very experienced with Selenium and XPaths.

I've found this https://chercher.tech/python/relative-xpath-selenium-python,which helped me try some XPaths, and several responses here on SO but to no avail.

I've read somewhere that I can use Javascript code from Python using Selenium to get it, but I don't know Javascript and don't know how to do it. Maybe somebody understands what I'm talking about.

This is the webpage if it helps: http://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VERLST&DOCS=1-200&BASE=BOLE&SEC=FIRMA&SEPARADOR=&PUBL=20200901

Thanks in advance for the help, and I'll provide any further information if it's needed.

Well, yeah, why not! Didn't think about that possibility. Been doing the whole project with Selenium. Would there be any problem mixing both? On the other side, I've been learning to use Selenium and using Beautiful Soup would mean starting to learn it. — Alberefe
– Alberefe, Commented Sep 1, 2020 at 18:51

Andrej Kesely · Accepted Answer · 2020-09-01 18:56:36Z

Here is example how to extract the previous text with BeautifulSoup. I loaded the page with requests module, but you can feed the HTML source to BeautifulSoup from selenium:

import requests
from bs4 import BeautifulSoup


url = 'http://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VERLST&DOCS=1-200&BASE=BOLE&SEC=FIRMA&SEPARADOR=&PUBL=20200901'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for t in soup.select('.archivos'):
    previous_text = t.find_previous(text=True).strip()
    link = t.a['href']
    print(previous_text)
    print('http://www.boa.aragon.es' + link)
    print('-' * 80)

Prints:

ORDEN HAP/804/2020, de 17 de agosto, por la que se modifica la Relación de Puestos de Trabajo de los Departamentos de Industria, Competitividad y Desarrollo Empresarial y de Economía, Planificación y Empleo.
http://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VERDOC&BASE=BOLE&PIECE=BOLE&DOCS=1-22&DOCR=1&SEC=FIRMA&RNG=200&SEPARADOR=&&PUBL=20200901
--------------------------------------------------------------------------------
ORDEN HAP/805/2020, de 17 de agosto, por la que se modifica la Relación de Puestos de Trabajo del Departamento de Agricultura, Ganadería y Medio Ambiente.
http://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VERDOC&BASE=BOLE&PIECE=BOLE&DOCS=1-22&DOCR=2&SEC=FIRMA&RNG=200&SEPARADOR=&&PUBL=20200901
--------------------------------------------------------------------------------
ORDEN HAP/806/2020, de 17 de agosto, por la que se modifica la Relación de Puestos de Trabajo del Organismo Autónomo Instituto Aragonés de Servicios Sociales.
http://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VERDOC&BASE=BOLE&PIECE=BOLE&DOCS=1-22&DOCR=3&SEC=FIRMA&RNG=200&SEPARADOR=&&PUBL=20200901
--------------------------------------------------------------------------------
ORDEN ECD/807/2020, de 24 de agosto, por la que se aprueba el expediente relativo al procedimiento selectivo de acceso al Cuerpo de Catedráticos de Música y Artes Escénicas.
http://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VERDOC&BASE=BOLE&PIECE=BOLE&DOCS=1-22&DOCR=4&SEC=FIRMA&RNG=200&SEPARADOR=&&PUBL=20200901
--------------------------------------------------------------------------------
RESOLUCIÓN de 28 de julio de 2020, de la Dirección General de Justicia, por la que se convocan a concurso de traslado plazas vacantes entre funcionarios de los Cuerpos y Escalas de Gestión Procesal y Administrativa, Tramitación Procesal y
Administrativa y Auxilio Judicial de la Administración de Justicia.
http://www.boa.aragon.es/cgi-bin/EBOA/BRSCGI?CMD=VERDOC&BASE=BOLE&PIECE=BOLE&DOCS=1-22&DOCR=5&SEC=FIRMA&RNG=200&SEPARADOR=&&PUBL=20200901
--------------------------------------------------------------------------------

...and so on.

Collectives™ on Stack Overflow

Can't get text without tag using Selenium Python

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related