I am trying to read the tables in each HTML file in a folder using pandas, to find out how many tables each file contains.

This works when I specify a single file, but when I run it over every file in the folder, pandas reports that no tables were found.

This is the code for a single file:

import pandas as pd


file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
table = pd.read_html(file)

print ('tables found:', len(table))

This is the output:

C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe C:/Users/Ahmed_Abdelmuniem/PycharmProjects/PandaHTML/main.py
tables found: 72

Process finished with exit code 0

This is the code for each file in the folder:

import pandas as pd
import shutil
import os

source_dir = r'C:\Users\Ahmed_Abdelmuniem\Desktop\TMorning'
target_dir = r'C:\Users\Ahmed_Abdelmuniem\Desktop\TAfternoon'

file_names = os.listdir(source_dir)

for file_name in file_names:
    table = pd.read_html(file_name)
    print ('tables found:', len(table))

This is the error log:

C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe "C:/Users/Ahmed_Abdelmuniem/PycharmProjects/File mover V2.0/main.py"
Traceback (most recent call last):
  File "C:\Users\Ahmed_Abdelmuniem\PycharmProjects\File mover V2.0\main.py", line 12, in <module>
    table = pd.read_html(file_name)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 1085, in read_html
    return _parse(
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 913, in _parse
    raise retained
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 893, in _parse
    tables = p.parse_tables()
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 213, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\html.py", line 543, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found

Process finished with exit code 1

1 Answer

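os.listdir returns a list of the names of the entries in the directory, including subdirectories and any non-HTML files. These are bare names, not full paths, so pd.read_html(file_name) looks for each name relative to the current working directory (your PyCharm project folder), not in source_dir. In the pandas version shown in your traceback, a string that is neither a URL nor an existing file appears to be parsed as literal HTML, which would explain why the error is "No tables found" rather than a file-not-found error. If you want only the HTML files, with their full paths, prefer glob.glob:

import glob
import os

# Full paths to every .html file in source_dir
file_names = glob.glob(os.path.join(source_dir, '*.html'))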
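
To make the difference concrete, here is a small check that does not need pandas. It is only a sketch: the temporary folder and the file name example_report.html are made up for the illustration and are not from the question.

```python
import glob
import os
import tempfile

# Create a throwaway folder holding one empty HTML file (hypothetical name).
with tempfile.TemporaryDirectory() as source_dir:
    open(os.path.join(source_dir, 'example_report.html'), 'w').close()

    # os.listdir gives the entry names only, with no folder component.
    bare_names = os.listdir(source_dir)

    # glob.glob gives back the folder joined with each matching name.
    full_paths = glob.glob(os.path.join(source_dir, '*.html'))

    print(bare_names)
    print(full_paths)
```

The first list holds just the file name, which only resolves if the script happens to run from inside that folder. The second list holds paths that can be opened from any working directory.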

Edit: if you want to keep using os.listdir, you have to build the full path to each file yourself:

for file_name in file_names:
    # Join the folder and the bare name so pandas opens the right file
    table = pd.read_html(os.path.join(source_dir, file_name))
    print('tables found:', len(table))
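
Putting the pieces together, a minimal sketch of the whole loop might look like the following. The function name count_tables_in_folder and its read_html parameter are hypothetical helpers introduced for this illustration, not part of pandas. The sketch also assumes you would rather record 0 for a file without tables than stop at the first ValueError.

```python
from pathlib import Path


def count_tables_in_folder(source_dir, read_html=None):
    """Return {file name: number of tables} for every .html file in source_dir.

    read_html is a hypothetical hook; by default pandas.read_html is used.
    """
    if read_html is None:
        # Imported here so the helper can be exercised with a stand-in reader.
        import pandas as pd
        read_html = pd.read_html

    counts = {}
    # Path.glob yields paths that already include the folder.
    for path in sorted(Path(source_dir).glob('*.html')):
        try:
            counts[path.name] = len(read_html(str(path)))
        except ValueError:
            # pandas raises ValueError("No tables found") when a file has no
            # tables; this sketch treats any ValueError as a count of zero.
            counts[path.name] = 0
    return counts
```

With pandas and one of its HTML parsers installed, calling count_tables_in_folder(source_dir) on the folder from the question should give one entry per HTML file.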

1 Comment

Thanks a million bud, it worked. But why didn't it work if the only files are html files?