Note: This is not a duplicate question as I have gone through this answer and made the necessary package downgrade but it still results in the same error. Details below.

# System Details

  • MacBook Air (M1, 2020)
  • MacOS Monterey 12.3
  • Python 3.10.8 (Miniconda environment)
  • Relevant library versions from pip freeze
importlib-metadata==3.4.0
PyMuPDF==1.21.1
spacy==3.4.4
spacy-alignments==0.9.0
spacy-legacy==3.0.11
spacy-loggers==1.0.4
spacy-transformers==1.2.0
streamlit==1.17.0
flair==0.11.3
catalogue==2.0.8

# Setup

  • I am trying to use Spacy for some text processing over a pdf document uploaded to a Streamlit app.
  • The Streamlit app basically contains an upload button, submit button (which calls the preprocessing and spacy functions), and a text_area to display the processed text.

Here is the working code for uploading a pdf document and extracting its text -

import streamlit as st
import fitz

def load_file(file):
    # Open the uploaded PDF from memory and return the text of all pages.
    doc = fitz.open(stream=file.read(), filetype="pdf")
    pages = []
    with doc:
        for page in doc:
            pages.append(page.get_text())
    return "\n".join(pages)

#####################################################################   

st.title("Test app")

col1, col2 = st.columns([1,1], gap='small')

with col1:
    with st.expander("Description -", expanded=True):
        st.write("This is the description of the app.")
    
with col2:
    with st.form(key="my_form"):
        uploaded_file = st.file_uploader("Upload",type='pdf', accept_multiple_files=False, label_visibility="collapsed")
        submit_button = st.form_submit_button(label="Process")        

#####################################################################        
        
col1, col2 = st.columns([1,3], gap='small')

with col1:
    st.header("Metrics")

with col2:
    st.header("Text")
    
    if uploaded_file is not None:
        text = load_file(uploaded_file)
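        # The first argument of st.text_area is the label; the content goes in value.
        st.text_area("Extracted text", value=text)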

# Reproduce base code

  • install necessary libraries
  • save above code to a test.py file
  • from terminal navigate to folder and run streamlit run test.py
  • navigate to http://localhost:8501/ in browser
  • download this sample pdf and upload it to the app as an example

This results in a functioning app.

# Issue I am facing

The issue appears when I add import spacy to the Python file and rerun the Streamlit app. The following error pops up -

AttributeError: 'PathDistribution' object has no attribute '_normalized_name'
Traceback:
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "/Users/akshay_sehgal/Library/CloudStorage/________/Documents/Code/Demo UI/Streamlit/keyphrase_extraction_template/test.py", line 3, in <module>
    import spacy
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/__init__.py", line 6, in <module>
    from .errors import setup_default_warnings
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/errors.py", line 2, in <module>
    from .compat import Literal
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/compat.py", line 3, in <module>
    from thinc.util import copy_array
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/thinc/__init__.py", line 5, in <module>
    from .config import registry
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/thinc/config.py", line 1, in <module>
    import catalogue
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/catalogue/__init__.py", line 20, in <module>
    AVAILABLE_ENTRY_POINTS = importlib_metadata.entry_points()  # type: ignore
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 1009, in entry_points
    return SelectableGroups.load(eps).select(**params)
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 459, in load
    ordered = sorted(eps, key=by_group)
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 1006, in <genexpr>
    eps = itertools.chain.from_iterable(
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/_itertools.py", line 16, in unique_everseen
    k = key(element)

# What have I tried?

  1. First thing I tried was to isolate the spacy code and run it in a notebook in the specific environment, which worked without any issue.
  2. Next, after researching SO (this answer) and the github issues, I found that importlib.metadata could be the potential culprit and therefore I downgraded this using the following code, but it didn't fix anything.
pip uninstall importlib-metadata
pip install importlib-metadata==3.4.0
  3. I removed the environment completely and set it up again from scratch, following the same steps as the first time (in case I had made a mistake during setup). The error persisted.

  4. The final option I am left with is to containerize the spacy processing as an API and call it from the streamlit app using requests.
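
The workaround in the last item could look roughly like the following on the Streamlit side. This is only a sketch under assumptions: the endpoint URL, the JSON payload shape, and the function names are hypothetical, and it uses the standard library's urllib instead of requests so that it has no extra dependencies.

```python
import json
import urllib.request

# Hypothetical endpoint of the separate spaCy service (an assumption).
SERVICE_URL = "http://localhost:8000/process"


def build_request(text, url=SERVICE_URL):
    """Build a POST request carrying the extracted text as JSON."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_remotely(text, url=SERVICE_URL, timeout=30):
    """Send the text to the service and return its decoded JSON reply."""
    request = build_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the Streamlit script then never imports spacy itself, the import error cannot occur in that process. The cost is an extra service to run and a network call per document.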

I would be happy to share the requirements.txt if needed, but I will have to figure out how to upload it somewhere via my office pc. Do let me know if that is required and I will find a way.

Would appreciate any help in solving this issue!

1 Answer

Upgrade importlib-metadata to importlib-metadata>=4.3.0 to avoid this particular error.

There can be complicated interactions between the built-in importlib.metadata and the additional importlib_metadata package, and you need a newer version of importlib-metadata to get some of the updates/fixes related to this.

With python 3.10 and importlib-metadata==3.4.0, you can see this error with the following example (spacy and streamlit are not required):

import importlib_metadata
import importlib.metadata
importlib.metadata.entry_points()
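
To check whether an environment is likely affected, the installed importlib-metadata version can be compared against the 4.3.0 threshold mentioned above. The following is a minimal sketch: the helper names are made up for illustration, and the version parsing is deliberately simple (it ignores pre-release suffixes).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Threshold taken from the recommendation above.
MINIMUM = (4, 3, 0)


def parse_version(text):
    """Turn '4.3.0' into (4, 3, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed, minimum=MINIMUM):
    """Return True if the installed version is older than the minimum."""
    return parse_version(installed) < minimum


def installed_backport_version():
    """Return the installed importlib-metadata version, or None if absent."""
    try:
        return version("importlib-metadata")
    except PackageNotFoundError:
        return None
```

For the environment in the question, needs_upgrade("3.4.0") returns True. If installed_backport_version() returns None, the backport package is not installed and only the built-in importlib.metadata is in play.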

1 Comment

Let me try upgrading it. For now I have set up a separate API for the spacy component, which I call from the streamlit app, and that works as intended.
