I have a small script; here is the relevant code:

import numpy as np
import pandas as pd
from docx import Document


####    Set up the file names; also make provision for letting the user select the files   ####
SHRD_filename = "SHRD - SVN 12485.docx"
SHDD_filename = "SHDD - SVN 12485.doc"
#SHRD_name = PCB_utility.get_file('Select SHRD file')
#SHDD_name = PCB_utility.get_file('Select SHDD file')

data = []
keys = {}

document_SHRD = Document(SHRD_filename)
tables_SHRD = document_SHRD.tables[30]
for i, row in enumerate(tables_SHRD.rows):
    text = (cell.text for cell in row.cells)
    if i == 0:
        keys = tuple(text)
        continue

    row_data = dict(zip(keys, text))
    data.append(row_data)

df_SHRD = pd.DataFrame.from_dict(data)
#cols = df_SHRD.columns.tolist()

print(df_SHRD.tail(20))

s = df_SHRD['HLR Trace Tag'].str.split('  ').apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1)
s.name = 'HLR Tags'
del df_SHRD['HLR Trace Tag']

df_SHRD.join(s)

When I initially make the dataframe, it looks like this:

300  HLR-0000094  HLR-0000095  HLR-0000340   LRU-0000440
301  HLR-0000094  HLR-0000095  HLR-0000341   LRU-0000441
302  HLR-0000094  HLR-0000095  HLR-0000342   LRU-0000442
303                            HLR-0000675   LRU-0000745
304                            HLR-0000676   LRU-0000746
305                            HLR-0000677   LRU-0000747
306                            HLR-0000678   LRU-0000748
307                            HLR-0000679   LRU-0000749
308                            HLR-0000680   LRU-0000750

I need to split the HLR tags into individual rows, but at the end of my script the DataFrame comes back like this:

300   LRU-0000440
301   LRU-0000441
302   LRU-0000442
303   LRU-0000745
304   LRU-0000746
305   LRU-0000747
306   LRU-0000748
307   LRU-0000749
308   LRU-0000750

But when I retype the last line in IPython, I get the result I expect:

In [25]:df_SHRD.join(s)
Out[25]: 
300   LRU-0000440  HLR-0000094
300   LRU-0000440  HLR-0000095
300   LRU-0000440  HLR-0000340
301   LRU-0000441  HLR-0000094
301   LRU-0000441  HLR-0000095
301   LRU-0000441  HLR-0000341
302   LRU-0000442  HLR-0000094
302   LRU-0000442  HLR-0000095
302   LRU-0000442  HLR-0000342
303   LRU-0000745  HLR-0000675
304   LRU-0000746  HLR-0000676
305   LRU-0000747  HLR-0000677
306   LRU-0000748  HLR-0000678
307   LRU-0000749  HLR-0000679
308   LRU-0000750  HLR-0000680

[457 rows x 2 columns]

Any help would be appreciated on why the command works in the IPython window but not in the script.

1 Answer

DataFrame.join(other, ...)

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Returns: joined : DataFrame

  1. join is not an in-place operation. It returns a new DataFrame, which you must assign to a variable if you want to keep it.

    df = df_SHRD.join(s)

  2. IPython echoes the value of the last expression you type even without a print call; a script does not. That is simply how a REPL behaves. In either case, df_SHRD itself is unchanged until you assign the result back. Evaluate df_SHRD.join(s) followed by df_SHRD in IPython, and you'll see the difference.
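
Both points can be seen in a minimal sketch. It uses a two-row stand-in for the table instead of the .docx file, and the column name "LRU Tag" is an assumption, since the question does not show the real column names. For brevity it splits the tags with Series.explode rather than the apply(pd.Series)/stack approach from the question; the behaviour of join is the same either way.

```python
import pandas as pd

# Hypothetical stand-in for the table read from the .docx file.
# "LRU Tag" is an assumed column name; only 'HLR Trace Tag' appears in the question.
df_SHRD = pd.DataFrame(
    {
        "HLR Trace Tag": ["HLR-0000094  HLR-0000095  HLR-0000340", "HLR-0000675"],
        "LRU Tag": ["LRU-0000440", "LRU-0000745"],
    },
    index=[300, 303],
)

# One tag per row; explode keeps the original row label as the index.
s = df_SHRD["HLR Trace Tag"].str.split("  ").explode().rename("HLR Tags")
del df_SHRD["HLR Trace Tag"]

# The joined frame is built and then thrown away: df_SHRD is untouched.
df_SHRD.join(s)
shape_without_assignment = df_SHRD.shape

# Assigning the return value is what keeps the result.
joined = df_SHRD.join(s)

print(shape_without_assignment)  # (2, 1)
print(joined.shape)              # (4, 2)
print(joined)
```

In the script from the question, this means changing the last line to df_SHRD = df_SHRD.join(s), or assigning to a new name, before using the frame further.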
