0

I have a moderate-sized table (nearly 1200 rows) of data in Excel with fields containing either str values or datetime.date values that I wish to analyze with Python. I read the Excel data into a Pandas dataframe (df) and then use dataframe queries (df.query(query string)) to parse out the specific data in which I'm interested. A sample of the data table is provided below, showing the 4/9 columns most relevant to the queries.

BF  Ship            …   Commissioned    Decommissioned
X   Able            …   22-Jul-1992     30-Sep-2030
X   Abraham Lincoln …   11-Nov-1989  
X   Abraham Lincoln …   11-Mar-1961     28-Feb-1981
X   Acadia          …   6-Jun-1981      16-Dec-1994
X   Adroit          …   4-Mar-1957      12-Dec-1991
X   Adventurous     …   19-Aug-1988      5-Jun-1992
X   Affray          …   8-Dec-1958      20-Dec-1992
X   Ainsworth       …   31-Mar-1973     27-May-1994
X   Ajax            …   30-Sep-1943     31-Dec-1986
X   Alabama         …   25-May-1985     30-Sep-2028
X   Alamo           …   24-Aug-1956     28-Sep-1990
X   Alan Shepard    …   26-Jun-2007  
X   Alaska          …   25-Jan-1986      1-Jul-2025
X   Albany          …   7-Apr-1990      30-Sep-2028
X   Albert David    …   19-Oct-1968     28-Sep-1988
X   Albuquerque     …   21-May-1983     16-Sep-2015
X   Alexandria      …   29-Jun-1991     30-Sep-2026
X   Algol           …   1-Oct-1981       1-Oct-2007
X   Altair          …   5-Feb-1982       1-Oct-2007
X   Amelia Earhart  …   30-Oct-2008  
X   America         …   11-Oct-2014  

The code correctly reads the spreadsheet data into a Pandas dataframe object with the correct data type for each column (either string or datetime). I defined a datetime variable now that stores the current date and use that to query the number of active ships in the Battle Force (BF). That query worked perfectly:

qrystr = "BF == 'X' "\
         "and (Commissioned != 'NaT' and Commissioned <= 'now')"\
         "and (Decommissioned >= 'now' or Decommissioned == 'NaT') "
df3 = df.query(qrystr)
print("\nNumber of Battle Force ships is: ", len(df3))

A subsequent query was developed to list only the currently active Cruisers in the fleet as shown below. This query also operated correctly.

qrystr = "Type == 'Cruiser' "\
         "and (Commissioned != 'NaT' and Commissioned <= 'now')"\
         "and (Decommissioned >= 'now' or Decommissioned == 'NaT') "
dfCruisers = df3.query(qrystr)
print ("Currently active cruisers are:\n", dfCruisers)

This leads to the part where my dataframe query algorithm fails. I built a loop to count the number of active cruisers in the fleet from 1980 through 2030. I use my loop control variable to represent each year and then build a dataframe query to filter the full list of ships to a new dataframe list with just the ships that were still in commission during the loop control year. I use the len() function to count that number and then append the results into lstCruisers. Unfortunately, I get an unexpected error when it tries to execute the query. Here is the source code and a printout of the results. Can someone help me understand why the datetime variable isn't being handled correctly in the last df.query?

import datetime as dt
import pandas as pd

now = dt.date.today()
print(now, type(now))

df = pd.read_excel("Ships.xlsx")
print("\n", df.dtypes)

print("df:\n", df)

# Calculate number of current Battle Force ships
qrystr = "BF == 'X' "\
         "and (Commissioned != 'NaT' and Commissioned <= 'now')"\
         "and (Decommissioned >= 'now' or Decommissioned == 'NaT') "
df3 = df.query(qrystr)
print("\nNumber of Battle Force ships is: ", len(df3))

qrystr = "Type == 'Cruiser' "\
         "and (Commissioned != 'NaT' and Commissioned <= 'now')"\
         "and (Decommissioned >= 'now' or Decommissioned == 'NaT') "
dfCruisers = df3.query(qrystr)
print ("Currently active cruisers are:\n", dfCruisers)

lstCruisers = []
yearStart = 1980
yearStop = 1985
for yr in range(yearStart, yearStop+1):
    stopDate = pd.Timestamp(yr, 12, 31).date()
    print(stopDate, type(stopDate))

    qrystr = "Type == 'Cruiser' " \
             " and (Commissioned != 'NaT' and Commissioned <= 'stopDate')" \
             " and (Decommissioned >= 'stopDate' or Decommissioned == 'NaT') "
    dfCruisers = df.query(qrystr)
    print(dfCruisers)

    nrShips = len(dfCruisers)
    lstCruisers.append([yr, nrShips])

print(lstCruisers)

/usr/bin/python3/PycharmProjects/pythonProject3/Ships.py 

2023-12-12 <class 'datetime.date'>

BF                        object
Ship                      object
Trigraph                  object
Hull Number               object
HullType                  object
Type                      object
Cmd Level                 object
Commissioned      datetime64[ns]
Decommissioned    datetime64[ns]

dtype: object
df:
        BF       Ship Trigraph  ...       Cmd Level Commissioned Decommissioned
0       X  (Unnamed)      NaN  ...        CAPT-SWO   2029-07-01            NaT
1       X  (Unnamed)      NaN  ...        CAPT-SWO   2030-07-01            NaT
2       X  (Unnamed)      NaN  ...  Sequential-Air   2036-12-31            NaT
3       X  (Unnamed)      NaN  ...  Sequential-Air   2040-12-31            NaT
4       X  (Unnamed)      NaN  ...        CAPT-SWO   2029-07-01            NaT
...   ...        ...      ...  ...             ...          ...            ...
1152    X      Yuma       YMA  ...             NaN   2017-04-21            NaT
1153  NaN     Zephyr      ZPR  ...        LCDR-SWO   2011-09-30     2021-02-17
1154  NaN     Zephyr      ZPR  ...        LCDR-SWO   1994-10-15     2004-10-01
1155    X      Zeus       ZEU  ...             NaN          NaT            NaT
1156    X    Zumwalt      ZUM  ...         CDR-SWO   2016-10-15            NaT

[1157 rows x 9 columns]

Number of Battle Force ships is:  290
Currently active cruisers are:
      BF             Ship Trigraph  ... Cmd Level Commissioned Decommissioned
80    X         Antietam      ANT  ...  CAPT-SWO   1987-06-06     2024-09-30
196   X  Cape St. George      CSG  ...  CAPT-SWO   1993-06-12     2027-09-30
231   X           Chosin      CHO  ...  CAPT-SWO   1991-01-12     2027-09-30
275   X          Cowpens      COW  ...  CAPT-SWO   1991-03-09     2026-09-30
407   X       Gettysburg      GET  ...  CAPT-SWO   1991-06-22     2026-09-30
605   X        Lake Erie      LKE  ...  CAPT-SWO   1993-05-10     2025-09-30
623   X       Leyte Gulf      LTG  ...  CAPT-SWO   1987-09-26     2024-09-30
746   X         Normandy      NOR  ...  CAPT-SWO   1989-12-09     2025-09-30
806   X   Philippine Sea      PSE  ...  CAPT-SWO   1989-03-18     2025-09-30
841   X        Princeton      PRN  ...  CAPT-SWO   1989-02-11     2026-09-30
893   X    Robert Smalls      NaN  ...  CAPT-SWO   2023-03-04     2026-07-01
966   X           Shiloh      SHI  ...  CAPT-SWO   1992-07-18     2024-09-30
1095  X        Vicksburg      VIK  ...  CAPT-SWO   1992-11-14     2023-12-31

[13 rows x 9 columns]
1980-12-31 <class 'datetime.date'>
Traceback (most recent call last):
  File "pandas/_libs/tslibs/conversion.pyx", line 530, in pandas._libs.tslibs.conversion._convert_str_to_tsobject
  File "pandas/_libs/tslibs/parsing.pyx", line 318, in pandas._libs.tslibs.parsing.parse_datetime_string
  File "/Library/Python/3.8/site-packages/dateutil/parser/_parser.py", line 1368, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/Library/Python/3.8/site-packages/dateutil/parser/_parser.py", line 643, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: stopDate

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/PycharmProjects/pythonProject3/Ships.py", line 55, in <module>
    dfCruisers = df.query(qrystr)
  File "/Library/Python/3.8/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/frame.py", line 4474, in query
    res = self.eval(expr, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/frame.py", line 4612, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/eval.py", line 353, in eval
    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 813, in __init__
    self.terms = self.parse()
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 832, in parse
    return self._visitor.visit(self.expr)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 421, in visit_Module
    return self.visit(expr, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 424, in visit_Expr
    return self.visit(node.value, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 750, in visit_BoolOp
    return reduce(visitor, operands)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 744, in visitor
    rhs = self._try_visit_binop(y)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 739, in _try_visit_binop
    return self.visit(bop)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 750, in visit_BoolOp
    return reduce(visitor, operands)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 744, in visitor
    rhs = self._try_visit_binop(y)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 739, in _try_visit_binop
    return self.visit(bop)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 723, in visit_Compare
    return self.visit(binop)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
    return visitor(node, **kwargs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 538, in visit_BinOp
    return self._maybe_evaluate_binop(op, op_class, left, right)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 505, in _maybe_evaluate_binop
    res = op(lhs, rhs)
  File "/Library/Python/3.8/site-packages/pandas/core/computation/ops.py", line 379, in __init__
    self.convert_values()
  File "/Library/Python/3.8/site-packages/pandas/core/computation/ops.py", line 477, in convert_values
    v = Timestamp(ensure_decoded(v))
  File "pandas/_libs/tslibs/timestamps.pyx", line 1698, in pandas._libs.tslibs.timestamps.Timestamp.__new__
  File "pandas/_libs/tslibs/conversion.pyx", line 249, in pandas._libs.tslibs.conversion.convert_to_tsobject
  File "pandas/_libs/tslibs/conversion.pyx", line 533, in pandas._libs.tslibs.conversion._convert_str_to_tsobject
ValueError: could not convert string to Timestamp

Process finished with exit code 1
3
  • As for why 'now' didn't fail, comparisons against datetime involve an implicit conversion, i.e. it gets converted to pd.Timestamp('now'). Commented Dec 12, 2023 at 16:30
  • BTW, 'NaT' is a string. To check if a Pandas value is null, use .isna() or .notna(). Commented Dec 12, 2023 at 16:33
  • Also BTW, a lot of the detail here isn't relevant to the problem so could be removed; see minimal reproducible example and for specifics, How to make good reproducible pandas examples. See also How to Ask for tips like how to write a good title. Commented Dec 12, 2023 at 16:38

1 Answer 1

0

The query fails because you did not reference the stopDate variable correctly, you can do it using @ :

qrystr = "Type == 'Cruiser' " \
         " and (Commissioned != 'NaT' and Commissioned <= @stopDate)" \
         " and (Decommissioned >= @stopDate or Decommissioned == 'NaT') "
Sign up to request clarification or add additional context in comments.

1 Comment

Awesome! That worked perfectly! I'm self-learning Python and haven't run across the @ reference you identified for me. Thanks so much!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.