I have a moderate-sized table (nearly 1200 rows) of data in Excel with fields containing either str values or datetime.date values that I wish to analyze with Python. I read the Excel data into a Pandas dataframe (df) and then use dataframe queries (df.query(query string)) to parse out the specific data in which I'm interested. A sample of the data table is provided below, showing the 4/9 columns most relevant to the queries.
BF Ship … Commissioned Decommissioned
X Able … 22-Jul-1992 30-Sep-2030
X Abraham Lincoln … 11-Nov-1989
X Abraham Lincoln … 11-Mar-1961 28-Feb-1981
X Acadia … 6-Jun-1981 16-Dec-1994
X Adroit … 4-Mar-1957 12-Dec-1991
X Adventurous … 19-Aug-1988 5-Jun-1992
X Affray … 8-Dec-1958 20-Dec-1992
X Ainsworth … 31-Mar-1973 27-May-1994
X Ajax … 30-Sep-1943 31-Dec-1986
X Alabama … 25-May-1985 30-Sep-2028
X Alamo … 24-Aug-1956 28-Sep-1990
X Alan Shepard … 26-Jun-2007
X Alaska … 25-Jan-1986 1-Jul-2025
X Albany … 7-Apr-1990 30-Sep-2028
X Albert David … 19-Oct-1968 28-Sep-1988
X Albuquerque … 21-May-1983 16-Sep-2015
X Alexandria … 29-Jun-1991 30-Sep-2026
X Algol … 1-Oct-1981 1-Oct-2007
X Altair … 5-Feb-1982 1-Oct-2007
X Amelia Earhart … 30-Oct-2008
X America … 11-Oct-2014
The code correctly reads the spreadsheet data into a Pandas dataframe object with the correct data type for each column (either string or datetime). I defined a datetime variable now that stores the current date and use that to query the number of active ships in the Battle Force (BF). That query worked perfectly:
qrystr = "BF == 'X' "\
"and (Commissioned != 'NaT' and Commissioned <= 'now')"\
"and (Decommissioned >= 'now' or Decommissioned == 'NaT') "
df3 = df.query(qrystr)
print("\nNumber of Battle Force ships is: ", len(df3))
A subsequent query was developed to list only the currently active Cruisers in the fleet as shown below. This query also operated correctly.
qrystr = "Type == 'Cruiser' "\
"and (Commissioned != 'NaT' and Commissioned <= 'now')"\
"and (Decommissioned >= 'now' or Decommissioned == 'NaT') "
dfCruisers = df3.query(qrystr)
print ("Currently active cruisers are:\n", dfCruisers)
This leads to the part where my dataframe query algorithm fails. I built a loop to count the number of active cruisers in the fleet from 1980 through 2030. I use my loop control variable to represent each year and then build a dataframe query to filter the full list of ships to a new dataframe list with just the ships that were still in commission during the loop control year. I use the len() function to count that number and then append the results into lstCruisers. Unfortunately, I get an unexpected error when it tries to execute the query. Here is the source code and a printout of the results. Can someone help me understand why the datetime variable isn't being handled correctly in the last df.query?
import datetime as dt
import pandas as pd
now = dt.date.today()
print(now, type(now))
df = pd.read_excel("Ships.xlsx")
print("\n", df.dtypes)
print("df:\n", df)
# Calculate number of current Battle Force ships
qrystr = "BF == 'X' "\
"and (Commissioned != 'NaT' and Commissioned <= 'now')"\
"and (Decommissioned >= 'now' or Decommissioned == 'NaT') "
df3 = df.query(qrystr)
print("\nNumber of Battle Force ships is: ", len(df3))
qrystr = "Type == 'Cruiser' "\
"and (Commissioned != 'NaT' and Commissioned <= 'now')"\
"and (Decommissioned >= 'now' or Decommissioned == 'NaT') "
dfCruisers = df3.query(qrystr)
print ("Currently active cruisers are:\n", dfCruisers)
lstCruisers = []
yearStart = 1980
yearStop = 1985
for yr in range(yearStart, yearStop+1):
stopDate = pd.Timestamp(yr, 12, 31).date()
print(stopDate, type(stopDate))
qrystr = "Type == 'Cruiser' " \
" and (Commissioned != 'NaT' and Commissioned <= 'stopDate')" \
" and (Decommissioned >= 'stopDate' or Decommissioned == 'NaT') "
dfCruisers = df.query(qrystr)
print(dfCruisers)
nrShips = len(dfCruisers)
lstCruisers.append([yr, nrShips])
print(lstCruisers)
/usr/bin/python3/PycharmProjects/pythonProject3/Ships.py
2023-12-12 <class 'datetime.date'>
BF object
Ship object
Trigraph object
Hull Number object
HullType object
Type object
Cmd Level object
Commissioned datetime64[ns]
Decommissioned datetime64[ns]
dtype: object
df:
BF Ship Trigraph ... Cmd Level Commissioned Decommissioned
0 X (Unnamed) NaN ... CAPT-SWO 2029-07-01 NaT
1 X (Unnamed) NaN ... CAPT-SWO 2030-07-01 NaT
2 X (Unnamed) NaN ... Sequential-Air 2036-12-31 NaT
3 X (Unnamed) NaN ... Sequential-Air 2040-12-31 NaT
4 X (Unnamed) NaN ... CAPT-SWO 2029-07-01 NaT
... ... ... ... ... ... ... ...
1152 X Yuma YMA ... NaN 2017-04-21 NaT
1153 NaN Zephyr ZPR ... LCDR-SWO 2011-09-30 2021-02-17
1154 NaN Zephyr ZPR ... LCDR-SWO 1994-10-15 2004-10-01
1155 X Zeus ZEU ... NaN NaT NaT
1156 X Zumwalt ZUM ... CDR-SWO 2016-10-15 NaT
[1157 rows x 9 columns]
Number of Battle Force ships is: 290
Currently active cruisers are:
BF Ship Trigraph ... Cmd Level Commissioned Decommissioned
80 X Antietam ANT ... CAPT-SWO 1987-06-06 2024-09-30
196 X Cape St. George CSG ... CAPT-SWO 1993-06-12 2027-09-30
231 X Chosin CHO ... CAPT-SWO 1991-01-12 2027-09-30
275 X Cowpens COW ... CAPT-SWO 1991-03-09 2026-09-30
407 X Gettysburg GET ... CAPT-SWO 1991-06-22 2026-09-30
605 X Lake Erie LKE ... CAPT-SWO 1993-05-10 2025-09-30
623 X Leyte Gulf LTG ... CAPT-SWO 1987-09-26 2024-09-30
746 X Normandy NOR ... CAPT-SWO 1989-12-09 2025-09-30
806 X Philippine Sea PSE ... CAPT-SWO 1989-03-18 2025-09-30
841 X Princeton PRN ... CAPT-SWO 1989-02-11 2026-09-30
893 X Robert Smalls NaN ... CAPT-SWO 2023-03-04 2026-07-01
966 X Shiloh SHI ... CAPT-SWO 1992-07-18 2024-09-30
1095 X Vicksburg VIK ... CAPT-SWO 1992-11-14 2023-12-31
[13 rows x 9 columns]
1980-12-31 <class 'datetime.date'>
Traceback (most recent call last):
File "pandas/_libs/tslibs/conversion.pyx", line 530, in pandas._libs.tslibs.conversion._convert_str_to_tsobject
File "pandas/_libs/tslibs/parsing.pyx", line 318, in pandas._libs.tslibs.parsing.parse_datetime_string
File "/Library/Python/3.8/site-packages/dateutil/parser/_parser.py", line 1368, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/Library/Python/3.8/site-packages/dateutil/parser/_parser.py", line 643, in parse
raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: stopDate
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/PycharmProjects/pythonProject3/Ships.py", line 55, in <module>
dfCruisers = df.query(qrystr)
File "/Library/Python/3.8/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/frame.py", line 4474, in query
res = self.eval(expr, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/frame.py", line 4612, in eval
return _eval(expr, inplace=inplace, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/eval.py", line 353, in eval
parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 813, in __init__
self.terms = self.parse()
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 832, in parse
return self._visitor.visit(self.expr)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
return visitor(node, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 421, in visit_Module
return self.visit(expr, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
return visitor(node, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 424, in visit_Expr
return self.visit(node.value, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
return visitor(node, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 750, in visit_BoolOp
return reduce(visitor, operands)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 744, in visitor
rhs = self._try_visit_binop(y)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 739, in _try_visit_binop
return self.visit(bop)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
return visitor(node, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 750, in visit_BoolOp
return reduce(visitor, operands)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 744, in visitor
rhs = self._try_visit_binop(y)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 739, in _try_visit_binop
return self.visit(bop)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
return visitor(node, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 723, in visit_Compare
return self.visit(binop)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 415, in visit
return visitor(node, **kwargs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 538, in visit_BinOp
return self._maybe_evaluate_binop(op, op_class, left, right)
File "/Library/Python/3.8/site-packages/pandas/core/computation/expr.py", line 505, in _maybe_evaluate_binop
res = op(lhs, rhs)
File "/Library/Python/3.8/site-packages/pandas/core/computation/ops.py", line 379, in __init__
self.convert_values()
File "/Library/Python/3.8/site-packages/pandas/core/computation/ops.py", line 477, in convert_values
v = Timestamp(ensure_decoded(v))
File "pandas/_libs/tslibs/timestamps.pyx", line 1698, in pandas._libs.tslibs.timestamps.Timestamp.__new__
File "pandas/_libs/tslibs/conversion.pyx", line 249, in pandas._libs.tslibs.conversion.convert_to_tsobject
File "pandas/_libs/tslibs/conversion.pyx", line 533, in pandas._libs.tslibs.conversion._convert_str_to_tsobject
ValueError: could not convert string to Timestamp
Process finished with exit code 1
'now'didn't fail, comparisons against datetime involve an implicit conversion, i.e. it gets converted topd.Timestamp('now').'NaT'is a string. To check if a Pandas value is null, use.isna()or.notna().