I have a map, a dict that takes an int and maps it to a list of ints. I have a polars DataFrame column in which I would like each int to be replaced by the relevant vector from the map. If an int is not in the map, it should be replaced by a list of zeros instead.

I have the following code, with some example data:

import polars as pl

data = {
    "user_id": [1, 2, 3],
    "book_ids": [[101, 102, 103], [104, 105], [106]]
}

# Create DataFrame
read_history_data = pl.DataFrame(data)

# Mapping dictionary
map = {
    101: [1, 2],
    102: [3, 4],
    103: [5, 6],
    104: [7, 8],
    105: [9, 10],
    106: [11, 12]
}

# Padding value and token length
padding_value = 0
token_length = 4

# Column name
column = "book_ids"

# Function to transform the DataFrame
def transform_read_history_data(read_history_data, map, padding_value, token_length, column):
    padded_list = [padding_value for i in range(token_length)]
    read_history_data = read_history_data.with_columns(
        pl.col(column)
        .list.eval(pl.element().replace(map, default=None))
        .list.eval(pl.element().fill_null(padded_list))
    )
    return read_history_data

# Run the function
transformed_data = transform_read_history_data(read_history_data, map, padding_value, token_length, column)

# Print the transformed DataFrame
print(transformed_data)

I get:

Traceback (most recent call last):
  File "<string>", line 37, in <module>
  File "<string>", line 29, in transform_read_history_data
  File "c:\...\.venv\Lib\site-packages\polars\dataframe\frame.py", line 9830, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:...\.venv\Lib\site-packages\polars\_utils\deprecation.py", line 93, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\...\.venv\Lib\site-packages\polars\lazyframe\frame.py", line 2224, in collect
    return wrap_df(ldf.collect(engine, callback))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ShapeError: argument 2 called 'new' for replace_strict have different lengths (6 != 3)
I think you are running into this issue: github.com/pola-rs/polars/issues/22554. It has been fixed on the main branch, so the fix will be in the next release. The solution for now is to downgrade from 1.29.0 to 1.28.1. Also, FYI, you should use replace_strict if you want to specify a default; replace with a default provided should be emitting a deprecation warning. Commented May 6 at 11:23
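
Independently of the polars version, the lookup the question asks for can be sketched in plain Python on the raw lists. This is only an illustration of the intended result, not the polars expression API: `lookup_vectors` and `mapping` are hypothetical names (the latter chosen to avoid shadowing the built-in `map`), and some ids are deliberately left out of the mapping to show the zero fallback.

```python
# Hypothetical helper, not part of polars: look up each id in the mapping
# and fall back to a padding list for ids that are missing.
def lookup_vectors(ids, mapping, padding_value=0, token_length=4):
    padded_list = [padding_value] * token_length
    # list(...) makes a copy so that rows never share one mutable list
    return [list(mapping.get(i, padded_list)) for i in ids]


# Assumption for illustration: 104, 105 and 106 are not in the mapping.
mapping = {101: [1, 2], 102: [3, 4], 103: [5, 6]}
book_ids = [[101, 102, 103], [104, 105], [106]]

vectors = [lookup_vectors(ids, mapping) for ids in book_ids]
print(vectors)
```

As in the question, the padding list has `token_length` entries, which here is longer than the mapped vectors; if all inner lists should have the same length, `token_length` would need to match the vector length. The nested list could then be put back into the DataFrame as a new column.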
