0 votes
0 answers
66 views

nushell polars plugin panicking

I just created this function to perform label encoding: # Custom command to perform label encoding on a specified column of a table, it returns a Polars dataframe @example "Simple example" { ...
kurokirasama
3 votes
1 answer
115 views

Tidypolars - strange error with inequality join

I get a strange error with an inequality inner_join on datetime columns: library(polars) library(tidypolars) library(dplyr) library(lubridate) x <- tibble(id = c(1, 1, 2, 2), t = as....
Anand • 103
1 vote
1 answer
93 views

Converting a Rust `futures::TryStream` to a `polars::LazyFrame`

I have an application where I have a futures::TryStream. Still in a streaming fashion, I want to convert this into a polars::LazyFrame. It is important to note that the TryStream comes from the ...
bmitc • 908
0 votes
0 answers
100 views

PyCharm "view as DataFrame" shows nothing for polars DataFrames

Basically the title. Using PyCharm 2023.3.3 I'm not able to see the data of polars DataFrames. As an example, I have a simple DataFrame like this: print(ids_df) shape: (1, 4) ...
Nauel • 522
0 votes
1 answer
105 views

Show progress bar when reading files with globbing with polars

I have a folder with multiple Excel files. I'm reading all of them in a single polars DataFrame concatenated vertically using globbing: import polars as pl df = pl.read_excel("folder/*.xlsx")...
robertspierre
3 votes
3 answers
128 views

Order of columns in a plotnine bar plot using a polars dataframe

I'm quite new to the packages polars and plotnine and have the following code: import polars as pl import polars.selectors as cs from plotnine import * df = pl.read_csv('https://raw.githubusercontent....
René • 4,919
2 votes
1 answer
111 views

Horizontal cumulative sum + unnest bug in polars

When I use horizontal cumulative sum followed by unnest, a "literal" column is formed that stays in the schema even when dropped. Here is an example: import polars as pl def ...
Nicolò Cavalleri
1 vote
1 answer
247 views

What is the most efficient way to check if a Polars LazyFrame has duplicates?

With the help of Claude Sonnet 4, I cooked up this function, which I hope does what I asked it to do. def has_duplicates_early_exit(df: pl.LazyFrame, subset: list[str]) -> bool: """...
Nicolò Cavalleri
4 votes
1 answer
573 views

How to use the "is_in" function correctly?

In Polars 0.46.0 it works normally: let df = df!( "id" => [0, 1, 2, 3, 4], "col_1" => [1, 2, 3, 4, 5], "col_2" => [3, 4, 5, 6, 7], ) .unwrap(); dbg!(&...
Alex Avin
1 vote
1 answer
178 views

Why does Polars keep killing the Python kernel when joining two lazy frames and collecting them?

I have one dataframe, bos_df_3, with about 30k+ rows, and another, taxon_ranked_only, with 6 million. When I tried to join them using: matching_df = ( pl.LazyFrame(bos_df_3) .join( other=...
Ryan • 426
4 votes
1 answer
185 views

How to use Polars copy-on-write principle?

I come from C++ and R world and just started using Polars. This is a great library. I want to confirm my understanding of its copy-on-write principle: import polars as pl x = pl.DataFrame({'a': [1, 2, ...
user2961927 • 1,790
0 votes
2 answers
464 views

When reading a database table with polars, how do I avoid a SchemaError?

I have a large table_to_load in a database file my_database.db that I am trying to read into a Python program as a polars DataFrame. Here is the code that does the reading: import polars as pl conn =...
SapereAude
1 vote
1 answer
213 views

Polars reading just one file from s3 with glob patterns

I have an S3 location in which I have a list of directories, and each directory contains a csv named sample_file.csv. I am trying to read these files using a glob pattern in pl.read_csv but it is just ...
figs_and_nuts
2 votes
1 answer
68 views

Polars `concat_arr` no longer takes in `pl.col` as parameter?

The following used to work: pl.concat_arr(pl.col("X[m]", "Y[m]", "Z[m]")).alias("Antenna_position[m]"), but now (polars 1.31.0) I get an error: Traceback (most ...
Jason Yao
0 votes
0 answers
120 views

Trying and failing to encrypt CSV with polars_encryption

When I try to run the below function: import polars as pl from polars_encryption import encrypt, decrypt def crypt(csv_file: str, delim: str, password: str, output_file: str): """ ...
James McIntyre
1 vote
2 answers
167 views

How to specify relevant columns with read_excel

As far as I can tell, the following MRE conforms to the relevant documentation: import polars df = polars.read_excel( "/Volumes/Spare/foo.xlsx", engine="calamine", ...
jackal • 29.1k
2 votes
0 answers
101 views

Best way to trigger lazy evaluation in PySpark and Polars for benchmarking

I'm currently benchmarking PySpark against the growing alternative Polars. Basically I'm writing various queries (aggregations, filtering, sorting etc.) and measuring the execution time, RAM and CPU. I ...
Ernest P W
1 vote
0 answers
113 views

Polars maximum length reached when rolling with list aggregation

I'm working with time series data of daily historic demands for a lot of stores/products, with a dataframe size of around 27,000,000 rows and 82,000 individual time series (specified by id_store and ...
Daeron • 121
1 vote
1 answer
83 views

How to join/map a polars dataframe to a dict? [duplicate]

I have a polars dataframe, and a dictionary. I want to map a column in the dataframe to the keys of the dictionary, and then add the corresponding values as a new column. import polars as pl my_dict =...
falsePockets • 4,423
1 vote
0 answers
81 views

How to read delta table and get empty columns in df?

In my file I have: { "Car": { "Model": null, "Color": null } } I use read_delta to read the file: df = df.read_delta(path) At the end, I have an empty df. ...
ninja_minida
0 votes
1 answer
115 views

Is there a polars operation to apply a function over each pair of groups?

I have a polars data frame which could be generated like so: import polars as pl import numpy as np num_points = 10 group_count = 3 df = pl.DataFrame( { "group_id": np....
TOgy • 3
1 vote
1 answer
117 views

Want to broadcast a NumPy array using `pl.lit()` in Polars

Goal I have a NumPy array true_direction = np.array([1,2,3]).reshape(1,3) which I want to insert into a Polars DataFrame; that is, repeat this array in every row of the DataFrame. What I have tried ...
Jason Yao
6 votes
1 answer
132 views

Split a column of string into list of list

How could I split a column of string into list of list? Minimum example: import polars as pl pl.Config(fmt_table_cell_list_len=6, fmt_str_lengths=100) df = pl.DataFrame({'test': "A,B,C,1\nD,E,F,...
Baffin Chu
0 votes
1 answer
198 views

polars read_csv_batched not working with GCS

I am trying to load a 1GB+ csv file from GCS and encountering memory issues. So I am trying to use read_csv_batched per Memory issues sorting larger than memory file with polars The documentation for ...
sicsmpr • 55
1 vote
1 answer
203 views

How do I ensure that a Polars expression plugin properly uses multiple CPUs?

I'm writing a polars plugin that works, but never seems to use more than one CPU. The plugin's function is element-wise, and is marked as such in register_plugin_function. What might I need to do to ...
sclamons • 103
2 votes
1 answer
173 views

Split an array in a polars dataframe into regular columns

I have a pl.DataFrame with an array column. I need to split the array column into traditional columns while assigning headers at the same time. import polars as pl df = pl.DataFrame( { "...
Andi • 5,167
0 votes
1 answer
180 views

Polars for Python, can I read parquet files with hive_partitioning when the directory structure and files have been manually written?

I manually created directory structures and wrote parquet files rather than using the partition_by parameter in the write_parquet() function of the python polars library because I want full control ...
Matt • 7,316
1 vote
0 answers
112 views

In a polars dataframe I want to replace ints with a list of ints

I have a map, a dict that takes an int and maps it to a list of ints. I have a polars dataframe column where I would like each int to be replaced by the relevant vector from the map. ...
Jon Tofteskov
3 votes
1 answer
108 views

Modify list of arrays in place

I have a df like: # /// script # requires-python = ">=3.13" # dependencies = [ # "polars", # ] # /// import polars as pl df = pl.DataFrame( { "points"...
DJDuque • 954
0 votes
0 answers
161 views

How can I use Polars to read a Parquet file in small batches? BatchedParquetReader seems broken

I want to read a Parquet file in batches/chunks so I don’t have to have the whole file in RAM. It’s a large file like tens of Gigabytes. I tried BatchedParquetReader, but it still reads the entire ...
Chris • 3
1 vote
1 answer
134 views

Get a grouped sum in polars, but keep all individual rows

I am breaking my head over this probably pretty simple question and I just can't find the answer anywhere. I want to create a new column with a grouped sum of another column, but I want to keep all ...
gernophil • 627
1 vote
1 answer
55 views

Can you create multiple columns based on the same set of conditions in Polars?

Is it possible to do something like this in Polars? Do you need a separate when.then.otherwise for each of the 4 new variables, or can you use a struct to create multiple new variables from one ...
catquas • 794
2 votes
1 answer
124 views

how to unnest struct columns without dropping empty structs with r-polars

I have a DataFrame that I need to separate columns when there are commas. The problem is when I have columns that are all null. In the example below, I need a DataFrame with the columns "mpg"...
user27247029
2 votes
2 answers
275 views

Sort a polars dataframe based on an external list

Morning, I'm not sure if this can be achieved. Let's say I have a polars dataframe with cols a, b (whatever). import polars as pl df = pl.from_repr(""" ┌─────┬─────┐ │ a ┆ b │ │ --...
Ghost • 1,594
1 vote
0 answers
130 views

How to efficiently apply a gufunc to a 2D region of a Polars DataFrame

Both Polars and Numba are fantastic libraries that complement each other pretty well. There are some limitations when using Numba-compiled functions in Polars: Arrow columns must be converted to ...
Olibarer • 423
1 vote
2 answers
145 views

Rust-polars: unable to filter dataframe after renaming the column filtered

The following code runs: fn main() { let mut df = df! [ "names" => ["a", "b", "c", "d"], "values" => [1, 2, 3, 4], ...
Roger V. • 803
1 vote
1 answer
208 views

How to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame

I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no ...
pinpss • 173
1 vote
1 answer
150 views

Using `is_in` in rust-polars

I am trying to subset a rust-polars dataframe by the names contained in a different frame: use polars::prelude::*; fn main() { let mut df = df! [ "names" => ["a", "...
Roger V. • 803
0 votes
0 answers
185 views

Polars out of core sorting and memory usage

From what I understand this is a main use case for Polars: being able to process a dataset that is larger than RAM, using disk space if necessary. Yet I am unable to achieve this in a Kubernetes ...
Nicolas Galler
1 vote
1 answer
83 views

How to get all groups in polars with Rust?

In Python, it's just like this: df = pl.DataFrame({"foo": ["a", "a", "b"], "bar": [1, 2, 3]}) for name, data in df.group_by("foo"): print(...
Nyssance • 401
0 votes
1 answer
111 views

Polars wheel file

The provided whl files for the polars library are tagged as abi3. I am working with a specific setup that needs the ABI tag to be cp39. I tried unpacking and packing again while changing the tag but still not ...
RGI • 21
0 votes
1 answer
141 views

polars date quarter parsing using strptime returns null

Using the documentation here (which also points to here) I would expect the following use of the Polars strptime function to produce a pl.Date value: import polars as pl date_format = "%Y-Q%q-%d&...
sicsmpr • 55
2 votes
1 answer
164 views

Polars Dataframe from nested dictionaries as columns

I have a dictionary of nested columns with the index as key in each one. When I try to convert it to a polars dataframe, it fetches the column names and the values right, but each column has just one ...
Ghost • 1,594
0 votes
0 answers
122 views

Polars method in GCP

Sorry if this is a 'too general/wide' question. I'm having some issues when trying to execute any polars DataFrame.filter() line in Google Cloud Python Functions. The code doesn't crash, it just ...
Ghost • 1,594
-5 votes
1 answer
1k views

Filtering polars dataframe by row with boolean mask [closed]

I'm trying to filter a Polars dataframe by using a boolean mask for the rows, which is generated from conditions on a specific column using: df = df[df['col'] == cond] And it's giving me an error ...
Ghost • 1,594
2 votes
0 answers
183 views

Why does a Parquet file written with Polars query faster than one written with Spark?

I am writing Parquet files using two different frameworks—Apache Spark (Scala) and Polars (Python)—with the same schema and data. However, when I query the resulting Parquet files using Apache ...
user29976558
0 votes
0 answers
205 views

Values differ on multiple reads from parquet files using polars read_parquet but not with pandas read_parquet by workstation

I read data from the same parquet files multiple times using polars (polars rust engine and pyarrow) and using pandas pyarrow backend (not fastparquet as it was very slow), see below code. All the ...
newandlost • 1,080
0 votes
1 answer
81 views

Efficient (and Safe) Way of Accessing Larger-than-Memory Datasets in Parallel

I am trying to use polars~=1.24.0 on Python 3.13 to process larger-than-memory sized datasets. Specifically, I am loading many (i.e., 35 of them) parquet files via the polars.scan_parquet('base-name-*....
Arda Aytekin • 1,303
2 votes
2 answers
149 views

Create a uniform dataset in Polars with cross joins

I am working with Polars and need to ensure that my dataset contains all possible combinations of unique values in certain index columns. If a combination is missing in the original data, it should be ...
Olibarer • 423
0 votes
1 answer
181 views

How to serialize json from a polars df in rust

I want to get JSON from a polars dataframe, following this answer: Serialize Polars `dataframe` to `serde_json::Value` use polars::prelude::*; fn main() { let df = df! { "a" => [1,2,3,...
Nyssance • 401