0 votes
0 answers
66 views

nushell polars plugin panicking

I just created this function to perform label encoding: # Custom command to perform label encoding on a specified column of a table, it returns a Polars dataframe @example "Simple example" { ...
kurokirasama
3 votes
1 answer
115 views

Tidypolars - strange error with inequality join

I get a strange error with an inequality inner_join on datetime columns: library(polars) library(tidypolars) library(dplyr) library(lubridate) x <- tibble(id = c(1, 1, 2, 2), t = as....
Anand • 103
1 vote
1 answer
93 views

Converting a Rust `futures::TryStream` to a `polars::LazyFrame`

I have an application where I have a futures::TryStream. Still in a streaming fashion, I want to convert this into a polars::LazyFrame. It is important to note that the TryStream comes from the ...
bmitc • 908
0 votes
0 answers
100 views

PyCharm "view as DataFrame" shows nothing for polars DataFrames

Basically the title. Using PyCharm 2023.3.3 I'm not able to see the data of polars DataFrames. As an example, I have a simple DataFrame like this: print(ids_df) shape: (1, 4) ...
Nauel • 522
0 votes
1 answer
105 views

Show progress bar when reading files with globbing with polars

I have a folder with multiple Excel files. I'm reading all of them in a single polars DataFrame concatenated vertically using globbing: import polars as pl df = pl.read_excel("folder/*.xlsx")...
robertspierre
3 votes
3 answers
128 views

Order of columns in a plotnine bar plot using a polars dataframe

I'm quite new to the packages polars and plotnine and have the following code: import polars as pl import polars.selectors as cs from plotnine import * df = pl.read_csv('https://raw.githubusercontent....
René • 4,919
2 votes
1 answer
111 views

Horizontal cumulative sum + unnest bug in polars

When I use horizontal cumulative sum followed by unnest, a "literal" column is formed that stays in the schema even when dropped. Here is an example: import polars as pl def ...
Nicolò Cavalleri
1 vote
1 answer
247 views

What is the most efficient way to check if a Polars LazyFrame has duplicates?

With the help of Claude Sonnet 4, I cooked up this function, which I hope does what I asked it to do. def has_duplicates_early_exit(df: pl.LazyFrame, subset: list[str]) -> bool: """...
Nicolò Cavalleri
4 votes
1 answer
573 views

How to use the "is_in" function correctly?

In Polars 0.46.0 it works normally: let df = df!( "id" => [0, 1, 2, 3, 4], "col_1" => [1, 2, 3, 4, 5], "col_2" => [3, 4, 5, 6, 7], ) .unwrap(); dbg!(&...
Alex Avin
1 vote
1 answer
178 views

Why does Polars keep killing the Python kernel when joining two lazy frames and collecting them?

I have one dataframe, bos_df_3, with about 30k+ rows, and another, taxon_ranked_only, with 6 million. When I tried to join them using: matching_df = ( pl.LazyFrame(bos_df_3) .join( other=...
Ryan • 426
4 votes
1 answer
185 views

How to use Polars copy-on-write principle?

I come from C++ and R world and just started using Polars. This is a great library. I want to confirm my understanding of its copy-on-write principle: import polars as pl x = pl.DataFrame({'a': [1, 2, ...
user2961927 • 1,790
0 votes
2 answers
464 views

When reading a database table with polars, how do I avoid a SchemaError?

I have a large table_to_load in a database file my_database.db that I am trying to read into a Python program as a polars DataFrame. Here is the code that does the reading: import polars as pl conn =...
SapereAude
1 vote
1 answer
213 views

Polars reading just one file from s3 with glob patterns

I have an S3 location in which I have a list of directories, and each directory contains a csv named sample_file.csv. I am trying to read these files using a glob pattern in pl.read_csv but it is just ...
figs_and_nuts
2 votes
1 answer
68 views

Polars `concat_arr` no longer takes in `pl.col` as parameter?

The following used to work: pl.concat_arr(pl.col("X[m]", "Y[m]", "Z[m]")).alias("Antenna_position[m]"), but now (polars 1.31.0) I get an error: Traceback (most ...
Jason Yao
0 votes
0 answers
120 views

Trying and failing to encrypt CSV with polars_encryption

When I try to run the below function: import polars as pl from polars_encryption import encrypt, decrypt def crypt(csv_file: str, delim: str, password: str, output_file: str): """ ...
James McIntyre
1 vote
2 answers
167 views

How to specify relevant columns with read_excel

As far as I can tell, the following MRE conforms to the relevant documentation: import polars df = polars.read_excel( "/Volumes/Spare/foo.xlsx", engine="calamine", ...
jackal • 29.1k
2 votes
0 answers
101 views

Best way to trigger lazy evaluation in PySpark and Polars for benchmarking

I'm currently benchmarking PySpark against the growing alternative Polars. Basically I'm writing various queries (aggregations, filtering, sorting etc.) and measuring the execution time, RAM and CPU. I ...
Ernest P W
1 vote
0 answers
113 views

Polars maximum length reached when rolling with list aggregation

I'm working with time series data of daily historic demands for a lot of stores/products, with a dataframe size of around 27,000,000 rows and 82,000 individual time series (specified by id_store and ...
Daeron • 121
1 vote
1 answer
83 views

How to join/map a polars dataframe to a dict? [duplicate]

I have a polars dataframe, and a dictionary. I want to map a column in the dataframe to the keys of the dictionary, and then add the corresponding values as a new column. import polars as pl my_dict =...
falsePockets • 4,423
1 vote
0 answers
81 views

How to read delta table and get empty columns in df?

In my file I have: { "Car": { "Model": null, "Color": null } } I use read_delta to read the file: df = df.read_delta(path) At the end, I have an empty df. ...
ninja_minida
0 votes
1 answer
115 views

Is there a polars operation to apply a function over each pair of groups?

I have a polars data frame which could be generated like so: import polars as pl import numpy as np num_points = 10 group_count = 3 df = pl.DataFrame( { "group_id": np....
TOgy • 3
1 vote
1 answer
117 views

Want to broadcast a NumPy array using `pl.lit()` in Polars

Goal I have a NumPy array true_direction = np.array([1,2,3]).reshape(1,3) which I want to insert into a Polars DataFrame; that is, repeat this array in every row of the DataFrame. What I have tried ...
Jason Yao
6 votes
1 answer
132 views

Split a column of string into list of list

How could I split a column of string into list of list? Minimum example: import polars as pl pl.Config(fmt_table_cell_list_len=6, fmt_str_lengths=100) df = pl.DataFrame({'test': "A,B,C,1\nD,E,F,...
Baffin Chu
0 votes
1 answer
198 views

polars read_csv_batched not working with GCS

I am trying to load a 1GB+ csv file from GCS and encountering memory issues. So I am trying to use read_csv_batched per Memory issues sorting larger than memory file with polars The documentation for ...
sicsmpr • 55
1 vote
1 answer
203 views

How do I ensure that a Polars expression plugin properly uses multiple CPUs?

I'm writing a polars plugin that works, but never seems to use more than one CPU. The plugin's function is element-wise, and is marked as such in register_plugin_function. What might I need to do to ...
sclamons • 103
2 votes
1 answer
173 views

Split an array in a polars dataframe into regular columns

I have a pl.DataFrame with an array column. I need to split the array column into traditional columns while assigning headers at the same time. import polars as pl df = pl.DataFrame( { "...
Andi • 5,167
0 votes
1 answer
180 views

Polars for Python, can I read parquet files with hive_partitioning when the directory structure and files have been manually written?

I manually created directory structures and wrote parquet files rather than using the partition_by parameter in the write_parquet() function of the python polars library because I want full control ...
Matt • 7,316
1 vote
0 answers
112 views

In a polars dataframe I want to replace ints with a list of ints

I have a map, a dict that takes an int and maps it to a list of ints. I have a polars dataframe column where I would like each int to be replaced by the relevant vector from the map. ...
Jon Tofteskov
3 votes
1 answer
108 views

Modify list of arrays in place

I have a df like: # /// script # requires-python = ">=3.13" # dependencies = [ # "polars", # ] # /// import polars as pl df = pl.DataFrame( { "points"...
DJDuque • 954
0 votes
0 answers
161 views

How can I use Polars to read a Parquet file in small batches? BatchedParquetReader seems broken

I want to read a Parquet file in batches/chunks so I don’t have to have the whole file in RAM. It’s a large file like tens of Gigabytes. I tried BatchedParquetReader, but it still reads the entire ...
Chris • 3
1 vote
1 answer
134 views

Get a grouped sum in polars, but keep all individual rows

I am breaking my head over this probably pretty simple question and I just can't find the answer anywhere. I want to create a new column with a grouped sum of another column, but I want to keep all ...
gernophil • 627
1 vote
1 answer
55 views

Can you create multiple columns based on the same set of conditions in Polars?

Is it possible to do something like this in Polars? Do you need a separate when.then.otherwise for each of the 4 new variables, or can you use a struct to create multiple new variables from one ...
catquas • 794
2 votes
1 answer
124 views

how to unnest struct columns without dropping empty structs with r-polars

I have a DataFrame that I need to separate columns when there are commas. The problem is when I have columns that are all null. In the example below, I need a DataFrame with the columns "mpg"...
user27247029
2 votes
2 answers
275 views

Sort a polars dataframe based on an external list

Morning, I'm not sure if this can be achieved. Let's say I have a polars dataframe with cols a, b (whatever). import polars as pl df = pl.from_repr(""" ┌─────┬─────┐ │ a ┆ b │ │ --...
Ghost • 1,594
1 vote
0 answers
130 views

How to efficiently apply a gufunc to a 2D region of a Polars DataFrame

Both Polars and Numba are fantastic libraries that complement each other pretty well. There are some limitations when using Numba-compiled functions in Polars: Arrow columns must be converted to ...
Olibarer • 423
1 vote
2 answers
145 views

Rust-polars: unable to filter dataframe after renaming the column filtered

The following code runs: fn main() { let mut df = df! [ "names" => ["a", "b", "c", "d"], "values" => [1, 2, 3, 4], ...
Roger V. • 803
1 vote
1 answer
208 views

How to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame

I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, I noticed that the sample function is producing the same set of IDs for each date no ...
pinpss • 173
1 vote
1 answer
150 views

Using `is_in` in rust-polars

I am trying to subset a rust-polars dataframe by the names contained in a different frame: use polars::prelude::*; fn main() { let mut df = df! [ "names" => ["a", "...
Roger V. • 803
0 votes
0 answers
185 views

Polars out of core sorting and memory usage

From what I understand this is a main use case for Polars: being able to process a dataset that is larger than RAM, using disk space if necessary. Yet I am unable to achieve this in a Kubernetes ...
Nicolas Galler
1 vote
1 answer
83 views

How to get all groups in polars with Rust?

In Python, it's just like this: df = pl.DataFrame({"foo": ["a", "a", "b"], "bar": [1, 2, 3]}) for name, data in df.group_by("foo"): print(...
Nyssance • 401
0 votes
1 answer
111 views

Polars wheel file

The provided whl files for the polars library are tagged as abi3. I am working with a specific setup that needs the ABI tag to be cp39. I tried unpacking and packing again while changing the tag but still not ...
RGI • 21
0 votes
1 answer
141 views

polars date quarter parsing using strptime returns null

Using the documentation here (which also points to here) I would expect the following use of the Polars strptime function to produce a pl.Date value: import polars as pl date_format = "%Y-Q%q-%d&...
sicsmpr • 55
2 votes
1 answer
164 views

Polars Dataframe from nested dictionaries as columns

I have a dictionary of nested columns with the index as key in each one. When I try to convert it to a polars dataframe, it fetches the column names and the values right, but each column has just one ...
Ghost • 1,594
0 votes
0 answers
122 views

Polars method in GCP

Sorry if this is a 'too general/wide' question. I'm having some issues when trying to execute any polars DataFrame.filter() line in Google Cloud Python Functions. The code doesn't crash, it just ...
Ghost • 1,594
-5 votes
1 answer
1k views

Filtering polars dataframe by row with boolean mask [closed]

I'm trying to filter a Polars dataframe by using a boolean mask for the rows, which is generated from conditions on a specific column using: df = df[df['col'] == cond] And it's giving me an error ...
Ghost • 1,594
2 votes
0 answers
183 views

Why does a Parquet file written with Polars query faster than one written with Spark?

I am writing Parquet files using two different frameworks—Apache Spark (Scala) and Polars (Python)—with the same schema and data. However, when I query the resulting Parquet files using Apache ...
user29976558
0 votes
0 answers
205 views

Values differ on multiple reads from parquet files using polars read_parquet but not with pandas read_parquet by workstation

I read data from the same parquet files multiple times using polars (polars rust engine and pyarrow) and using pandas pyarrow backend (not fastparquet as it was very slow), see below code. All the ...
newandlost • 1,080
0 votes
1 answer
81 views

Efficient (and Safe) Way of Accessing Larger-than-Memory Datasets in Parallel

I am trying to use polars~=1.24.0 on Python 3.13 to process larger-than-memory sized datasets. Specifically, I am loading many (i.e., 35 of them) parquet files via the polars.scan_parquet('base-name-*....
Arda Aytekin • 1,303
2 votes
2 answers
149 views

Create a uniform dataset in Polars with cross joins

I am working with Polars and need to ensure that my dataset contains all possible combinations of unique values in certain index columns. If a combination is missing in the original data, it should be ...
Olibarer • 423
0 votes
1 answer
181 views

How to serialize json from a polars df in rust

I want to get JSON from a polars dataframe, following this answer: Serialize Polars `dataframe` to `serde_json::Value` use polars::prelude::*; fn main() { let df = df! { "a" => [1,2,3,...
Nyssance • 401