
I'm trying to read CSV files from a directory, matching every file whose name contains the string "logs_455DD_33". It should match anything like:

machine_logs_455DD_33.csv

logs_455DD_33_2018.csv

machine_logs_455DD_33_2018.csv

I've tried the following pattern, but it doesn't match files with the above names:

file = "hdfs://data/logs/{*}logs_455DD_33{*}.csv"
df = spark.read.csv(file)
Try this: file = "hdfs://data/logs/*logs_455DD_33*.csv" (Commented Jun 15, 2018 at 11:12)
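Hadoop path globbing uses bare * wildcards rather than {*}; a minimal sketch of that suggested fix, assuming the hdfs://data/logs/ path from the question:

# Bare * wildcards match any prefix/suffix around the fixed string
file = "hdfs://data/logs/*logs_455DD_33*.csv"
df = spark.read.csv(file)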

2 Answers


I had to do something similar in my PySpark program, where I needed to pick a file in HDFS by cycle_date, and I did it like this:

df = spark.read.parquet(pathtoFile + "*" + cycle_date + "*")
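The same wildcard concatenation should carry over to the asker's CSV files; a quick sketch, assuming the hdfs://data/logs/ path from the question:

# Hypothetical adaptation: splice the pattern between * wildcards
pattern = "logs_455DD_33"
df = spark.read.csv("hdfs://data/logs/*" + pattern + "*.csv")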



You could use a subprocess to list the files in HDFS and grep for the pattern:

import subprocess

# Define the HDFS directory and the pattern to match
dir_in = "data/logs"
your_pattern = "logs_455DD_33"

# List the directory, keep the path column, and grep for the pattern
args = "hdfs dfs -ls " + dir_in + " | awk '{print $8}' | grep " + your_pattern
proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)

# communicate() returns bytes in Python 3: decode before splitting,
# and drop the empty entry left by the trailing newline
s_output, s_err = proc.communicate()
l_file = [f for f in s_output.decode("utf-8").split("\n") if f]

# Read each matched file (note: df is rebound on every iteration)
for file in l_file:
    df = spark.read.csv(file)

1 Comment

Doesn't this specifically not use Spark, which means it will be slower across a cluster with a distributed file system (HDFS/EMRFS)?
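One way to keep the read itself distributed while using the subprocess only for listing: spark.read.csv also accepts a list of paths, so all matched files can be loaded in a single Spark job instead of one per file. A sketch reusing l_file from the answer above:

# Pass the whole list of matched paths to a single read call
if l_file:
    df = spark.read.csv(l_file)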
