Sorting .csv from command line

Question

I'm trying to sort this line (2010 census block population densities for the whole US) by the last column.

12001,2,1009,Alachua FL,29.65612,-82.327274,0.0005131,0.013289229,12,902.9869232

censusBlockDensities.csv (moved here from comment)

17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427

Great! Was there anything you tried? What shell are you using? Is it a UNIX or a Windows command-line? — Ruslan
– Ruslan, Commented Jun 22, 2015 at 2:24

Community · Accepted Answer · 2017-05-23 11:58:08Z

5

I am assuming a unix shell (i.e., bash).

Read the manual page for the sort command: man sort

From the man page:

The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

export LC_ALL=C

sort -t , -k 10,10 -n censusBlockDensities.csv

Explanation of the flags:

-t ,: Specifies commas as field delimiters.

-k 10,10: Specifies sorting on the 10th field only (start,stop) (the first field is 1, not 0).

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options [bdfgiMhnRrV], which override global ordering options for that key. If no key is given, use the entire line as the key.

-n: Perform a numeric sort instead of the default alphanumeric sort (alternatively, add the 'n' to the -k parameter as mention in a comment below).

censusBlockDensities.csv

17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427

output:

17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336

Edit: A helpful comment indicated an error in my answer. You also need the -n flag to perform a numeric sort (default is alphanumeric). I've amended my answer to include that. You can verify that it is working properly by also trying out the -r flag to sort in reverse order. I also added the stop field index to the -k 10 parameter, as mentioned in another post.

Additionally, you should check your input file to make sure you have the same number of fields in each line:

awk '{print gsub(/,/,"")}' censusBlockDensities.csv

9
9
10 <-- the third record has an additional field
9
9

edited May 23, 2017 at 11:58

CommunityBot

11 silver badge

answered Jun 22, 2015 at 2:31

Autumn Skye

7,56914 gold badges74 silver badges98 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

user3502552 Over a year ago

Tried that and it didn't work. Note that this is a .csv that I manipulated in Excel then exported as a .csv again export LC_ALL=C sort -t , -k 10 censusBlockDensities.csv > 2.csv output:

17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336 17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686 17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664 17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562 17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427

sjnarv Over a year ago

sort -n -t , -k 10 or sort -t , -k 10n - the 'n' for numeric sort

Autumn Skye Over a year ago

@user3502552 The first thing I noticed about your new sample data was that you have an additional field in the third row. See my updated answer for a way to detect those.

Collectives™ on Stack Overflow

Sorting .csv from command line

1 Answer 1

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related