3

I'm trying to sort this line (2010 census block population densities for the whole US) by the last column.

12001,2,1009,Alachua FL,29.65612,-82.327274,0.0005131,0.013289229,12,902.9869232

censusBlockDensities.csv (moved here from comment)

17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
1
  • 1
    Great! Was there anything you tried? What shell are you using? Is it a UNIX or a Windows command-line? Commented Jun 22, 2015 at 2:24

1 Answer 1

5

I am assuming a unix shell (i.e., bash).

Read the manual page for the sort command: man sort

From the man page:

The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

export LC_ALL=C

sort -t , -k 10,10 -n censusBlockDensities.csv

Explanation of the flags:

-t ,: Specifies commas as field delimiters.

-k 10,10: Specifies sorting on the 10th field only (start,stop) (the first field is 1, not 0).

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options [bdfgiMhnRrV], which override global ordering options for that key. If no key is given, use the entire line as the key.

-n: Perform a numeric sort instead of the default alphanumeric sort (alternatively, add the 'n' to the -k parameter as mention in a comment below).

censusBlockDensities.csv

17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427

output:

17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336

Edit: A helpful comment indicated an error in my answer. You also need the -n flag to perform a numeric sort (default is alphanumeric). I've amended my answer to include that. You can verify that it is working properly by also trying out the -r flag to sort in reverse order. I also added the stop field index to the -k 10 parameter, as mentioned in another post.

Additionally, you should check your input file to make sure you have the same number of fields in each line:

awk '{print gsub(/,/,"")}' censusBlockDensities.csv

9
9
10 <-- the third record has an additional field
9
9
Sign up to request clarification or add additional context in comments.

3 Comments

Tried that and it didn't work. Note that this is a .csv that I manipulated in Excel then exported as a .csv again export LC_ALL=C sort -t , -k 10 censusBlockDensities.csv > 2.csv output: 17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336 17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686 17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664 17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562 17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
sort -n -t , -k 10 or sort -t , -k 10n - the 'n' for numeric sort
@user3502552 The first thing I noticed about your new sample data was that you have an additional field in the third row. See my updated answer for a way to detect those.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.