0

I am reading a file of floating point numbers and then sorting them. When I use my following sort and swap functions with 1 million numbers, I am successfully able to sort the numbers. However, when I try to sort 100 million numbers, I get a segmentation fault. I am not sure why because I am dynamically allocating memory. How would I be able to handle more than 1 million numbers?

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

void swap(float *a, float *b, size_t n) {
    size_t numbytes; 
    size_t sz = sizeof(float); 
    void *temp = NULL; 
    
    numbytes = n * sz; 
    if (numbytes == 0){
        exit(EXIT_FAILURE); 
    }
    temp = malloc(numbytes);
    
    memcpy(temp, a, numbytes);
    memcpy(a,b,numbytes); 
    memcpy(b,temp,numbytes);
    
    free(temp); 
}

void radixSort(float array[], size_t count) {
    int numOfZero = 0; 
    float a[count];
    float *b = a;
    
    for (uint32_t radix=1; radix; radix<<=1) { //unsigned int 32 bit
        uint32_t *arrayToInt = (uint32_t *)array;
        int zeroCount=0;
        int oneCount=0;
        numOfZero=0;
        
        for (int j=0; j < count; ++j)
        numOfZero += !(arrayToInt[j]&radix);
        oneCount=numOfZero;
        
        for (int j=0; j < count; ++j)
        if (arrayToInt[j]&radix){
            b[oneCount]=array[j];
            ++oneCount;
        }
        else{
            b[zeroCount]=array[j];
            ++zeroCount;
        }
        swap(b,array,count);
    }
    if (numOfZero < count){
        memcpy(b+(count-numOfZero), array, numOfZero*sizeof(float));
        
        for (int d=0,j=count-1;j>=numOfZero;j--,d++)
        b[d]=array[j];
        memcpy(array, b, count*sizeof(float));
    }
}
int main(int argc, char *argv[]) {
    int fd; 
    float num; 
    size_t nr; 
    int eleNum = 0; 
    
    fd = open(argv[1], O_RDONLY); 
    
    if (fd == -1){
        perror("Error opening file");
        exit(EXIT_FAILURE);
    }
    
    struct stat st; 
    fstat(fd, &st); 
    off_t size = st.st_size; 
    for (int j = 0; j < size/4; j++){
        eleNum++; 
    } 
    
    float array[eleNum];
    for (int i = 0; i < eleNum; i++){
        nr = read(fd, &num, sizeof(float));
        if (nr == -1){
            perror("Error reading file"); 
            exit(EXIT_FAILURE); 
        }
        array[i] = num; 
    }
    
    radixSort(array, eleNum);
    
    close(fd); 

    return 0; 
}
8
  • 2
    A good place to start with be to check the return value of malloc. Commented Dec 7, 2020 at 7:20
  • Why the error check in swap? If it's zero bytes, just do nothing. You can safely pass a size of zero to memcpy. Commented Dec 7, 2020 at 7:27
  • 3
    Wait. A hundred million numbers? Do you understand how a VLA (variable-length array) typically works ? This code is a recipe for a stack breach. Commented Dec 7, 2020 at 7:28
  • Not related, but for (int j = 0; j < size/4; j++) eleNum++; is a very roundabout way to write eleNum = size/4;. Plus, you never check for integer overflow if size/4 is larger than INT_MAX. Commented Dec 7, 2020 at 7:31
  • @RetiredNinja I created a file with 1.2 as data. I used read to read data into float variable. When I printed it to the console, it prints 0.0000, not the number. Commented Dec 7, 2020 at 7:54

1 Answer 1

6

These lines:

float a[count]; // In radixSort

float array[eleNum]; // In main

will never work for such big numbers. VLA:s are (typically and always in practice) allocated on the stack. On Windows systems, the stack is typically 1MB and 8MB on Linux. I have written an answer about VLA:s that would be good for you to read. In short, I would advice not using them. Do I really need malloc?

I'm not sure if changing to malloc will solve your problem, but you will not solve it without doing it.

Furthermore, you should check the return value from malloc to see if the allocation worked. Then, if your problems persist, I'd recommend compiling with -Wall -Wextra -pedantic -std=c11 -fsanitize=address -g. Use gdb or another debugger to find which line causes a segfault and investigate values. Use valgrind to detect memory leaks.

And this:

for (int j = 0; j < size/4; j++){
    eleNum++; 
} 

is just weird. It's equivalent to eleNum = size/4.

This, in swap:

if (numbytes == 0){
    exit(EXIT_FAILURE); 
}

is completely unnecessary. It's safe to pass 0 for the size argument to memcpy. It would just lead to that nothing happens. I could understand this for debugging purposes, but in that case you should print something useful, or even better, use assert(numbytes > 0)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.