shiftOut() relies on digitalWrite(), which is dead slow by AVR
standards. You could probably gain a factor of 5, or even 10, by
reimplementing shiftOut() with direct port access instead.
Using SPI (as suggested in the comments) would be even faster, but the
clock would then have to come out of pin 13 (the hardware SCK pin).
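To give an idea of what the SPI route could look like, here is a minimal sketch. It assumes an Uno-class board (data on pin 11 = MOSI, clock on pin 13 = SCK) and a receiver that samples data on the rising clock edge, hence SPI_MODE0. The 1 MHz clock rate and the names spiShiftOutBegin() and spiShiftOut() are my own choices for illustration, and the #else branch is only a host-side stand-in so the code can be compiled without the Arduino core. Like the rest of this answer, it has not been tried on hardware.

```cpp
#include <stdint.h>

#if defined(ARDUINO)
#include <SPI.h>
#else
// Host-only stand-in (hypothetical), just enough to compile off-target.
enum { MSBFIRST = 1, SPI_MODE0 = 0 };
struct SPISettings {
    SPISettings(uint32_t, uint8_t, uint8_t) {}
};
struct HostSPI {
    uint8_t lastByte = 0;    // last byte handed to transfer()
    unsigned transfers = 0;  // number of transfer() calls
    void begin() {}
    void beginTransaction(SPISettings) {}
    uint8_t transfer(uint8_t v) { lastByte = v; ++transfers; return 0; }
    void endTransaction() {}
};
static HostSPI SPI;
#endif

// Call once, e.g. from setup(): configures the SPI pins as a master.
void spiShiftOutBegin()
{
    SPI.begin();
}

// Hardware replacement for shiftOut(11, 13, MSBFIRST, val).
// The 1 MHz clock is an assumed starting point, not a measured limit.
void spiShiftOut(uint8_t val)
{
    SPI.beginTransaction(SPISettings(1000000, MSBFIRST, SPI_MODE0));
    SPI.transfer(val);  // the SPI hardware clocks out all 8 bits
    SPI.endTransaction();
}
```

If the receiving chip tolerates it, the clock rate passed to SPISettings can be raised. That number plays the role that a delay plays in a bit-banged loop.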
Edit: here is an attempt at such an implementation, using direct port access. Note that it is completely untested. Note also the small delay, meant to keep the clock from running too fast. You may want to experiment with the delay value.
    #include <util/delay.h>

    /*
     * Specialized and faster shiftOut().
     * This assumes no one is touching PORTB at the same time.
     *
     * dataPin = 11 (PB3), clockPin = 12 (PB4), bitOrder = MSBFIRST.
     */
    void myShiftOut(uint8_t val)
    {
        for (uint8_t i = 0; i < 8; i++) {

            // Send data bit.
            if (val & 0x80)
                PORTB |= _BV(PB3);
            else
                PORTB &= ~_BV(PB3);
            val <<= 1;

            // Toggle clock twice: high, short delay, then low.
            PORTB |= _BV(PB4);
            _delay_us(0.5);
            PORTB &= ~_BV(PB4);
        }
    }
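Since the function is untested, one cheap sanity check is to run its logic on a PC against a fake PORTB that records the data bit at every rising clock edge. The sketch below is only that: FakePort, clockedIn() and the stand-in definitions of _BV, PB3, PB4 and _delay_us are all hypothetical helpers written for this check, not part of avr-libc. It verifies the bit order and the clock sequence, not the timing.

```cpp
#include <stdint.h>

// Host-only stand-ins for the AVR names used by myShiftOut().
#define _BV(bit) (1 << (bit))
#define PB3 3                     // data line
#define PB4 4                     // clock line
static void _delay_us(double) {}  // timing is irrelevant off-target

// Fake port register: samples the data bit on each rising clock edge.
struct FakePort {
    uint8_t value = 0;     // current pin levels
    uint8_t received = 0;  // bits sampled so far, first bit ends up as MSB
    FakePort &operator|=(int mask) {
        uint8_t before = value;
        value |= static_cast<uint8_t>(mask);
        bool risingClock = !(before & _BV(PB4)) && (value & _BV(PB4));
        if (risingClock)
            received = static_cast<uint8_t>((received << 1) | ((value >> PB3) & 1));
        return *this;
    }
    FakePort &operator&=(int mask) {
        value &= static_cast<uint8_t>(mask);
        return *this;
    }
};
static FakePort PORTB;

// Same body as the function above.
void myShiftOut(uint8_t val)
{
    for (uint8_t i = 0; i < 8; i++) {
        if (val & 0x80)
            PORTB |= _BV(PB3);
        else
            PORTB &= ~_BV(PB3);
        val <<= 1;
        PORTB |= _BV(PB4);
        _delay_us(0.5);
        PORTB &= ~_BV(PB4);
    }
}

// Returns the byte a receiver would have clocked in.
uint8_t clockedIn(uint8_t val)
{
    PORTB = FakePort();
    myShiftOut(val);
    return PORTB.received;
}
```

If clockedIn(x) returns x for a handful of values, the loop sends the bits MSB first and leaves the clock low afterwards. Whether the 0.5 µs delay suits the receiving chip still has to be checked on the real hardware.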