I do not claim this method as my own; I found this neat trick on the webpage MUX-DEMUX: CD4051 Parlor Tricks.
Whichever method you use to drive outputs or read inputs (shift registers, multiplexors, or the Arduino pins directly), you can double the number of outputs or inputs by wiring the devices in parallel pairs, each pair forming a two-device input or output "bank". The two branches of a pair carry diodes facing in opposite directions, and you pick one branch or the other by choosing which of the pair's two pins is high and which is low.
To illustrate the method for outputs (LEDs in this case; the extra diodes are not required, since LEDs are diodes themselves):

If you consider the pair of LEDs in this example to be a "bank", and
you want to light LED_0, you need to set PIN 17 to HIGH and PIN 18 to
LOW. (The pin numbers are confusing, but they match the later example,
so bear with me.) To light LED_1, you just reverse the pins. The diode
nature of LEDs keeps the current from flowing in the opposite
direction, which keeps the other one off.
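The bank logic above can be sketched roughly as follows. This is my own minimal illustration, not code from the original tutorial: the names BankLevels, bankLevels, lightLed, PIN_A and PIN_B are made up, and the HIGH/LOW stand-ins exist only so the logic can be compiled off-board.

```cpp
// Sketch of the two-LED "bank" logic. BankLevels, bankLevels, lightLed,
// PIN_A and PIN_B are hypothetical names, not part of the original tutorial.
#ifndef ARDUINO
// Stand-ins so the logic can be compiled and checked without a board.
const int LOW = 0;
const int HIGH = 1;
#endif

const int PIN_A = 17;  // one side of the bank (analog A3)
const int PIN_B = 18;  // other side of the bank (analog A4)

struct BankLevels {
  int pinA;  // level to write to PIN_A
  int pinB;  // level to write to PIN_B
};

// LED_0 conducts when PIN_A is HIGH and PIN_B is LOW.
// LED_1 is wired the other way round, so the levels are swapped.
BankLevels bankLevels(int ledIndex) {
  BankLevels levels;
  levels.pinA = (ledIndex == 0) ? HIGH : LOW;
  levels.pinB = (ledIndex == 0) ? LOW : HIGH;
  return levels;
}

#ifdef ARDUINO
// On the board: drive both pins as outputs with the computed levels.
void lightLed(int ledIndex) {
  BankLevels levels = bankLevels(ledIndex);
  pinMode(PIN_A, OUTPUT);
  pinMode(PIN_B, OUTPUT);
  digitalWrite(PIN_A, levels.pinA);
  digitalWrite(PIN_B, levels.pinB);
}
#endif
```

Calling lightLed(0) and lightLed(1) alternately, fast enough, should make both LEDs appear lit at once, which is the same persistence trick the full sketch relies on.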
To illustrate the method for inputs (CdS light sensors in this case; here the extra diodes are required):

This gets a little more complicated if you want to do an analog read
on a CdS light sensor. First, you need to add a diode to each sensor
to control the flow. Second, since you are reading values, you need to
pull the inputs high or low to keep them from floating. Being a lazy
person, I'm going to pull them high using the internal pull-up
resistors. To read CdS_0, you set PIN 17 mode to OUTPUT and set it to
LOW. This makes it the ground. Then you set PIN 18 mode to INPUT and
set it to HIGH to engage the pull-up resistor. Now you are set to do a
read on PIN 18 (a.k.a. analog pin 4). To access the other sensor, just
switch the modes and outputs.
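As a rough sketch of that read sequence (again my own illustration, with made-up names ReadPlan, readPlan, readCdS, CDS_PIN_A and CDS_PIN_B), assuming an AVR-based board where INPUT_PULLUP has the same effect as setting the pin to INPUT and writing HIGH:

```cpp
// Sketch of the sensor-bank read sequence. ReadPlan, readPlan, readCdS,
// CDS_PIN_A and CDS_PIN_B are hypothetical names used for illustration.
const int CDS_PIN_A = 17;  // analog A3
const int CDS_PIN_B = 18;  // analog A4

struct ReadPlan {
  int groundPin;  // driven as OUTPUT and LOW, so it acts as ground
  int sensePin;   // INPUT with pull-up, then read
};

// CdS_0: pin 17 is ground and pin 18 is read. CdS_1: roles swapped.
ReadPlan readPlan(int sensorIndex) {
  ReadPlan plan;
  plan.groundPin = (sensorIndex == 0) ? CDS_PIN_A : CDS_PIN_B;
  plan.sensePin = (sensorIndex == 0) ? CDS_PIN_B : CDS_PIN_A;
  return plan;
}

#ifdef ARDUINO
// On the board: set up the two pins for the chosen sensor and read it.
int readCdS(int sensorIndex) {
  ReadPlan plan = readPlan(sensorIndex);
  pinMode(plan.groundPin, OUTPUT);
  digitalWrite(plan.groundPin, LOW);     // this side becomes ground
  pinMode(plan.sensePin, INPUT_PULLUP);  // engage the internal pull-up
  return analogRead(plan.sensePin);
}
#endif
```

Keeping the pin roles in a small struct makes the "just switch the modes and outputs" step explicit: the only difference between the two sensors is which pin is ground and which is read.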
So, if you have a CD4051 8-port multiplexor, using 5 pins on the Arduino (three select pins plus two common pins, instead of the usual three select pins plus one common pin), you can obtain 16 inputs or outputs, or a mix of the two.

Likewise, if you have a 4067 16-port multiplexor, you can obtain 32 inputs or outputs, or a mix of the two.
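The pin arithmetic behind those numbers can be written out as below. This is only a sketch of the counting, with hypothetical function names (selectLines, pinsNeeded, ioCount), and it assumes the mux has a power-of-two number of channels, as the CD4051 and 4067 do.

```cpp
// Pin-count arithmetic for the doubling trick.
// selectLines, pinsNeeded and ioCount are hypothetical helper names.

// Number of select lines needed to address `channels` mux channels
// (3 for the 8-channel CD4051, 4 for the 16-channel 4067).
int selectLines(int channels) {
  int lines = 0;
  while ((1 << lines) < channels) {
    lines++;
  }
  return lines;
}

// Arduino pins used: the select lines plus the two common pins.
int pinsNeeded(int channels) {
  return selectLines(channels) + 2;
}

// Each channel carries a two-device bank, so the I/O count doubles.
int ioCount(int channels) {
  return channels * 2;
}
```

For the CD4051 this gives 5 pins for 16 inputs or outputs; for the 4067 it gives 6 pins for 32.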
An example sketch would be:
/*
 * Example of getting 16 i/o from 5 pins using a CD4051
 *
 * Based on tutorial and code by david c. and tomek n.* for k3 / malmö högskola
 * http://www.arduino.cc/playground/Learning/4051?action=sourceblock&ref=1
 */
int selPin[] = { 14, 15, 16 };  // select pins on 4051 (analog A0, A1, A2)
int commonPin[] = { 17, 18 };   // common in/out pins (analog A3, A4)
int led[] = { LOW, LOW, LOW, LOW, LOW, LOW, LOW, LOW };  // stores eight LED states
int CdSVal[] = { 0, 0, 0, 0 };  // store last CdS readings
int cnt = 0;                    // main loop counter
int persistDelay = 100;         // LED on-time in microseconds

void setup() {
  Serial.begin(9600);  // serial comms for troubleshooting (always)
  for (int pin = 0; pin < 3; pin++) {  // set up select pins
    pinMode(selPin[pin], OUTPUT);
  }
}

void loop() {
  flashLEDs();
  if (cnt == 0) {  // pick a new random LED pattern every 101 passes
    for (int x = 0; x < 8; x++) {
      led[x] = random(2);
    }
  }
  cnt++;
  if (cnt > 100) { cnt = 0; }
}

void flashLEDs() {
  for (int pin = 0; pin < 2; pin++) {  // set common pins low
    pinMode(commonPin[pin], OUTPUT);
    digitalWrite(commonPin[pin], LOW);
  }
  for (int bank = 0; bank < 4; bank++) {
    for (int pin = 0; pin < 3; pin++) {  // parse out select pin bits
      int signal = (bank >> pin) & 1;    // shift & bitwise compare
      digitalWrite(selPin[pin], signal);
    }
    if (led[bank * 2]) {                  // first LED
      digitalWrite(commonPin[0], HIGH);   // turn common on
      delayMicroseconds(persistDelay);    // leave LED lit
      digitalWrite(commonPin[0], LOW);    // turn common off
    }
    if (led[bank * 2 + 1]) {              // repeat for second LED
      digitalWrite(commonPin[1], HIGH);
      delayMicroseconds(persistDelay);
      digitalWrite(commonPin[1], LOW);
    }
  }
}
As I said at the start, the full explanation can be found on MUX-DEMUX: CD4051 Parlor Tricks.