Adding to Michael's answer above, which only works for some cases (and won't work perfectly even with a sequence of claps): to make it work in more cases you would need to store more information about the clap noise, that is, to "properly identify" the clap, as Michael points out in his comment above, so that the identification becomes more accurate. There are various ways to do this. You could compare the sound with a live recording, as you wrote in your question. I suggest you try a neural network for that comparison. I would say you need a two-layer neural network, trained on a sample of different claps (it's probably quicker to do the training somewhere other than on the Arduino). Here is a library for neural networks for Arduino
http://robotics.hobbizine.com/arduinoann.html
that you may use. Or you can implement your own neural network. Sound recognition is a well-known application of neural networks.
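To give an idea of what "implementing your own" could look like, here is a minimal sketch of the forward pass of a two-layer network (one hidden layer plus one output unit). Everything specific in it is an assumption for illustration: the sizes `N_IN` and `N_HID`, the names `clapScore` and `isClap`, and especially the weights, which are placeholders and not trained values. Real weights would have to come from training on recorded claps and non-claps, and this is not the API of the library linked above.

```cpp
#include <math.h>
#include <stddef.h>

// Hypothetical sizes: 4 input features, 3 hidden units, 1 output unit.
const size_t N_IN = 4;
const size_t N_HID = 3;

// Placeholder weights, NOT trained. Real values would come from
// training offline on recorded clap and non-clap samples.
const float W_HID[N_HID][N_IN] = {
    { 0.8f, -0.4f,  0.3f,  0.1f},
    {-0.5f,  0.9f, -0.2f,  0.4f},
    { 0.2f,  0.1f,  0.7f, -0.6f}
};
const float B_HID[N_HID] = {0.0f, 0.1f, -0.1f};
const float W_OUT[N_HID] = {1.2f, -0.7f, 0.5f};
const float B_OUT = -0.3f;

// Logistic activation, maps any real number into (0, 1).
static float sigmoid(float x) {
    return 1.0f / (1.0f + expf(-x));
}

// Forward pass: hidden layer, then a single output unit.
// Returns a score in (0, 1); higher means "more clap-like".
float clapScore(const float features[N_IN]) {
    float out = B_OUT;
    for (size_t h = 0; h < N_HID; ++h) {
        float sum = B_HID[h];
        for (size_t i = 0; i < N_IN; ++i) {
            sum += W_HID[h][i] * features[i];
        }
        out += W_OUT[h] * sigmoid(sum);
    }
    return sigmoid(out);
}

// Decide "clap" when the score exceeds a threshold chosen by experiment.
bool isClap(const float features[N_IN], float threshold) {
    return clapScore(features) > threshold;
}
```

Only the forward pass has to run on the board, so the training code and the training data never need to fit into the Arduino's memory. You would call something like `isClap(features, 0.5f)` each time a loud event is detected.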
The Arduino's built-in ADC can probably be used for the recording. It should be good enough for something like a clap in your case, though it can't be used for high-quality sound recordings.
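As a rough sketch of what could be done with the ADC readings, the function below turns a short window of raw samples into a few numbers that describe the sound. It assumes 10-bit readings (0..1023, as on an Uno) collected into a buffer, for example with repeated `analogRead()` calls, and a microphone whose silent level sits near a known baseline. The struct `ClapFeatures`, the function `extractFeatures`, and the choice of features are my own illustration, not something from the question or from Michael's answer.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical feature set describing one window of samples.
struct ClapFeatures {
    uint16_t peak;       // largest deviation from the baseline, in ADC counts
    uint32_t energy;     // sum of squared deviations over the window
    size_t   loudCount;  // number of samples louder than the threshold
};

// samples:       raw ADC readings (0..1023), e.g. filled by analogRead()
// n:             number of samples in the window
// baseline:      reading during silence (assumed known, e.g. around 512)
// loudThreshold: deviation above which a sample counts as "loud"
ClapFeatures extractFeatures(const uint16_t *samples, size_t n,
                             uint16_t baseline, uint16_t loudThreshold) {
    ClapFeatures f = {0, 0, 0};
    for (size_t i = 0; i < n; ++i) {
        // Signed deviation from the silent level, then its magnitude.
        int32_t d = (int32_t)samples[i] - (int32_t)baseline;
        uint32_t mag = (uint32_t)(d < 0 ? -d : d);
        if (mag > f.peak) f.peak = (uint16_t)mag;
        f.energy += mag * mag;
        if (mag > loudThreshold) ++f.loudCount;
    }
    return f;
}
```

A clap is short and sharp, so I would expect a high `peak` with a small `loudCount` relative to the window length, while sustained noise would give a large `loudCount`. Numbers like these (scaled to a sensible range) could be compared against thresholds or used as inputs to a neural network.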