This is a rather abstract question, as I have no idea yet how to solve it and haven't found any suitable solutions.
Let's start with the current situation. You have an array of byte[] (e.g. ArrayList<byte[]>), which behind the scenes are actually Strings, but at the current state byte[] is preferred. The arrays can be very long (1024+ bytes each, and the ArrayList may contain up to 1024 byte[] arrays) and may have different lengths. Furthermore, they share a lot of the same bytes at the "same" locations (this is relative; e.g. for a = {0x41, 0x41, 0x61} and b = {0x41, 0x41, 0x42, 0x61}, the first 0x41 and the last 0x61 are the same).
I'm now looking for an algorithm that compares all those arrays with each other. The result should be the array that differs the most, together with some metric for how much the arrays differ from each other. Furthermore, the task should complete within a short time.
If possible, this should work without any third-party libraries (but I doubt it is feasible in a reasonable time without one).
Any suggestions are very welcome.
Edit:
Made some adjustments.
EDIT / SOLUTION:
I'm using the Levenshtein distance now. Furthermore, I've made some slight adjustments to improve the runtime/speed. These are very specific to the data I'm handling, as I know that all Strings have a lot in common (and I know approximately where). Filtering that common content improves the speed by a factor of 400 compared to running the Levenshtein distance algorithm directly on two unfiltered Strings (test data).
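For reference, here is a minimal sketch of that approach in Java. My actual filtering is specific to my data, so the sketch just strips the common prefix and suffix before running the standard two-row Levenshtein dynamic program; the class and method names are mine, not from any library:

    public class ArrayDiff {

        // Levenshtein distance between two byte arrays. The shared prefix
        // and suffix are stripped first, as a generic stand-in for the
        // data-specific filtering described above.
        static int distance(byte[] a, byte[] b) {
            int start = 0;
            int endA = a.length, endB = b.length;
            // Skip the common prefix.
            while (start < endA && start < endB && a[start] == b[start]) start++;
            // Skip the common suffix (without overlapping the prefix).
            while (endA > start && endB > start && a[endA - 1] == b[endB - 1]) { endA--; endB--; }
            int n = endA - start, m = endB - start;
            // Classic two-row DP over the remaining middle sections.
            int[] prev = new int[m + 1];
            int[] curr = new int[m + 1];
            for (int j = 0; j <= m; j++) prev[j] = j;
            for (int i = 1; i <= n; i++) {
                curr[0] = i;
                for (int j = 1; j <= m; j++) {
                    int cost = (a[start + i - 1] == b[start + j - 1]) ? 0 : 1;
                    curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                       prev[j - 1] + cost);
                }
                int[] tmp = prev; prev = curr; curr = tmp;
            }
            return prev[m];
        }
    }

The two-row trick keeps memory at O(min length) instead of O(n*m), which matters for 1024+ byte arrays.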
Thanks for your input / answers, they were a great assistance.
Even if the Levenshtein distance is expensive to calculate, it appears to be the relevant metric here. Comparing all arrays against all arrays also has O(n²) runtime characteristics, which arguably isn't fast.
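As an illustration of that all-pairs comparison, a sketch like the following could identify the outlier. Summing each array's pairwise distances as the "differs the most" metric is my assumption, since the question doesn't pin one down; it reuses distance() from the sketch above:

    import java.util.List;

    // Sum each array's Levenshtein distance to all others and report the
    // index with the largest total, i.e. the array that differs the most.
    // This calls distance() O(n²) times for n arrays.
    static int mostDifferentIndex(List<byte[]> arrays) {
        int worstIndex = -1;
        long worstTotal = -1;
        for (int i = 0; i < arrays.size(); i++) {
            long total = 0;
            for (int j = 0; j < arrays.size(); j++) {
                if (i != j) total += distance(arrays.get(i), arrays.get(j));
            }
            if (total > worstTotal) {
                worstTotal = total;
                worstIndex = i;
            }
        }
        return worstIndex;
    }

Since the distance is symmetric, each pair only needs to be computed once (caching distance(i, j) for j > i), which halves the work but leaves the asymptotic O(n²) cost unchanged.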