I wrote a C# console application that reads a text file and finds the most common word in it. It works for small text files but not for large ones. A friend told me about algorithm complexity analysis, which I didn't know about before and am still learning. Now I'm trying to figure out what's wrong with my code: why does it only work for small text files? Here's my code:
    static void Main(string[] args)
    {
        String line = "";
        String word = "";
        int count = 0;
        int maxCount = 0;
        string[] string1 = new string[0];
        ArrayList words = new ArrayList();
        var path = ConfigurationSettings.AppSettings["inputFilePath"];
        if(path == null)
        {
            throw new Exception();
        }
        var watch = System.Diagnostics.Stopwatch.StartNew();
        try
        {
            //opens text file read mode
            using(StreamReader file = new StreamReader(path))
            {
                while((line = file.ReadLine()) != null)
                {
                    //separates and adds each word into an array
                    string1 = line.ToLower().Split(new char[] { ',', '.', ' ' }, StringSplitOptions.RemoveEmptyEntries);
                    //adds the words from the array into a list
                    foreach(String s in string1)
                    {
                        words.Add(s);
                    }
                }
                //looks for the most common word in the list
                for(int i = 0; i < words.Count; i++)
                {
                    count = 1;
                    //counts each word and stores value to count variable
                    for(int j = i + 1; j < words.Count; j++)
                    {
                        if(words[i].Equals(words[j]))
                        {
                            count++;
                        }
                    }
                    //if count > maxCount, count value stored in maxCount, and corresponding word to word variable
                    if(count > maxCount)
                    {
                        maxCount = count;
                        word = (String)words[i];
                    }
                }
                Console.WriteLine("The most common word is " + word + ", with " + maxCount + " occurrences.");
                file.Close();
            }
        }
        catch(Exception e)
        {
            Console.WriteLine("There was an error opening the file: " + e.Message);
        }
        watch.Stop();
        var elapseMs = watch.ElapsedMilliseconds;
        Console.WriteLine("File processed in " + elapseMs + " milliseconds");
    }

Use a Dictionary<string, int> to store the words.

What does "why does it only work for small text files" mean? I can definitely see inefficient collection usage and variables with too wide a scope (what's string1 doing up there?), but is that the real problem?

string1 is only used inside the first loop, and yet it's declared at the top, which gives it method-wide scope. Its initial value is never used either. This could be a simple var strings = line.Split(...);
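
To illustrate the Dictionary<string, int> suggestion: the nested loops in the question compare every word with every later word, so a file with n words costs about n(n-1)/2 comparisons, and that quadratic growth is a likely reason large files seem to hang. Counting in a dictionary takes a single pass instead. Here is a minimal sketch, assuming C# 7 or later (for tuples and out variables). WordCounter and MostCommonWord are names made up for this example, and the built-in sample lines exist only so it runs without a file; none of this comes from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class WordCounter
{
    // Same separators as the Split call in the question.
    private static readonly char[] Separators = { ',', '.', ' ' };

    // Hypothetical helper: returns the most frequent word and its count,
    // or ("", 0) when the input contains no words.
    public static (string Word, int Count) MostCommonWord(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        string best = "";
        int bestCount = 0;

        foreach (string line in lines)
        {
            foreach (string w in line.ToLower().Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                // TryGetValue leaves current at 0 when the word has not been seen yet.
                counts.TryGetValue(w, out int current);
                current++;
                counts[w] = current;

                // Track the running maximum so no second pass is needed.
                if (current > bestCount)
                {
                    bestCount = current;
                    best = w;
                }
            }
        }
        return (best, bestCount);
    }

    static void Main(string[] args)
    {
        // File.ReadLines reads lazily, so the whole file is never held in memory.
        // Without a path argument, fall back to a small in-memory sample.
        IEnumerable<string> lines = args.Length > 0
            ? File.ReadLines(args[0])
            : new[] { "the cat, the dog.", "The end" };

        var (word, count) = MostCommonWord(lines);
        Console.WriteLine("The most common word is " + word + ", with " + count + " occurrences.");
    }
}
```

One difference from the original to be aware of: on a tie, this sketch reports the word that reaches the highest count first, which may not be the same word the nested loops would pick.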