I have the following algorithm:

private void writetodb()
{
    using(var reader = File.OpenRead(@"C:\Data.csv"))
    using(var parser = new TextFieldParser(reader))
    { 
        //Do some operations
        while(!parser.EndOfData)
        {
            //Do operations
            //Take 500 rows of data and put it in dataset
            Thread thread = new Thread(() => WriteTodb(tablename, set));
            thread.Start();
            Thread.Sleep(5000);
        }
    }
}

public void WriteTodb(string table, CellSet set)
{
    //WriteToDB
    //Edit: This statement will write to hbase db in hdinsight
    hbase.StoreCells(table, set);
}

This method works fine for up to about 500 MB of data, but beyond that it fails with an OutOfMemoryException.

I am fairly sure that it is because of the threads, but using threads is mandatory and I can't change the architecture.
Can anybody tell me what modifications I have to make in thread programming in the above program to avoid memory exception?

  • A thread occupies approximately 1 MB of memory. Given the amount of data (500 MB!) your program probably runs out of memory. Consider using the Task Parallel Library instead of manually creating threads. Commented Jul 27, 2015 at 16:43
  • Also, check if the dataset write operation is thread safe. Commented Jul 27, 2015 at 16:43
  • Are you sure that using threads is mandatory and that you can't use tasks instead? Also, are you sure the threads aren't contending for the resource you're consuming (the file, I presume)? Commented Jul 27, 2015 at 16:44
  • What is the code of the function "WriteTodb"? How do you manage your database connections? Perhaps the problem is that you use lots of database connections. Commented Jul 27, 2015 at 16:44
  • It is very inefficient to use threads for I/O; you should be using async/await. Commented Jul 27, 2015 at 17:07

2 Answers

First of all, I don't understand this statement about threading:

I have to make in thread programming in the above program to avoid memory exception.

You will still be doing thread programming if you use the TPL, as has already been suggested. You really don't have to use the Thread class directly if you aren't comfortable with it. You say that your code is C# 4.0, so the TPL is an option for you (note that Task.Run was added in .NET 4.5; on .NET 4.0 use Task.Factory.StartNew instead). You can do your work something like this (a very simple approach):

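List<Task> tasks = new List<Task>();
while(!parser.EndOfData)
{
    // read the next 500 rows into a new CellSet here (omitted), so that
    // each task captures its own batch
    tasks.Add(Task.Run(() => WriteTodb(tablename, set)));
}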
Task.WaitAll(tasks.ToArray());

The TPL uses the default TaskScheduler, which runs tasks on the internal ThreadPool and can balance the work against the resources you have on your server.
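To make the thread-pool point concrete, here is a minimal sketch that counts how many distinct threads end up running a batch of tasks. FakeWrite and CountThreadsUsed are hypothetical names introduced only for this illustration, and the Thread.Sleep call merely stands in for the real HBase write. It assumes .NET 4.5 or later because of Task.Run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PoolDemo
{
    // Hypothetical stand-in for the real database write.
    static void FakeWrite(int batch) { Thread.Sleep(10); }

    // Runs batchCount fake writes as tasks and returns how many
    // distinct threads actually executed them.
    public static int CountThreadsUsed(int batchCount)
    {
        var threadIds = new ConcurrentDictionary<int, bool>();
        var tasks = new List<Task>();
        for (int i = 0; i < batchCount; i++)
        {
            int batch = i; // copy, so each lambda captures its own value
            tasks.Add(Task.Run(() =>
            {
                threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
                FakeWrite(batch);
            }));
        }
        Task.WaitAll(tasks.ToArray());
        return threadIds.Count;
    }

    public static void Main()
    {
        int used = CountThreadsUsed(200);
        Console.WriteLine("200 tasks ran on " + used + " thread(s)");
    }
}
```

On a typical machine the reported thread count should be far below the number of tasks, because pool threads are reused. The exact number depends on the machine and the pool's current state.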

Also, I see that you're using the HBase client from Microsoft, and it has an async method:

public async Task StoreCellsAsync(string table, CellSet cells)
{
}

So you can use the asynchronous approach and the TPL at the same time:

List<Task> tasks = new List<Task>();
while(!parser.EndOfData)
{
    tasks.Add(WriteTodb(tablename, set));
}
// asynchronously await all the writes
await Task.WhenAll(tasks.ToArray());

public async Task WriteTodb(string table, CellSet set)
{
    //WriteToDB
    //Edit: This statement will write to hbase db in hdinsight asynchronously!
    await hbase.StoreCellsAsync(table, set);
}
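One caveat with starting every write up front is that all pending batches stay in memory until their writes finish. A possible way to cap that, sketched here under assumptions, is to gate the loop with a SemaphoreSlim so that only a fixed number of writes are in flight at once. ThrottledWriter, FakeStoreAsync and the maxInFlight parameter are hypothetical names for this illustration; FakeStoreAsync only stands in for hbase.StoreCellsAsync, and SemaphoreSlim.WaitAsync needs .NET 4.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class ThrottledWriter
{
    private readonly SemaphoreSlim gate;
    private readonly object sync = new object();
    private int inFlight;
    private int peak;

    public ThrottledWriter(int maxInFlight)
    {
        gate = new SemaphoreSlim(maxInFlight);
    }

    // Hypothetical stand-in for the real asynchronous HBase write.
    private static Task FakeStoreAsync(int batch) { return Task.Delay(10); }

    private async Task WriteOneAsync(int batch)
    {
        try
        {
            int now = Interlocked.Increment(ref inFlight);
            lock (sync) { if (now > peak) peak = now; }
            await FakeStoreAsync(batch);
        }
        finally
        {
            Interlocked.Decrement(ref inFlight);
            gate.Release(); // frees a slot so the loop can start another write
        }
    }

    // Starts batchCount writes, never more than maxInFlight at a time,
    // and returns the highest number of writes observed in flight.
    public async Task<int> WriteAllAsync(int batchCount)
    {
        var tasks = new List<Task>();
        for (int i = 0; i < batchCount; i++)
        {
            await gate.WaitAsync(); // waits here while all slots are taken
            tasks.Add(WriteOneAsync(i));
        }
        await Task.WhenAll(tasks);
        return peak;
    }

    public static void Main()
    {
        int observed = new ThrottledWriter(4).WriteAllAsync(50).GetAwaiter().GetResult();
        Console.WriteLine("peak writes in flight: " + observed);
    }
}
```

In the real loop, the wait on the gate would sit just before reading the next 500 rows, so the reader cannot run far ahead of the database.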

If, for some reason, you can't use the TPL, you will have to refactor your code and write your own thread scheduler:

  1. You don't have to create a new thread for each write; you can reuse threads.
  2. Running a second operation on an existing thread is, in general, faster than creating a separate thread for each operation.
  3. Split the file into parts, create a thread for the writing, and write the data in a loop.
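The points above could be sketched roughly as follows. This is a variation rather than a literal implementation: instead of pre-splitting the file, a fixed set of long-lived threads pulls batches from a shared, bounded BlockingCollection (available since .NET 4.0). FixedWorkers, Run, workerCount and the write delegate are hypothetical names for this illustration, and int[] stands in for a CellSet.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class FixedWorkers
{
    // Processes every batch on workerCount long-lived threads and
    // returns the number of batches written. The write delegate is a
    // hypothetical stand-in for the real HBase write.
    public static int Run(IEnumerable<int[]> batches, int workerCount, Action<int[]> write)
    {
        int written = 0;
        // Bounded queue: Add blocks once 2 batches per worker are waiting,
        // so the reader cannot get arbitrarily far ahead of the writers.
        using (var queue = new BlockingCollection<int[]>(workerCount * 2))
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // Each thread loops and is reused for many batches.
                    foreach (var batch in queue.GetConsumingEnumerable())
                    {
                        write(batch);
                        Interlocked.Increment(ref written);
                    }
                });
                worker.Start();
                workers.Add(worker);
            }

            foreach (var batch in batches) queue.Add(batch);
            queue.CompleteAdding(); // lets the worker loops finish
            foreach (var worker in workers) worker.Join();
        }
        return written;
    }

    public static void Main()
    {
        var batches = new List<int[]>();
        for (int i = 0; i < 10; i++) batches.Add(new int[500]);
        int n = Run(batches, 4, b => Thread.Sleep(5));
        Console.WriteLine(n + " batches written");
    }
}
```

Because the queue is bounded, memory use is tied to the number of workers rather than to the file size. A real version would also need to handle exceptions thrown inside the worker threads, which this sketch does not.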

Instead of creating a new Thread every time, use ThreadPool.QueueUserWorkItem. For reference, see: https://msdn.microsoft.com/en-us/library/kbf0f1ct(v=vs.110).aspx
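A minimal sketch of that approach, assuming a CountdownEvent is used to wait for the queued work to finish. QueueDemo, WriteAll and FakeWrite are hypothetical names for this illustration, and the integer batch id stands in for the real data set.

```csharp
using System;
using System.Threading;

public static class QueueDemo
{
    // Hypothetical stand-in for the real database write.
    static void FakeWrite(int batch) { Thread.Sleep(5); }

    // Queues batchCount writes on the thread pool, waits for all of
    // them, and returns how many completed.
    public static int WriteAll(int batchCount)
    {
        int done = 0;
        using (var pending = new CountdownEvent(batchCount))
        {
            for (int i = 0; i < batchCount; i++)
            {
                // The batch id is passed as the state object, so the
                // callback does not capture the loop variable.
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        FakeWrite((int)state);
                        Interlocked.Increment(ref done);
                    }
                    finally
                    {
                        pending.Signal(); // always count down, even on failure
                    }
                }, i);
            }
            pending.Wait(); // blocks until every queued item has signalled
        }
        return done;
    }

    public static void Main()
    {
        Console.WriteLine(WriteAll(20) + " writes completed");
    }
}
```

Note that the pool limits the number of threads but not the number of queued work items, so the reading loop may still need its own throttle for very large files.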

1 Comment

That's good advice in general, but it doesn't really address the question.
