
I am having a performance issue with an SQLite database (.db).

I am trying to update 100,000 records in the database (.db), which takes around 50 minutes. That is far too slow.

My code is like below:

    for (int q = 0; q < list.Count; q++)
    {
        ArrayList castarraylist = (ArrayList)list[q];

        using (var cmd = new SQLiteCommand(con))
        using (var transaction = con.BeginTransaction())
        {
            cmd.Transaction = transaction;

            for (int y = 0; y < castarraylist.Count; y++)
            {
                cmd.CommandText = Convert.ToString(castarraylist[y]);
                cmd.ExecuteNonQuery();
            }
            transaction.Commit();
            GC.Collect();
        }
    }

Here each castarraylist contains 5,000 records, which are updated into the database within one transaction, so the loop runs 20 times to complete all the updates. When I manually check the time, it increases at each iteration of 5,000 records, like:

1st 5000 records processing time > 1:11 minutes

2nd 5000 records processing time > 1:25 minutes

3rd 5000 records processing time > 1:32 minutes

4th 5000 records processing time > 1:40 minutes

5th 5000 records processing time > 1:47 minutes

6th 5000 records processing time > 1:52 minutes

...

17th 5000 records processing time > 3:32 minutes

18th 5000 records processing time > 3:44 minutes

19th 5000 records processing time > 4:02 minutes

20th 5000 records processing time > 4:56 minutes

I am not able to understand why this is happening. My source code is written in C#, and my laptop configuration is an i5 2.6 GHz CPU, 4 GB RAM, and a 500 GB HDD.

I made the connection like below:

SQLiteConnection con = new SQLiteConnection("Data Source=" + fullPath + ";Version=3;Count Changes=off;Journal Mode=off;Pooling=true;Cache Size=10000;Page Size=4096;Synchronous=off"); 
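(For reference, several of those connection-string options map directly to PRAGMA statements; the sketch below shows the equivalent pragmas applied explicitly right after opening the connection. This assumes the System.Data.SQLite provider; note that page_size only takes effect on a newly created database.)

```csharp
// Sketch only: applying the same settings as explicit PRAGMAs after Open(),
// instead of packing them into the connection string.
using (var pragmaCmd = new SQLiteCommand(con))
{
    pragmaCmd.CommandText =
        "PRAGMA journal_mode = OFF;" +   // same as Journal Mode=off
        "PRAGMA synchronous = OFF;" +    // same as Synchronous=off
        "PRAGMA cache_size = 10000;";    // same as Cache Size=10000
    pragmaCmd.ExecuteNonQuery();
}
```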

(fullPath is my database path)

I am creating the table like below:

string sqlquery2 = "SELECT LINK_ID FROM RDF_LINK";
string createLinkToPoly = "CREATE TABLE temp2 AS " + sqlquery2;

This creates a table and inserts the records returned by sqlquery2.

The statement below loads the SpatiaLite extension into SQLite:

ExecuteStatement("select load_extension('spatialite.dll')", con);

My UPDATE statement is like below:

UPDATE temp2 SET GEOM = Transform(LineStringFromText('LINESTRING(4.38368 51.18109,4.38427 51.18165)', 4326), 32632) WHERE LINK_ID = 53841546

100,000 statements of this kind are built in different threads and inserted into the list.

Finally, the UPDATE statements are executed in the code above (now using the code Larry suggested).

  • What are the actual SQL commands that get executed? Commented Jul 4, 2014 at 7:01
  • @CL. The actual commands are in castarraylist. Commented Jul 4, 2014 at 10:17
  • @Larry I supposed memory was the issue here, so I used GC.Collect(). Commented Jul 4, 2014 at 10:18
  • I did not ask where they are stored, but what they are. The increasing times show that you have some non-indexed lookup in there. Show some SQL example. Commented Jul 4, 2014 at 10:56
  • Is it better now, without GC.Collect and with the transaction that encloses the whole processing? Commented Jul 4, 2014 at 11:39

4 Answers


Currently, the transaction is run per query, which makes no sense.

Enclose your main loop code in the transaction, and remove the GC.Collect() call.

EDIT:

As I understand it, you don't want the global update to be rolled back in case of an error, so I changed the code a bit.

Additionally, I am not sure that the command object can be reused by changing the CommandText and running queries again. That's why I suggest creating it every time.

using (var transaction = con.BeginTransaction()) 
{ 
    for (int q = 0; q < list.Count; q++) 
    { 
        var castarraylist = (ArrayList)(list[q]); 

        for (int y = 0; y < castarraylist.Count; y++) 
        { 
            using (var cmd = new SQLiteCommand(con)) 
            {
                cmd.Transaction = transaction; 
                cmd.CommandText = Convert.ToString(castarraylist[y]);
                try
                {
                    cmd.ExecuteNonQuery();
                }
                catch(Exception ex)
                {
                    // Log the update problem
                    Console.WriteLine("Update problem " + cmd.CommandText + " - Reason: " + ex.Message);
                }
            }
        }
    }

    transaction.Commit();
}

9 Comments

@Larry... I used your code and I also put all the values in one list, so my code is now... using (var transaction = con.BeginTransaction()){try{using (var cmd = new SQLiteCommand(con)) {cmd.Transaction = transaction; for (int q = 0; q < latlongquery1output.Count; q++) {cmd.CommandText = latlongquery1output[q]; cmd.ExecuteNonQuery();}} transaction.Commit();} catch{transaction.Rollback();throw;}}
@Larry... currently this list contains 1 million records... and with this code, if a single record fails, the whole transaction fails, so is there any simpler way?
@Hardik Oh, I see. I adapted the code accordingly so it will not roll back all the changes if a record refuses to update for whatever reason. I also changed the way the SQLiteCommand object is used so it is not re-used anymore. Let me know if it is better or not.
@Larry... yes, your code and the creation of the index helped me very much and improved my performance too (now the task completes in only 5 minutes)... Can we change the code so it does not roll back all the changes if a record refuses to update for whatever reason?
Glad it helps! :) Actually, the code in the answer is designed NOT to roll back changes to the database if a single record refuses to update. Instead, you will have to implement something in the catch section to log when an update fails (to a file or anywhere else), and it will continue until everything is done. Then all updates are eventually committed. Have you tried to read and insert records into a new table instead of updating an existing one? It might be even faster.

First, you should try using prepared statements for better performance. Take a look at the System.Data.SQLite documentation; you can use SQLiteParameter and set the parameter values in the loop.

Second, ArrayList should be slower than List&lt;T&gt; or an array. Maybe changing that can help.

Third, there may be some PRAGMA commands you can use.

Edit: I see you have already turned off synchronous and journal_mode; I'm not sure there is any other pragma you should use. In some cases, locking_mode = EXCLUSIVE and temp_store = MEMORY can be helpful.
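To illustrate the first point, here is a rough sketch of a parameterized UPDATE prepared once and reused inside a single transaction. The table and column names come from the question; the list of (wkt, linkId) pairs is a hypothetical stand-in for however the values are built, and the API shape assumes System.Data.SQLite (using System.Data for DbType):

```csharp
// Sketch: bind new parameter values each iteration instead of
// building 100,000 SQL strings. 'updates' is hypothetical.
using (var transaction = con.BeginTransaction())
using (var cmd = new SQLiteCommand(
    "UPDATE temp2 SET GEOM = Transform(LineStringFromText(@wkt, 4326), 32632) " +
    "WHERE LINK_ID = @linkId", con, transaction))
{
    cmd.Parameters.Add("@wkt", DbType.String);
    cmd.Parameters.Add("@linkId", DbType.Int64);

    foreach (var (wkt, linkId) in updates)
    {
        cmd.Parameters["@wkt"].Value = wkt;
        cmd.Parameters["@linkId"].Value = linkId;
        cmd.ExecuteNonQuery();   // the statement is compiled once, re-executed here
    }
    transaction.Commit();
}
```

This avoids re-parsing the SQL text for every row and, combined with a single transaction and an index on LINK_ID, should address most of the slowdown.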

2 Comments

@Griddor... I am not passing any parameter values. I am preparing the statements directly in another thread... I tried to use a List but it did not improve performance much... yes, I already used mostly all the known PRAGMA commands while making the connection to the database.
@Hardik Actually, prepared statements are a vital part of using SQLite; only change the parameter values in the loop. Executing the query directly means preparing the statement for each individual query. Besides, as in Larry's answer, committing the transaction only once after all the loops should be faster than multiple commits.

You're probably not having a performance issue with SQLite; you're almost certainly having a performance issue with your own code:

  • Calling GC.Collect() at all is almost certainly not necessary. What you're doing here shouldn't be causing any significant memory pressure, and if it were, I would strongly recommend just letting the garbage collector do its own thing rather than forcing the issue. Worse, you're calling GC.Collect() on every single iteration of the loop. Don't do this!

  • Is it really necessary for each individual update to be made in its own transaction? You do realise that if your code fails and throws an exception halfway through this loop, the first half of the updates will have been committed but you won't have any way of picking up where you left off? You won't even have an easy way of knowing where you left off.

  • Is there any particular reason you're using an ArrayList, rather than a List<T>? This is causing you to need to perform a cast and call Convert.ToString in your inner loop, which shouldn't be necessary (unless you have a very, very good reason for using ArrayList).
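For the last point, a minimal sketch of what the loop might look like with List&lt;string&gt; instead of ArrayList (assuming the statements really are strings; the command/transaction setup and the hypothetical BuildStatements() builder are omitted for brevity):

```csharp
// Sketch: with List<List<string>> there is no cast and no Convert.ToString,
// because the element types are known at compile time.
List<List<string>> batches = BuildStatements(); // hypothetical builder

foreach (List<string> batch in batches)
{
    foreach (string sql in batch)
    {
        cmd.CommandText = sql;  // direct assignment, no runtime cast
        cmd.ExecuteNonQuery();
    }
}
```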

Comments


The UPDATE statements are slow because the database has to scan all records in the table to find any matching LINK_ID values. You need an index on the LINK_ID column.

Either create it manually before doing the updates:

CREATE INDEX temp2_linkid ON temp2(LINK_ID);

Or create the index when you are creating the table (which requires that the table is created explicitly):

CREATE TABLE temp2 ( LINK_ID INTEGER PRIMARY KEY );
INSERT INTO temp2(LINK_ID) SELECT LINK_ID FROM RDF_LINK;

3 Comments

@CL... yes, I have created the INDEX and it is really very effective.
But I think I cannot create a primary key, because I sometimes insert more than one column and they are not predefined.
I am currently facing a problem... I am creating the UPDATE statements in different threads, and it is giving an OutOfMemory exception on string and array.
