
I'm reading a database that contains several tables using ExecuteReader(). Based on the result of each Read() on the first query, I read from a second table, because I need the ID returned by the first query to run the second one.

The problem is that this search is extremely slow.

tuCommand.CommandText = "SELECT * FROM tblTranslationUnit WHERE DocumentId = " + doc.DocumentId;
var tuReader = tuCommand.ExecuteReader();
while (tuReader.Read())
{
    var tu = new TranslationUnit
    {
        TranslationUnitId = tuReader.GetInt64(0),
        DocumentId = tuReader.GetInt64(1),
        Raw = tuReader.GetString(2),
        IsSegmented = tuReader.GetBoolean(3),
        Reader = this, // Ryan: Fixed so that it sets the reader to itself
    };

    using (var propCommand = _dbConn.CreateCommand())
    {
        propCommand.CommandText = "SELECT * FROM tblTranslationUnitProperties WHERE TranslationUnitId = " + tu.TranslationUnitId;
        var propReader = propCommand.ExecuteReader();
        while (propReader.Read()) tu.Properties.Add(GetProperty(propReader));
    }
    yield return tu;
}

If I remove the second ExecuteReader(), the query is really fast.

I have also tried running the second ExecuteReader() on a new connection and a new transaction, but the result is almost the same.

Any idea or clue? How can I do this kind of search? Is there a better approach? (I suppose there is.)


More details: the db structure is:

  - Document
      - properties
      - errors
      - TranslationUnits
          - properties
          - errors
          - Segments
              - properties
              - errors

So in some parts of the code we will have this structure

  foreach (document in db)
      foreach (property in document)
      foreach (error in document)
      foreach (translationunit in document)
          foreach (property in translationunit)
          foreach (error in translationunit)
          foreach (segment in translationunit)
              foreach (property in segment)
              foreach (error in segment)

Based on that, using a join to return everything is not a good idea. I was wondering whether this is just a SQLite configuration problem, i.e. whether it's possible to set some parameter or similar to tell the system that we are going to use several cursors at the same time.

Now we are moving to a datatable solution (a rough sketch follows the list):

  1. open a connection
  2. read 1000 entries of a table
  3. close the connection
  4. open a new connection
  5. read 1000 entries of the child table
  6. close the new connection
  7. ...
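A rough sketch of that batched reading, assuming the System.Data.SQLite provider (SQLiteConnection) and LIMIT/OFFSET paging over the parent table; the helper name, page size, and connection-string handling are illustrative, not part of the real code:

// Hypothetical sketch: read tblTranslationUnit in pages of 1000 rows,
// opening and closing the connection around each page.
// Requires: using System.Collections.Generic; using System.Data.SQLite;
public IEnumerable<TranslationUnit> ReadTranslationUnitsInPages(string connectionString, long documentId)
{
    const int pageSize = 1000;
    for (int offset = 0; ; offset += pageSize)
    {
        var page = new List<TranslationUnit>();
        using (var conn = new SQLiteConnection(connectionString))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText =
                    "SELECT TranslationUnitId, DocumentId, Raw, IsSegmented " +
                    "FROM tblTranslationUnit WHERE DocumentId = @docId " +
                    "ORDER BY TranslationUnitId LIMIT @limit OFFSET @offset";
                cmd.Parameters.AddWithValue("@docId", documentId);
                cmd.Parameters.AddWithValue("@limit", pageSize);
                cmd.Parameters.AddWithValue("@offset", offset);
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        page.Add(new TranslationUnit
                        {
                            TranslationUnitId = reader.GetInt64(0),
                            DocumentId = reader.GetInt64(1),
                            Raw = reader.GetString(2),
                            IsSegmented = reader.GetBoolean(3)
                        });
                    }
                }
            }
        } // connection and reader are closed before any child table is touched

        foreach (var tu in page) yield return tu;
        if (page.Count < pageSize) yield break; // last page reached
    }
}

The child tables would be read the same way, batch by batch, between pages of the parent table.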
  • I don't know if it will solve the problem, but you should enclose both ExecuteReader() calls in 'using' statements to make sure the readers are correctly closed (a sketch follows these comments). Commented Apr 19, 2012 at 11:15
  • Just to be certain, what sqlite library and version are you using? Commented Apr 19, 2012 at 12:37
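For reference, the 'using' pattern suggested in the first comment would look roughly like this applied to the question's code; the query text is unchanged, so this only guarantees disposal and is not by itself a fix for the slowness:

// Sketch of the first comment's suggestion: both readers (and the inner
// command) are disposed deterministically via 'using'.
using (var tuReader = tuCommand.ExecuteReader())
{
    while (tuReader.Read())
    {
        var tu = new TranslationUnit
        {
            TranslationUnitId = tuReader.GetInt64(0),
            DocumentId = tuReader.GetInt64(1),
            Raw = tuReader.GetString(2),
            IsSegmented = tuReader.GetBoolean(3),
            Reader = this,
        };

        using (var propCommand = _dbConn.CreateCommand())
        {
            propCommand.CommandText = "SELECT * FROM tblTranslationUnitProperties WHERE TranslationUnitId = " + tu.TranslationUnitId;
            using (var propReader = propCommand.ExecuteReader())
            {
                while (propReader.Read()) tu.Properties.Add(GetProperty(propReader));
            }
        }
        yield return tu;
    }
}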

5 Answers


It sounds like you have scalability issues. SQLite has the word "Lite" in it for a reason. It lacks certain things like high concurrency, fine-grained access control, a rich set of built-in functions, stored procedures, esoteric SQL language features, XML and/or Java extensions, tera- or peta-byte scalability, and so forth. I recommend changing databases for starters.

I'm also unclear from your question why you would need 1000 documents in memory all at once, and especially 1000 documents with 1000 parts, each with 1000 more parts, all in memory. I don't know your UI requirements, but in my 15+ years of programming I don't recall ever having to display 1000 of anything on a single web page or form without some sort of paging mechanism, so do you really need to get 1000 * 1000 * 1000 entities from the database all at once?

I think you need to take another look at the UI, the current model and the data layer to look for ways to deliver as little content as necessary without sacrificing a lot of performance. Consider using patterns like Lazy Loading, read ahead buffers, caching, paging, search methods, shared static data, etc. to reduce the up-front costs.

Think in terms of buying a house. Most of us don't have the money to pay for the house all up front, so we get a mortgage. A mortgage is a way of spreading out the up-front cost over time. There's a negative impact that comes with all mortgages, called interest. Now, instead of shelling out 100,000 my overall cost becomes 250,000, but because I can afford the current payment, I don't really notice the extra 150,000: the extra cost is absorbed in small increments over time. Also note that I may not even pay back the entire 250,000 if I sell my house in 5 years instead of staying for the life of the loan.

The point here is that you can spread out the cost of making extra connections to retrieve smaller recordsets and still give the user what they need now. This will reduce the overall up-front costs but will add an additional cost to the individual recordsets being retrieved.


4 Comments

Thanks. As commented below, what I don't know is how to easily split the "join" result into the different objects (TranslationUnits, properties).
Order by ID and, as you loop through the results, keep track of the current ID. When it changes, start a new object (a rough sketch follows these comments).
OK. I just guessed there was an easier (or cleaner) way. That one is clear. I'll test it. Thanks
@DavidD I mocked it up for you. I realize that you'll be returning duplicate data, but creating a new connection and data reader with every row is a lot more costly.
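A minimal sketch of that order-by-ID splitting (this is not the commenter's actual mock-up; the LEFT JOIN, the column ordering, and the reuse of GetProperty are assumptions layered on the question's code):

// Hypothetical sketch: one JOIN query, ordered by TranslationUnitId, split into
// objects by watching for the ID to change between rows.
using (var cmd = _dbConn.CreateCommand())
{
    // Concatenation kept to match the question's style; a parameter would be preferable.
    cmd.CommandText =
        "SELECT tu.TranslationUnitId, tu.DocumentId, tu.Raw, tu.IsSegmented, p.* " +
        "FROM tblTranslationUnit tu " +
        "LEFT JOIN tblTranslationUnitProperties p ON p.TranslationUnitId = tu.TranslationUnitId " +
        "WHERE tu.DocumentId = " + doc.DocumentId + " " +
        "ORDER BY tu.TranslationUnitId";

    using (var reader = cmd.ExecuteReader())
    {
        TranslationUnit current = null;
        while (reader.Read())
        {
            long id = reader.GetInt64(0);
            if (current == null || current.TranslationUnitId != id)
            {
                if (current != null) yield return current; // previous unit is complete
                current = new TranslationUnit
                {
                    TranslationUnitId = id,
                    DocumentId = reader.GetInt64(1),
                    Raw = reader.GetString(2),
                    IsSegmented = reader.GetBoolean(3),
                    Reader = this,
                };
            }
            // Column 4 is the first column of the properties table (assumed non-null
            // whenever a property row exists); GetProperty would have to account for
            // the 4-column offset in the joined row.
            if (!reader.IsDBNull(4))
                current.Properties.Add(GetProperty(reader));
        }
        if (current != null) yield return current;
    }
}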

Hi, I'm going to add my discoveries on this (I'm working with David).

I modified the way we read the tables from the db to use a buffer, as David described, so there are no simultaneous connections or readers executing at the same time. It seems to be a little faster, but barely noticeably so. Here are some numbers.

I populate the database (all tables) with 5000 Translation Units in 2.5 seconds. Then when I loop through the TranslationUnit table (around 5000 rows) the reading time is spectacular: 0.07 seconds. The code looks like:

foreach (var tu in document)
{
   ... do something ...
}

If I read the Segments for each Translation Unit like this:

foreach (var tu in document)
{
    foreach (var seg in tu)
    {
        ... do something ...
    }
}

Reading time starts to get ugly: around 10 seconds. Note that each Translation Unit has exactly 2 Segments (although we don't limit this in the design)

For 10000 Translation Units it takes around 6 seconds to populate the database and around 2 minutes to read it (almost instant if only one foreach reads the Translation Units).

For 50000 Translation Units it takes around 32 seconds to populate, and I gave up after 1 hour of waiting for the reading to complete (again, almost instant with only one foreach reading the Translation Units).

So my guess is that the reading cost grows exponentially. Would it be reasonable to think this is due to the fact that it has to move the database pointer to a different table (between the Translation Units and the Segments tables)?

1 Comment

Problem solved. Creating indices for the Segments table based on TranslationUnitId did the trick: 11 seconds for reading the 50000 translation units and 100000 segments :)
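For completeness, creating such an index is a one-off statement. A sketch, assuming the Segments table is named tblSegment (the actual table name isn't shown in the question):

// Hypothetical sketch: table names other than tblTranslationUnitProperties are
// assumptions, since the Segments schema isn't shown in the question.
using (var cmd = _dbConn.CreateCommand())
{
    cmd.CommandText =
        "CREATE INDEX IF NOT EXISTS idx_Segment_TranslationUnitId " +
        "ON tblSegment (TranslationUnitId)";
    cmd.ExecuteNonQuery();
}

// The same applies to any child table queried by its parent's id, e.g.:
//   CREATE INDEX IF NOT EXISTS idx_TUProps_TranslationUnitId
//       ON tblTranslationUnitProperties (TranslationUnitId);

Without such an index every per-unit lookup scans the whole child table, so N parent rows cost roughly N full scans, which matches the sharply growing times observed above.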

Did you try a simple "JOIN"? Or am I missing something in your question?

SELECT tbl2.* 
    FROM tblTranslationUnit tbl1 
    JOIN tblTranslationUnitProperties tbl2 ON tbl2.TranslationUnitId = tbl1.TranslationUnitId 

4 Comments

Thanks. We already tested it using a "JOIN" and it was fast, and it could be the solution for the case I described. But we have others with more tables, where the object creation based on the data retrieved could then be slow, unless there's an automatic (or semi-automatic) way of doing that.
I can't see that case; can you give an example?
If I use SELECT tbl1.*, tbl2.* FROM tblTranslationUnit tbl1 JOIN tblTranslationUnitProperties tbl2 ON tbl2.TranslationUnitId = tbl1.TranslationUnitId to return data from both tables, how can I then easily create the objects? The rows from the first table may appear multiple times.
Just a thought: order the output by the first table's entries and check whether you have already created that one. But you are right, in that case multiple reads may make sense. You can, however, reduce the number of SQL queries with, for example, "WHERE TranslationUnitId IN (1,2,4,8)", where the numbers are generated from your first query (a rough sketch follows).
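A rough sketch of that IN-clause batching (the helper name and batch shape are illustrative; the ids come from parent rows that have already been read, and GetProperty is the helper from the question):

// Hypothetical sketch: fetch the properties for a whole batch of parent ids in
// one query instead of one query per TranslationUnit.
// Requires: using System.Collections.Generic; using System.Linq;
public void LoadProperties(IList<TranslationUnit> batch)
{
    // The ids are numeric values we just read, so string-joining them is safe here.
    string idList = string.Join(",", batch.Select(tu => tu.TranslationUnitId));
    var byId = batch.ToDictionary(tu => tu.TranslationUnitId);

    using (var cmd = _dbConn.CreateCommand())
    {
        cmd.CommandText =
            "SELECT * FROM tblTranslationUnitProperties " +
            "WHERE TranslationUnitId IN (" + idList + ")";
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // Assumes the properties table exposes TranslationUnitId by name.
                long id = (long)reader["TranslationUnitId"];
                byId[id].Properties.Add(GetProperty(reader));
            }
        }
    }
}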

First, you can write a select with a join and get everything with one query:

SELECT * FROM tblTranslationUnit
    JOIN tblTranslationUnitProperties ON tblTranslationUnitProperties.TranslationUnitId = tblTranslationUnit.id
    WHERE DocumentId = @docID  -- use a parameter

If that does not help, maybe you need to index your tables.
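From the C# side, the parameterised version of that query might look roughly like this (a sketch; _dbConn, doc, and the object-splitting step are taken as given from the question and the earlier comments):

// Hypothetical sketch: the same JOIN, with the document id passed as a parameter
// instead of being concatenated into the SQL text.
using (var cmd = _dbConn.CreateCommand())
{
    // The question's code suggests the key column is TranslationUnitId rather than id.
    cmd.CommandText =
        "SELECT * FROM tblTranslationUnit " +
        "JOIN tblTranslationUnitProperties " +
        "  ON tblTranslationUnitProperties.TranslationUnitId = tblTranslationUnit.TranslationUnitId " +
        "WHERE tblTranslationUnit.DocumentId = @docID";

    var p = cmd.CreateParameter();      // works for any ADO.NET provider
    p.ParameterName = "@docID";
    p.Value = doc.DocumentId;
    cmd.Parameters.Add(p);

    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // split the joined rows into objects, e.g. by tracking the current
            // TranslationUnitId as described in the comments above
        }
    }
}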



Read all of the results from the first query at once, close the DataReader, and then enumerate the results in memory.
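A minimal sketch of that approach applied to the question's code (buffer the parent rows, close the first reader, then run the child queries):

// Hypothetical sketch: materialise the parent rows first, so the first reader is
// closed before any child query runs. Requires: using System.Collections.Generic;
var units = new List<TranslationUnit>();
using (var tuReader = tuCommand.ExecuteReader())
{
    while (tuReader.Read())
    {
        units.Add(new TranslationUnit
        {
            TranslationUnitId = tuReader.GetInt64(0),
            DocumentId = tuReader.GetInt64(1),
            Raw = tuReader.GetString(2),
            IsSegmented = tuReader.GetBoolean(3),
            Reader = this,
        });
    }
} // first reader closed here

foreach (var tu in units)
{
    using (var propCommand = _dbConn.CreateCommand())
    {
        propCommand.CommandText = "SELECT * FROM tblTranslationUnitProperties WHERE TranslationUnitId = " + tu.TranslationUnitId;
        using (var propReader = propCommand.ExecuteReader())
        {
            while (propReader.Read()) tu.Properties.Add(GetProperty(propReader));
        }
    }
    yield return tu;
}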

1 Comment

Thanks. This is not feasible, as the first query could be really big and we would have memory issues.
