
This is more of a theoretical query than anything else, but I have a complex join (returning upwards of 1900 records in the main table, combined with all of the sub-result tables in the join -- join shown below), and the resulting web page takes 5-10 minutes on my local machine to finish building. I realize this could easily come down to many factors, but I am hoping for some hints. Basically, I load an array of names from two tables (one holds cross-references, so the array is used to sort the data by name, with links and a field noting whether an entry is a cross-reference). Then, if a name is not a cross-reference, I issue this join:

select
  n.NameCode, n.AL_NameCode, n.Name, n.Name_HTML, n.Region, n.Local, n.Deceased,
  n.ArmsLink, n.RollOfArms, n.Blazon, n.PreferredTitle, n.ShortBio,
  n.HeadShotPhoto, n.HeadShotPhotographer, n.HeadShotContributor,
  x.NameCode, x.NameAKA, x.AlternateName,
  g.NameLink, g.`Group Name`,
  p.NameLink, p.`Relationship Type`, p.`Related To Link`,
  p2.Position_ID, p2.NameLink, p2.`Position Held`, p2.`Times Held`,
  p2.`Date Started`, p2.`Date Ended`, p2.Hyperlink as pos_Hyperlink,
  p2.`Screentip Text`,
  a.`Name Link`, a.Description, a.EventDate, a.Hyperlink, a.`Screentip Text`,
  a.ExternalLink
from who_names as n
left outer join who_crossref as x on n.NameCode=x.NameCode
left outer join who_groups as g on n.NameCode=g.NameLink
left outer join who_personal as p on n.NameCode=p.NameLink
left outer join who_positions as p2 on n.NameCode=p2.NameLink
left outer join who_arts as a on n.NameCode=a.`Name Link`
where n.NameCode = ?
order by n.Name desc, g.`Group Name`, p2.`Date Started`, a.EventDate;

In order to output the various parts of the data, I:

1) Start a table.

2) Output the name and some other info in the first row.

3) To process, say, the groups (sub-groups someone associates themselves with within the organization), I issue:

mysqli_data_seek( $result, 0 ); // rewind to the top of the data so we're at the first row

and check whether there's anything to process for subgroups (not everyone has anything ...).

4) I repeat for personal relationships and the other sections, going back to the top of the data and looping through it again whenever there's anything to process.

When done with that individual, I close off the table, and loop back in the array to the next name, and repeat ...
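A minimal sketch of the loop described above, assuming a mysqli connection in `$db`, the join query from the question in `$sql`, and a hypothetical `$names` array with an `is_crossref` flag (the helper names here are made up for illustration, not from the original code):

```php
<?php
// Sketch only: $db, $sql, and the $names array structure are assumptions.
$stmt = $db->prepare($sql);

foreach ($names as $person) {
    if ($person['is_crossref']) {
        continue;                        // cross-references are rendered differently
    }
    $stmt->bind_param('i', $person['NameCode']);
    $stmt->execute();
    $result = $stmt->get_result();       // buffered result, so it can be rewound

    echo "<table>";

    // Steps 1-2: first row holds the name and general info
    $row = $result->fetch_assoc();
    echo "<tr><td>" . htmlspecialchars($row['Name']) . "</td></tr>";

    // Step 3: rewind and scan the full rowset for group rows
    mysqli_data_seek($result, 0);
    while ($row = $result->fetch_assoc()) {
        if ($row['Group Name'] !== null) {
            echo "<tr><td>" . htmlspecialchars($row['Group Name']) . "</td></tr>";
        }
    }

    // Step 4: rewind again for relationships, positions, arts, ...
    mysqli_data_seek($result, 0);
    // ... same scan pattern for each remaining section ...

    echo "</table>";
}
```

Note that because the join multiplies rows (groups x positions x arts per person), each rewind re-scans every combination, which is part of why this pattern gets expensive.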

While this works, 5-10 minutes is way too long to load a web page.

I am pondering ideas to resolve this, but I am not sure which aspect of my code is at fault. Is it the seeks back to the top of the returned rowset? Is it the tables in the browser? Is it a combination of both (very possibly)? The program is too big to post here in its entirety. I am feeling rather flummoxed about how to resolve this, and I'm hoping someone has some pointers to help me speed up the processing. I hope the details I've given are enough to work with.

Based on comments and feedback below, I did the following in phpMyAdmin:

explain select n.NameCode, n.AL_NameCode, n.Name, n.Name_HTML, n.Region, n.Local, n.Deceased,
                     n.ArmsLink, n.RollOfArms, n.Blazon, n.PreferredTitle, n.ShortBio, n.HeadShotPhoto,
                     n.HeadShotPhotographer, n.HeadShotContributor,
                     x.NameCode, x.NameAKA, x.AlternateName,
                     g.NameLink, g.`Group Name`,
                     p.NameLink, p.`Relationship Type`, p.`Related To Link`,
                     p2.Position_ID, p2.NameLink, p2.`Position Held`, p2.`Times Held`, p2.`Date Started`,
                     p2.`Date Ended`, p2.Hyperlink as pos_Hyperlink, p2.`Screentip Text`,
                     a.`Name Link`, a.Description, a.EventDate, a.Hyperlink, a.`Screentip Text`,
                     a.ExternalLink
                     from who_names as n
                     left outer join who_crossref as x on n.NameCode=x.NameCode
                     left outer join who_groups as g on n.NameCode=g.NameLink
                     left outer join who_personal as p on n.NameCode=p.NameLink
                     left outer join who_positions as p2 on n.NameCode=p2.NameLink
                     left outer join who_arts as a on n.NameCode=a.`Name Link`
                     where n.NameCode=638
                     order by n.Name desc, g.`Group Name`, p2.`Date Started`, a.EventDate

This returned:

id  select_type  table  type   possible_keys      key            key_len  ref    rows  Extra
1   SIMPLE       n      const  PRIMARY,ix1_names  PRIMARY        4        const  1     Using temporary; Using filesort
1   SIMPLE       x      ref    ix2_crossref       ix2_crossref   4        const  1     NULL
1   SIMPLE       g      ref    ix3_groups         ix3_groups     4        const  3     NULL
1   SIMPLE       p      ref    ix4_personal       ix4_personal   4        const  1     NULL
1   SIMPLE       p2     ref    ix5_positions      ix5_positions  4        const  13    NULL
1   SIMPLE       a      ref    ix6_arts           ix6_arts       4        const  28    NULL

This appears to be just a list of the indexes, so it doesn't seem to be helping me.

  • Questions about performance are not theoretical and -- even more than other questions -- need a tag for a specific database. Commented Aug 3, 2018 at 14:03
  • If the join is that slow, you should check whether each field in the join condition has an index on it. Commented Aug 3, 2018 at 14:09
  • I don't see any particular reason why your query should be slow. Perhaps a slow computer? A huge data set? Other than that, I would concentrate on checking the indexes, as Philipp already suggested. Commented Aug 3, 2018 at 14:11
  • Gordon: I did tag MySQL. Philipp -- I don't have indexes set at all. Up until now I haven't really needed them with my apps. Commented Aug 3, 2018 at 14:18
  • Well, now you certainly need indexes... they will do wonders! Commented Aug 3, 2018 at 14:33

3 Answers


Since you are using a SINGLE main table and the rest of the joins are all OUTER JOINs, there is one index above all that can make your query faster:

create index ix1_names on who_names (NameCode, Name);

Also, the Nested Loop Joins (NLJ) against the related tables will benefit from the following indexes. You may already have several of these, so check whether they exist first. If they don't, create them:

create index ix2_crossref on who_crossref (NameCode);
create index ix3_groups on who_groups (NameLink);
create index ix4_personal on who_personal (NameLink);
create index ix5_positions  on who_positions (NameLink);
create index ix6_arts on who_arts (`Name Link`);
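If you're unsure which of these already exist, MySQL can list the current indexes on each table (a standard MySQL statement; run it once per table):

```sql
-- Lists every index on the table, including column order within each index
SHOW INDEX FROM who_names;
SHOW INDEX FROM who_crossref;
SHOW INDEX FROM who_groups;
```

The `Key_name` and `Column_name` columns of the output tell you whether an equivalent index is already in place before you create a duplicate.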

But again, the first one is the one I consider most important.

You'll need to test against real data to see whether performance improves with it/them.

If the query is still slow, please retrieve the execution plan, as @memo suggested, by using:

explain select ...

13 Comments

I am not sure about the column order in the first index listed ...? I would have thought that "Name" would need to be the first field? I could be wrong, of course; I just don't understand.
In terms of query optimization, we need to differentiate two aspects: "access" and "filtering". Access corresponds to the rows you inspect, and to optimize the query we want to access the smallest possible number of rows. Second, we apply filters that discard rows which do not meet the criteria. In your case the access should be by "NameCode", and your query has no filtering. Finally, the sorting is by "Name", and that only comes in third place.
Hmm. Okay. I think I need to try the "explain" option, it is still taking a very long time even with all the indexes.
Tried the "explain" option in phpMyAdmin, substituting an existing value for the NameCode variable, and just got a list of the indexes being used. It wasn't very helpful.
Can you post the explain plan? In any case, 1900 rows is not that much if they are read using an index. How many rows does the whole main table have?

First, try removing the "order by" clause and see if that improves anything. Sometimes the query itself is fast but the re-ordering is slow, requiring temporary files.

Second, feed the query to an EXPLAIN statement (e.g. EXPLAIN SELECT whathaveyou FROM table...). Check the output for bottlenecks, missing indexes, etc. (https://dev.mysql.com/doc/refman/8.0/en/using-explain.html)

3 Comments

I took out the order by, and it didn't seem to change things much; with the indexes added from The Impaler's post, it still takes about 4 minutes to load the page. See the original post (edited): I used "EXPLAIN" and posted what I did and the results.
@KenMayer "Using temporary; Using filesort" is what kills performance, you should try and investigate why.
Honestly, I have no idea what "filesort" and such even means.

After a lot of work, I found a few issues that I was able to resolve. I was opening some tables (thinking it made sense at the time) just to get row counts when it wasn't necessary; I dropped the big join and instead opened the sub-tables as needed; I cleaned up a few other places in the code; and I added a few more indexes on another set of tables that weren't in the original join. This reduced the load time from 4 minutes to 45 seconds. While 45 seconds is still a long time for a web page, this page handles up to 1500 (sometimes more) primary records, pulls data from up to 10 different tables, and does heavy formatting (tables inside tables, etc.), so 45 seconds is probably doable, with a note at the top of the page and a progress bar that displays while it loads. Thanks, all. The indexes did help, and the other explanations also helped a lot.
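The "opened the sub-tables as needed" change might look something like the sketch below. This is an illustration only: the table and key names come from the original query, but the `$db` connection, `$nameCode` variable, and the `$sections` structure are assumptions, not the poster's actual code.

```php
<?php
// Sketch: replace the one big join with small per-section queries, issued
// only for the person currently being rendered. $db and $nameCode assumed.
$sections = [
    'who_groups'    => ['key' => 'NameLink',    'order' => '`Group Name`'],
    'who_positions' => ['key' => 'NameLink',    'order' => '`Date Started`'],
    'who_arts'      => ['key' => '`Name Link`', 'order' => 'EventDate'],
];

foreach ($sections as $table => $info) {
    // Table/column names come from a fixed whitelist above, never from user
    // input, so interpolating them here is safe; the value is still bound.
    $sql  = "select * from $table where {$info['key']} = ? order by {$info['order']}";
    $stmt = $db->prepare($sql);
    $stmt->bind_param('i', $nameCode);
    $stmt->execute();
    $result = $stmt->get_result();

    if ($result->num_rows === 0) {
        continue;                 // nothing to render for this section
    }
    while ($row = $result->fetch_assoc()) {
        // ... render this section's rows ...
    }
}
```

Compared with the original join, each section's rowset arrives un-multiplied (no groups x positions x arts cross-product), and there is no need to rewind and re-scan one big buffered result per section.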

