Skip to content

Let the server do the job!

I just learned how important the choice of loci for the execution of code is.

Consider a database with the tables company, employee and locks which have to be combined in such a way that the result includes the number of employees per company and whether that company is locked.

If you are familiar with you may think how easy such a task is. You would be correct (as we’ll soon see). But there are several ways of doing things. Let’s consider a client-side implementation:

For this we first retrieve all the companies from the database to iterate over them. For each company we get the matching employees, counting them. Then we look up locks. Not being familiar with the power of SQL one may come up with something like the following pseudocode:

companies = SELECT company_name, company_id FROM company
    for company in companies:
        number_employees = SELECT count(*) FROM employee e WHERE e.company_id = company.company_id
        has_lock = 0 < (SELECT count(*) FROM lock l WHERE l.company_id = company.company_id)

The problem with this solution is the amount of network traffic those statements impose upon the server: 2 * |number of companies| + 1

Network may be slow resulting in poor response times. Furthermore sending statements one at a time hinders the SQL-server from optimizing them.

Let’s consider the following SQL-statement:

SELECT company_name, count(*) AS number_employees, has_lock
FROM company JOIN employee USING company_id
LEFT OUTER JOIN locks USING company_id
GROUP BY company_name

Because company_id is shared by all three tables and we want to perform equi-joins (i.e. matching columns with equal values) it’s possible to use USING instead of ON in the join predicate. Otherwise it would be ON company.company_id = employee.company_id.

The GROUP BY company_name makes the table ‚collapse‘ to just those rows with different company_names. The count(*) AS number_employees returns the number of rows ‚collapsed‘ into a single row, i.e. the number of employess of a single company. Generally with GROUP BY only aggregate functions (like count) or columns appearing in the GROUP BY-clause can be used in the SELECT-list.

The second interesting part of that statement is the LEFT OUTER JOIN between the table resulting in joining companies and employees and the locks-table. Well, not all companies have corresponding entries in the locks-table. Just those that are locked. Therefore a plain (inner) JOIN would disregard the companies that are not locked. And that’s not what we wanted! Using LEFT OUTER JOIN the companies that are not locked stay in the result set.

Performance

So, we have a for-loop and a single(!) SQL-statement to compare. Using 100 companies of which about 5% are locked with 10 employees, what do you think is the gain in performance of the single SQL-statement (it is the single statement that is faster)?

5 times? 10 times? 100?

It’s a bit less than 10 times! The for-loop takes something like 1.5 seconds to complete while the single statement takes 0.19 seconds.

Now imagine some production database. I think the gain will increase a lot because of the network-traffic involved in the round-trips to the server…

Conclusion

Let the server do the work. If it can be expressed in SQL it may very likely be a good idea to do so. BTW that’s what StoredProcedures are all about…

Post a Comment

Your email is never published nor shared. Required fields are marked *