When and Why Did Which Join Join the Where Clause?

Abstract: When and Why Did Which Join Join the Where Clause?

I've been learning a lot about SQL recently. I've never been a big database guru, but I suppose if you work in this industry long enough you end up coming up against almost everything sooner or later.

In the work I've been doing recently, the SQL syntax I've been stumped by most frequently is the subquery. I've come up with some examples to illustrate what I've learned, but every time I sit down to write about them I realize that there are certain issues that I should understand more fully before committing pen to paper, as it were.

In these kinds of investigations the holes in my knowledge seem to surface out of the murk like boats on a foggy evening. For a moment I think I see what it is I need to learn next, and then that idea drifts back into the murk, becoming indistinguishable or invisible.

While doing some reading this morning, however, one subject that has confused me came into sudden focus. For years I have been using where clauses and join clauses to join two tables together. Oddly enough, I had never really asked myself why I used a where clause in one circumstance, and join in the next. (For now, let's ignore the subject of inner and outer joins, and just focus on the difference between where clauses and what is technically called an inner join, but which I will refer to simply as a join.)

Consider the following tables:

create table Cities
(
CityKey int not null,
City varchar(50),
primary key (CityKey)
)
Cities
CityKey City
1 San Francisco
2 Portland
3 Boise
create table ShortNames
(
NamesKey int not null,
FirstName varchar(50),
CityKey int,
primary key (NamesKey),
foreign key (CityKey) references Cities(CityKey)
)
ShortNames
NamesKey Name CityKey
1 'Sue' 1
2 'Lucy' 1
3 'Jeff' 1

Now run the following two queries on them:

select FirstName, City
from ShortNames, Cities
where ShortNames.CityKey = Cities.CityKey


select FirstName, City
from ShortNames inner join Cities
on ShortNames.CityKey = Cities.CityKey

The first query uses a where clause, the second the join syntax. The results for both queries look like this:

FirstName City
Sue San Francisco
Lucy Portland
Jeff Portland

So what is the difference between the two queries? Why are there two different ways of doing the same thing?

The answer, it turns out, lies as answers so often do, shrouded in the mists of time. In particular, it happens that the SQL 1 standard did not support the join, inner join or outer join syntax as we know it to today. Databases provided means of doing inner and outer joins, but there was no formally accepted syntax for it. Instead, there was a babel of competing proprietary syntaxes for solving the need for various types of joins, all of which used the where clause in conjunction with other syntaxes.

When the ANSI 92 specification was developed, it was decided to come up with a new syntax, different from any of the existing proprietary syntaxes. This was necessary, at least in part, to make sure that no previous code would get broken by the introduction of a new syntax. Also, it was decided that the old simple where clause like the example shown above would remain legal, since it was used so widely. But a new syntax, following the second of my two examples, was developed as the proper way to do an inner join on two tables in ANSI 92 syntax. In short, the first example demonstrates the old technique for performing a join, and the second example the new syntax. The confusion comes from the fact that the old syntax, which is very easy to understand, was never designated as obsolete.

Now I will give you a somewhat more complex example which shows the parallel between the two syntaxes. The first example uses a where clause:

select 
VX.MasterID, BC.BrokerCustomerKey, BCodes.BrokerCode, CU.CustomerID
from
ARBrokerCustomers BC, XACustomers CU, ARBrokerCodes BCodes, vXAAddrBookBroker VX
where
VX.AddressBookKey = BCodes.AddressBookKey
and BC.BrokerKey = BCodes.BrokerKey
and BC.CustomerKey = CU.CustomerKey
group by
VX.MasterID, BC.BrokerCustomerKey, BCodes.BrokerCode, CU.CustomerID;

The second example accomplishes exactly the same end, only using the inner join syntax:

select 
VX.MasterID, BC.BrokerCustomerKey, BCodes.BrokerCode, CU.CustomerID
from
vXAAddrBookBroker VX
inner join ARBrokerCodes BCodes on VX.AddressBookKey = BCodes.AddressBookKey
inner join ARBrokerCustomers BC on BCodes.BrokerKey = BC.BrokerKey
inner join XACustomers CU on BC.CustomerKey = CU.CustomerKey
group by
VX.MasterID, BC.BrokerCustomerKey, BCodes.BrokerCode, CU.CustomerID;

Notice that in the second example, all of the where clause and three fourths of the from clause has been replaced with a series of inner joins. Each inner join exactly parallels one section of the old where clause. There is a one to one correspondence between the syntax in each example. One simply joins the first item in the from clause to what had been the second item in the from clause, and then to what had been the third item, etc. The on portion of each inner join simply contains a fragment of code from the original where clause. You might have to study the two code samples for a moment before this will become clear, but I think it is worth contemplating them long enough to see how closely they parallel each other. It is two different ways of saying the same thing, and each produces the same output.

It is important to note that the ANSI 92 syntax ultimately ends up being more powerful than both the original where syntax, and most of the proprietary variations that venders had developed. This power is not evident in these simple examples, but when you want to pursue more esoteric purposes, the new syntax proves its worth. Therefore, it makes sense to use the join syntax (ANSI 92) style, whenever possible, since it lays the foundations for doing some very fancy and useful types of selects and updates.

The final question one might have on this issue would be whether the where clause or the inner join produces faster code. It my opinion, I don't see why it should make any difference which syntax you use when doing an inner join like the one I show here. They are just two ways of doing the same thing, one the old fashioned way, the other the ANSI 92 way. However, it is possible that one syntax would be faster than the other on some databases; the only way to know for sure would be to benchmark the query. But in theory, and hopefully in practice, they should be two ways of doing the same thing. Our preference, however, should be for the ANSI 92 join since it is more flexible.

So now one of the boats that has been sailing around in the SQL fog bank inside my brain has come into focus. This takes me one step closer to being able to write in what I hope is an intelligible fashion about subqueries. That subject would, of course, be a separate topic, and so I will write about it on a later date. An examination of inner and outer joins would also be useful. I'll close by giving you a single listing for my simple first example, in case you want to cut and paste into your database. I've tested the code on MSSQL 2000 and InterBase 7.

/* Create a table called Cities */
create table Cities
(
CityKey int not null,
City varchar(50),
primary key (CityKey)
)

/* Insert data into Cities and show the table */
insert into Cities Values(1, 'San Francisco');
insert into Cities Values(2, 'Portland');
insert into Cities Values(3, 'Boise');
select * from Cities;

/* Create a table called Names */
create table ShortNames
(
NamesKey int not null,
FirstName varchar(50),
CityKey int,
primary key (NamesKey),
foreign key (CityKey) references Cities(CityKey)
)

/* Insert records into Names, and show the records */
insert into ShortNames Values (1, 'Sue', 1);
insert into ShortNames Values (2, 'Lucy', 2);
insert into ShortNames Values (3, 'Jeff', 2);
select * from ShortNames;

select FirstName, City
from ShortNames, Cities
where ShortNames.CityKey = Cities.CityKey

select FirstName, City
from ShortNames inner join Cities
on ShortNames.CityKey = Cities.CityKey

drop table ShortNames;
drop table Cities;