2006-06-06, Nicholas 'OwlManAtt' Evans
Welcome to my introductory paper on SQL. Before I begin to explain SQL itself, I would first like to touch on what SQL is.
SQL, or 'Structured Query Language', is a language associated with relational databases, such as Oracle, Postgres, or MySQL. SEQUEL was originally developed by IBM in the 1970s, but the name was later condensed to 'SQL' due to a trademark dispute. In 1986, SQL was adopted by ANSI as a standard, and a year later, was ratified by the ISO. The proper pronunciation for 'SQL', as defined by the ANSI standard, is ess kyoo ell.
I can already hear you thinking, "that's all well and good, but really, what does one use SQL for?" Well, when you have a relational database, you use SQL to manipulate it and retrieve data from it. To insert new data, you use SQL. To retrieve data, you use SQL. To create a new table to store data in, you use SQL. It's really that simple.
There are several aspects of SQL. However, for the sake of portability, this tutorial will only focus on adding, modifying, and retrieving data, and not manipulating the structures that data is stored in. The SQL that modifies the database structures varies greatly between SQL implementations. This tutorial will also be focusing on the SQL implementation in MySQL, but most of what you learn applies in all relational databases.
If you want to design your own MySQL database, install PhpMyAdmin and use that.
Now, on to the SQL!
There are four 'verbs' in SQL, SELECT, INSERT, UPDATE, and DELETE. Select is for retrieving data, insert is for adding data, update is for modifying data, and delete is for removing data. There are many other keywords that you can use to do more complex things. But before we begin exploring SQL, I need to show you how a database is structured.
First of all, a database is just a collection of tables. A table is a collection of columns. These columns can be set to only hold specific types of values, like integers, strings, or booleans. Below is an example of a table:
| users | |||
| id | username | password | joindate |
| 1 | OwlMan | Dog | 1121989668 |
| 2 | Tigress | Cat | 1122294523 |
What are you seeing, exactly? It's a table with four columns and two rows of data in it. The columns are called id, username, password, and joindate. What you do not see in my example are the settings for the columns. The 'id' column (also known as a field) can only hold an unsigned integer (meaning, a positive number), and it has to be unique. The id column is also set to index, which means that MySQLd will keep a table of hashed id values handy, which in turn
speeds up selecting data.An interesting note about indexing: When you are inserting data into MySQL, it will insert one row, calculate the hash for the indexes, and then repeat the process. If you happen to be inserting ten thousand rows, the insertion process will be slowed by this. If you remove the indexes on the table before inserting, and then put them back on when you are done, MySQL will add all of the rows, and then calculate the index hashes at once, increasing your speed.
Now, if a column is set to be unique, the database will not allow you to have two records with the same value for that field. So if I tried to set Tigress' ID to be one, the database would scream and shout about how stupid I was being.
The username and password fields are set to varchars, meaning, they can hold strings. The joindate field is set to be an unsigned integer, because that is all a UNIX timestamp can be.
Now, say we wanted to retrieve all of the records in our users table? We could do this:
SELECT * FROM users;
That line right above this is a basic SQL query. It will select all columns (that's what the star if for) from the table users. If we only wanted to grab the username for every user, we could do this:
SELECT username FROM users;
You can, of course, select multiple columns. Just seperate the column names with a comma, like this:
SELECT username, password FROM users;
Easy, right? Ah, but what if we only wanted users with the username 'OwlManAtt'?
SELECT * FROM users WHERE username = 'OwlManAtt';
That query would only return rows where the username was OwlManAtt. Note the single quotes around the string; if that were an integer you were specifying, you would not need the quotes, but it would still be a good idea to put them in.
You can have multiple conditions in your WHERE clause. Just slap an AND between them.
SELECT * FROM users WHERE username = 'OwlManAtt' and id = '5';
That would look for a user called 'OwlManAtt' with an id of five. That would, of course, return no results, because OwlManAtt has an id of one, but you get the picture.
Furthermore, SQL has more than just the = logic operator for comparing things. You can use >, <, >=, <=, or <>, which is not equal to. For example:
SELECT * FROM users WHERE id <> 1 AND id < 5;
And we'd get back users with the ids of two through four. If you wanted to compare two fields in your where clause, you'd just use those in place of a string or integer.
SELECT * FROM users WHERE id = password;
If the user's password happened to match their ID, this query would show us that user.
Say now that we have two tables, our users table from the previous example, and a table called users_pets. The users_pets table contains its own unique id, userid, and pet_name. If we wanted to connect each user to their pet (and for our purposes, each user will only have one pet), we would have to use a JOIN statement. We also need to be more specific when we chose what columns to select.
SELECT users.username, user_pets.pet_name FROM users INNER JOIN user_pets ON users.userid = user_pet.userid;
This query here would give us back the user and pet name that goes with each user, like so:
| username | pet_same |
| OwlManAtt | Penguin |
| Tigress | Snake |
Now, the query probably confuses and intimidates you. There's so much more to it, but really, it's nothing. You see, when we are joining, we deal with multiple tables. Due to this, it is best to specify which table, then which column you are talking about. So instead of selecting 'username', you'd select 'users.username', which just said you want the username field from the users table.
You can get by without stating the table you want a column from only if that is the only table that has that particular column name. Like password in the users table. Since it only exists in the users table, the query would not be confused. However, if we later added a password field to the user_pets table, the query would break.
Now that I have explained that, the next strange part of the query: the line that starts with INNER JOIN.
Inner join is one of three types of JOIN statements in MySQL (the other two are LEFT JOIN and RIGHT JOIN, more on those later). Inner join will join two tables together, matching each row with the corresponding row in the other table. In my example, we are joining the tables together based on the userids. Remember this?
INNER JOIN user_pets ON users.userid = user_pet.userid
Well, it says this: Combine rows in the table we selected from (users) where the userid field in the users table is equal to the value of the userid field in the user_pets table. The order of the condition after the ON statement doesn't matter, meaning you can do user_pets.userid = users.userid if you wish. It is best to put the column from the table that your 'FROM' specified, and THEN the column from the table that you are INNER JOINing to, but it does not matter. Just try to remain consistent in your style.
Now, what if OwlManAtt does not have a pet? Our query would just return Tigress and her snake. If we wanted to show all of the users and then optionally their pets, this would not be good. Inner join REQUIRES a row to match another row for it to be returned. So if you want to select OwlManAtt and get Tigress' pet, you would need to do a LEFT join.
SELECT users.username, user_pets.pet_name FROM users LEFT JOIN user_pets ON users.userid = users_pets.userid;
As you can see, the syntax for the LEFT join is pretty much the same as the syntax for the INNER. This query would return the following:
| username | pet_same |
| OwlManAtt | null |
| Tigress | Snake |
RIGHT joins are the opposite of LEFT joins. They'll connect to rows that do not match the condition:
SELECT
users.username,
user_pets.pet_name
FROM users
RIGHT JOIN user_pets ON users.userid = user_pets.userid
| pet_same | username |
| null | OwlManAtt |
What if you wanted to select the usernames and pets for the people who only had snakes? You'd just tack a where clause on to your SQL and be done with it.
SELECT users.username, user_pets.pet_name FROM users INNER JOIN user_pets ON users.userid = user_pets.userid WHERE user_pets.pet_name = 'Snake';
If you so desire, you may also join to more than one table with more than one JOIN type. Just put a block of JOINS without comma seperation. Something like this:
SELECT users.username, user_pets.pet_name FROM users INNER JOIN user_pets ON users.userid = user_pets.userid INNER JOIN pets_food ON user_pets.pet_name = pets_food.fooder_pets.userid WHERE user_pets.pet_name = 'Snake';
And you'd get all of the users with a pet called 'Snake', and whatever 'Snake' eats.
Say there were three hundred users with snakes, and you only wanted to see one of them? You could select everything and ignore all but one set of data that the query returns, but that would be a poor use of system resources. You would use a LIMIT statement at the end of your query, like this.
SELECT users.username, user_pets.pet_name FROM users INNER JOIN user_pets ON users.userid = user_pets.userid LIMIT 1;
And boom, you'd get the very first result back. Limit will take more than one number after it, which is the basis for all of the pagination you see on websites. If you want to see all of the results between 30 and 60, you would just have to say LIMIT 30,60. Nifty, yes?
Now, say you wanted to see the newest user? You could always select every user and use some programming language to automate the sorting process, but that would be too much work. You can just use the ORDER BY statement.
SELECT username, FROM users ORDER BY joindate DESC LIMIT 1;
That query would return the newest user's username. You can also sort by multiple columns, like so:
ORDER BY joindate DESC, username ASC
There is no limit on how many fields you can sort by. The DESC in the ORDER statement stands for descending, and the ASC is ascending. Easy stuff, right?
Now, say we have two users with the same username, and some PHP script would barf all over your screen if it noticed this. You could use UNIQUE in your query (or fix your damned script) to get only one of these two usernames. And yes, this is a piss-poor story for using UNQIUE, but I'm running out of scenarios for my example tables.
SELECT UNIQUE(username), joindate FROM users LIMIT 10;
Right. Well, I have just hit the roof of my knowledge of statements that are useful in a SELECT query. There are more statements you can use, such as HAVING and IN, but if you want to learn about them, you're going to have to go elsewhere.
So since I'm out of statements for you, why don't we talk about a few of MySQL's build-in functions?
MySQL has a number of handy functions. I will only be discussing functions that I have used and found particularly useful. You can check out the MySQL Freaks website for a more comprehensive look at the functions, if you are so inclined.
Firstly, we have the king of functions, COUNT(). Count counts the number of rows you'll get back with a query, but it won't actually return any data. You can use it like this:
SELECT count(*) FROM users;
If there were 15 rows in the users table, you would get the number 15 back from that query. You can also use this to select data and a count.
SELECT count(comments.cid) as comment_count, news.content FROM news LEFT JOIN comment_count ON news.id = comment.news_id GROUP BY comments.cid ORDER BY news.time DESC;
There are two new concepts here: as and group by. When we do a count statement mixed in with data, it's always helpful to give it a better name than count(whatever you are counting), which is where as comes in to play. In MySQL, you can store a calculated value with as. So count(comments.cid) as comment_count will count all of the comments associated with our news post, and then give that number the name comment_count.
We also have a GROUP BY statement. GROUP BY is used when you mix COUNT, MIN, MAX and/or normal data. I'm not exactly sure of why you need it, but rest assured: when mixing these things, you will need GROUP BY.
The final function I will cover in this section is FROM_UNIXTIME. This function will take a UNIX timestamp and format it out into a human-readable datestamp. It works like this:
SELECT FROM_UNIXTIME(time,'%Y.%m.%d %H:%i:%S') as time, username FROM users;
This would return YYYY.MM.DD HH:MM:SS and the username. You can find a listing of all of the variables you can use on MySQL Freaks.
Inserting data into a table in a painless thing in MySQL. As a result, this section is going to be significantly shorter than the previous one.
The syntax for inserting a row is this:
INSERT INTO users (username,password,time) VALUES ('Owl','P4SS',UNIX_TIMESTAMP());
Very easy to understand, right? We have INSERT INTO, then a table to insert a new row in to. The first set of values in parenthesis are a set of column names. Then comes VALUES, and another set of values in parenthesis, this time values to insert into the columns. The order of the values much match the order of the columns, otherwise you will insert bullcrap.
Also, there is a function in that query, UNIX_TIMESTAMP(). This merely puts in the current UNIX timestamp as the value for time.
It is possible to not specify column names, but I will advise you to UNDER NO CIRCUMSTANCES EVER DO THIS. The possibility of having issues with a query like that is just to great for not having to type a few extra characters.
If you happen to know that you will be inserting a blank value into column, then don't bother putting it in the column list. If you do not specify it, it will fall back to the default for that column, and unless you have set the default to be a non-zero or non-blank value, it will just set that to nothing.
For an auto_increment field, you do not need to specify anything in the insert statement. They will always be last id plus one, even if you do not put them in the column list.
Finally, you can use data already in another table for your insert statement by using a SELECT in your INSERT.
INSERT INTO item (item_name,item_descr) SELECT item_type_descr, item_class FROM item_type;
You can see that the syntax is a little different; there is no 'VALUES' keyword.
Updating is slightly more interesting than inserting is, but with my current level of understanding, it isn't nearly as fun as selecting. While I know it is possible to update (and delete) across multiple tables ala JOIN, I do not know how to do it.
I do, however, know how to update things in one table. The basic syntax:
UPDATE users SET username = 'OwlRocks';
This simple query would set every username in the users table to OwlRocks. You can add a WHERE, an ORDER and/or a LIMIT statement to refine what to update. For example, if I just wanted to update the newest user's username and password:
UPDATE users SET username = 'NewUser', password = 'Lame!' ORDER BY joindate DESC LIMIT 1;
As you see, this query updates multiple columns at once. You can update as many columns as you like as long as you keep them comma separated like that.
MySQL also supports on-the-fly mathematical operations, allowing you do something like this:
UPDATE users SET joindate = joindate*5 LIMIT 1;
That query would change joindate to the current joindate value times five. You can also just do two other values:
UPDATE users SET joindate = 1+2 LIMIT 1;
Which would set joindate to three for one user.
And there you have it, updating rows.
Deleting data is not a complex process. In many respects, it is like updating data; you can use WHERE, ORDER BY, LIMIT, etc. Again, you can also delete across multiple tables, but I do not know how to do this.
The syntax for a delete query is a bit more simplistic than that of the update query:
DELETE FROM users;
The above query deletes every user. We could add a limit, delete only a specific ID, a range of IDs, or many other things. For example:
DELETE FROM users WHERE userid > 20 AND userid < 50;
That query would delete the rows for users 21 through 49. In MySQL, when you delete a row, it's gone. In Oracle, you would have to commit any changes before they actually deleted or updated rows. So be very careful when deleting things. If you know you will only be deleting a finite number of rows, put a LIMIT statement on to the end of your query!
And that is about it for deleting.
Well, that's as far as my knowledge of SQL will take you this time. In a few months, I will try to release an addendum to cover a few more topics that I am not yet familiar with. Still, though, you have a lot to think about.
I can do some very interesting things with SQL, even at my beginner's level. The queries to do these interesting things can get very long and confusing, so I would like to quickly run through some basic guidelines on SQL syntax in general.
SQL does not mind linebreaks, as far as I can tell. This does not give you free reign to slap a linebreak in the middle of a function name, but you can make your queries fit on one screen, like I have done. Your whitespace and tabbing for a select statement does not need to echo mine; your queries could be all on one line. That would be hard to read, though, so you probably won't do that.
A note on the '*' in the column list is required. I am told that using it is bad practice for queries that you will be using long-term. For example, if you are putting a SQL query into a PHP script, you would not want to use SELECT *. Later on, database changes could cause problems with your script because it is expecting one thing and getting something else. These errors would be logic errors, and very hard to debug, as opposed to a missing column, which would throw up an error message and tell you where the problem existed.
Thanks goes out to John Hirbour (john@hirbour.org), who has taught me plenty about SQL. Also, some credit must be given to Uberuser, for without him, I never would have written this document.
For more information about MySQL, you should refer to an excellent community site, MySQLFreaks.com. You can also check out the MySQL documentation, but I find its search function to be sub-par, and the documentation it self to be ...lacking.
This tutorial is copyright 2005-2006, Nicholas 'OwlManAtt' Evans. You may redistribute this tutorial, in its entirety or in portions, provided that you do not change its text. This legal notice must also be included if you reproduce just a portion or portions.
To contact me, please see http://owlmanatt.com/contact/.
This has been a Yasashii Syndicate production.