As an alternative to writing a final paper for my Information Storage and Retrieval class (too easy!), I've been working on my first database-driven web site. I will link the site when finished, but today I'm presenting the web programming tutorial I wish I'd had a few days ago.
Optimal audience: PHP programmers who want to accept input from web users without risking duplicate input when users refresh their browsers or click 'Back' arrows.
Gettin' and Postin'
Web browsers send more than just the URL when they transmit requests to web servers; a verb is bundled in as well. Usually, this verb is GET. It means: "Hey, web server, show me what you have at..."
http://www.timecube.com/
or...
http://en.wikipedia.org/wiki/Cat
or...
https://maps.google.com/maps?q=denver,+co&hl=en&ll=39.751545,-104.985352&spn=0.459278,0.883026&sll=37.0625,-95.677068&sspn=30.323858,56.513672&hnear=Denver,+Colorado&t=m&z=10
Whether simple or complex, GET is about retrieving content from a web server. Best practice is for GET requests to be free of side effects. In particular, GET should not be used to update a web site's database, because programs like Googlebot try to GET everything they can; this can lead to nightmare scenarios for databases affected by GET requests.
POST is another verb (or request method), but this one is about sending content to a web server. If you use any online forums, think of GET as what you use to read posts and POST as what you use to post posts.
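Here's a rough sketch of how a PHP script can tell the two verbs apart. The function name describeRequest() and its messages are made up for illustration; the one real ingredient is $_SERVER['REQUEST_METHOD'], which PHP fills in when it runs under a web server.

```php
<?php
// Hypothetical helper: decide what a script may do based on the request verb.
function describeRequest(string $method): string
{
    switch (strtoupper($method)) {
        case 'GET':
            return 'read-only: render a page, change nothing';
        case 'POST':
            return 'write: accept submitted data';
        default:
            return 'unsupported in this sketch';
    }
}

// Under a web server, PHP exposes the verb in $_SERVER['REQUEST_METHOD'].
// It is absent on the command line, so this sketch falls back to GET there.
$method = $_SERVER['REQUEST_METHOD'] ?? 'GET';
echo describeRequest($method), "\n";
```

Anything that changes a database would belong only in the POST branch.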
The Trouble With Posting
If a user requests the same URL two times or ten times using GET, the only downside is some extra bandwidth usage. Requesting the same URL multiple times with POST can mean sending duplicate information to the web server. This is why some websites warn against extra clicking, or refreshing, or navigating with Back and Forward buttons. It can lead to duplicate forum posts, duplicate user registrations, or duplicate purchases. Not good!
The fix is to start with a POST to send information to the web server, but end up on a GET: a safe, no-surprises GET. So instead of immediately receiving a confirmation page in response to a POST, the web client receives a redirect response which, in turn, causes the web client to issue a GET to see the confirmation page. (Yes, this is a bit convoluted.)
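To make the redirect step less abstract, here is a minimal sketch of the header lines a server might send back after a POST. The helper name prgRedirectHeaders(), the host example.com, and the path /confirmation.php are all placeholders I made up; the 303 status is the standard way to tell a client "go fetch this other URL with GET."

```php
<?php
// Hypothetical helper: the header lines for the middle step of POST, redirect, GET.
function prgRedirectHeaders(string $host, string $path): array
{
    return [
        'HTTP/1.1 303 See Other',           // 303: follow up with a GET
        "Location: http://{$host}{$path}",  // where that GET should go
    ];
}

print_r(prgRedirectHeaders('example.com', '/confirmation.php'));
```

In a real handler, each line would be passed to PHP's header() function and the script would then stop without printing a page.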
This general fix is called the "Post Redirect Get" pattern (or PRG pattern). What tripped me up was how to implement the PRG pattern in PHP. I found parts of a solution here and there, but not a (relatively) simple example all in one place.
A (Relatively) Simple Example All In One Place
Create a file named "echochamber.php" and paste in the following contents. (The line numbers cited later count from the top of this listing, blank lines included.)
<?php
session_start();

$echoedShout = "";

if(count($_POST) > 0) {
    $_SESSION['shout'] = $_POST['shout'];  // stash the input for the upcoming GET

    header("HTTP/1.1 303 See Other");      // redirect; the browser follows up with a GET
    header("Location: http://$_SERVER[HTTP_HOST]/echochamber.php");
    die();                                 // stop here so no page is sent with the redirect
}
else if (isset($_SESSION['shout'])){
    $echoedShout = $_SESSION['shout'];

    /*
    Put database-affecting code here.
    */

    session_unset();                       // forget the stored value...
    session_destroy();                     // ...so a refresh starts from a clean slate
}
?>

<!DOCTYPE html>
<html>
<head><title>PRG Pattern Demonstration</title></head>
<body>
<?php echo "<p>" . htmlspecialchars($echoedShout) . "</p>"; // escape user input before printing ?>

<form action="echochamber.php" method="POST">
<input type="text" name="shout" value="" />
</form>
</body>
</html>
Note: Unlike a pure HTML/CSS/JavaScript test site, this file can't just be saved on your local computer and opened with a web browser. It needs to go on a web server configured for PHP parsing. You can either go through the hassle of configuring your local computer, or use a hosting service (I like nearlyfreespeech.net).
Oh, You Want An Explanation?
PRG can be done with three separate files: the form to fill out, a file that processes filled out forms and gives a redirect response, and a final result page that is the target of redirection. But it's often convenient for the fill-in page and the final-result page to be the same or very similar. Why not stuff everything into one file? At any rate, I'm taking the all-in-one approach in this example. It's less intuitive, but not too bad.
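For comparison, here is a sketch of what the middle file in a three-file layout might boil down to. The file names /form.php and /result.php and the function processSubmission() are assumptions of mine, and the function works on plain arrays so it can be tried from the command line.

```php
<?php
// Hypothetical core of the middle "processing" file in a three-file layout.
// Takes the POSTed fields and a session-like array, returns the redirect target.
function processSubmission(array $post, array &$session): string
{
    if (!isset($post['shout'])) {
        return '/form.php';              // nothing submitted: back to the form
    }
    $session['shout'] = $post['shout'];  // stash the input for the result page
    return '/result.php';                // assumed name of the final GET page
}

$session = [];
echo processSubmission(['shout' => 'Lenore'], $session), "\n";
```

On a real server, the processing file would call session_start(), pass in $_POST and $_SESSION, send the returned path in a 303 redirect, and stop.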
First Load
Suppose a user navigates to http://www.prg-in-php-example.gov/echochamber.php (or whichever domain you're using). The conditions on lines 6 and 13 will both be false. In fact, the big PHP section does nothing significant besides initializing the $echoedShout variable to an empty string. The HTML section is rendered as a simple text input box.
Data Entry
This hypothetical user is a Poe fan, so she types in "Lenore" and hits Enter. Lines 31 and 32 take this input and construct a POST request that includes a variable called "shout" with the contents "Lenore". This POST request is sent back to the web server's echochamber.php file (which happens to be the same file in this case). Execution starts again from the top.
On this second time around, the $_POST superglobal tested on line 6 has some content, i.e. the "shout" variable and its associated content "Lenore". Ignore line 7 for just a moment. Lines 9 and 10 respond to the POST request with redirect headers. The user's web browser will receive the redirect headers and start a new GET request for echochamber.php.
Problem! How will this GET request differ from the original GET request? After all, it's not redirecting users to "/echochamber.php?shout=Lenore" or anything that obvious.
The secret sauce is the $_SESSION superglobal. It provides a temporary holding place for this user's data. Line 7 puts the contents of "shout" that came in $_POST into "shout" in $_SESSION so that "Lenore" can survive a trip through a fresh GET. The same principle can work for ten, twenty, or more variables.
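If you do have several fields, one possible way to carry them over is a small loop over the field names you expect. This is only a sketch: stashFields() and the "author" field are invented for illustration, and a plain array stands in for $_SESSION so the snippet runs anywhere.

```php
<?php
// Hypothetical helper: copy only the expected form fields into a session-like array.
function stashFields(array $post, array $allowed, array &$session): void
{
    foreach ($allowed as $field) {
        if (isset($post[$field])) {
            $session[$field] = $post[$field];
        }
    }
}

$session = [];
stashFields(
    ['shout' => 'Lenore', 'author' => 'Poe', 'unexpected' => 'x'],  // pretend $_POST
    ['shout', 'author'],                                            // fields we expect
    $session
);
print_r($session);  // holds 'shout' and 'author'; 'unexpected' is left behind
```

Listing the expected names keeps stray POST fields out of the session.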
Data Display, Finally
Third time around. Second GET. $_SESSION is loaded up. $echoedShout is once again initialized to an empty string, but won't stay that way for long. This is a GET, so the condition on line 6 will be false. The condition on line 13 will be true because $_SESSION is holding a value for "shout". That value is copied to $echoedShout and then the HTML renders "Lenore" above the text field.
Two Ways to Go Wrong
Is all of this complexity really necessary? For instance, why bother with lines 20 and 21's functions session_unset() and session_destroy()? The difference is what happens when a user refreshes a page showing "Lenore" over the blank field.
With session-killer functions: "Lenore" vanishes, and the user sees the original page with a blank field alone and no hidden state in $_POST or $_SESSION.
Without these functions: "Lenore" remains. Any code between lines 13 and 19 will run again with the same $_SESSION values. This can cause duplicate database entries on account of $_SESSION, even if the PRG pattern is preventing duplicate entries on account of POST.
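A gentler alternative to destroying the whole session, which my example does not use, is to remove only the one value after reading it. That might matter if the session also held something worth keeping, such as a login (the "user_id" key below is purely hypothetical). Here is a sketch, with a plain array standing in for $_SESSION and an invented helper name, pullOnce():

```php
<?php
// Hypothetical helper: read a value once, then forget it.
function pullOnce(array &$session, string $key): ?string
{
    if (!isset($session[$key])) {
        return null;
    }
    $value = $session[$key];
    unset($session[$key]);  // a refresh will no longer find it
    return $value;
}

$session = ['shout' => 'Lenore', 'user_id' => '42'];
var_dump(pullOnce($session, 'shout'));  // first read returns the stored value
var_dump(pullOnce($session, 'shout'));  // second read finds nothing
```

Either way, the point is the same: the stored value must be gone before the user can hit Refresh.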
What happens if we really simplify and leave out the PRG pattern entirely? In other words, what if "echochamber.php" were only:
<!DOCTYPE html>
<html>
<head><title>PRG Pattern Demonstration</title></head>
<body>
<?php echo isset($_POST['shout']) ? htmlspecialchars($_POST['shout']) : ''; // empty on the first GET ?>

<form action="echochamber.php" method="POST">
<input type="text" name="shout" value="" />
</form>
</body>
</html>
At first, it might seem like everything is hunky-dory, but hit Refresh and your browser will warn you that it is about to resubmit the form.
In other words, refreshing will send a POST request with the same information used earlier (even without using $_SESSION), followed by the loss of color in everything good in the world, culminating in the user being arrested for sharing that one MP3 back in college. Web developers shouldn't subject their users (or databases) to such risks.