Thursday, April 25, 2013

Post-Redirect-Get Pattern in PHP

As an alternative to writing a final paper for my Information Storage and Retrieval class (too easy!), I've been working on my first database-driven web site. I will link the site when finished, but today I'm presenting the web programming tutorial I wish I'd had a few days ago.

Optimal audience: PHP programmers who want to accept input from web users without risking duplicate input when users refresh their browsers or click 'Back' arrows.

Gettin' and Postin'

Web browsers send more than just the URL when they transmit requests to web servers; a verb is bundled in as well. Usually, this verb is GET. It means: Hey, web server, show me what you have at http://www.timecube.com/

or...

http://en.wikipedia.org/wiki/Cat

or...

https://maps.google.com/maps?q=denver,+co&hl=en&ll=39.751545,-104.985352&spn=0.459278,0.883026&sll=37.0625,-95.677068&sspn=30.323858,56.513672&hnear=Denver,+Colorado&t=m&z=10

Whether simple or complex, GET is about retrieving content from a web server. Best practice is for GET requests to be free of side effects. In particular, GET should not be used to update a web site's database, because programs like Googlebot try to GET everything they can; this can lead to nightmare scenarios for databases affected by GET requests.

POST is another verb (or request method), except this one is about sending content to a web server. If you use any online forums, think of GET as what you use to read posts and POST as what you use to post posts.
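
As a rough sketch of how a PHP script can tell the two verbs apart: PHP exposes the verb of the current request in $_SERVER['REQUEST_METHOD']. The helper name requestMayModifyData() is made up for this illustration and is not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): decide whether a request
// using the given verb should be allowed to change stored data.
function requestMayModifyData(string $method): bool
{
    // GET and HEAD are meant for reading only; crawlers issue them freely.
    $readOnlyVerbs = ['GET', 'HEAD'];
    return !in_array(strtoupper($method), $readOnlyVerbs, true);
}
```

In a page served by a web server, this could be called as requestMayModifyData($_SERVER['REQUEST_METHOD']) before touching the database.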

The Trouble With Posting

If a user requests the same URL two times or ten times using GET, the only downside is some extra bandwidth usage. Requesting the same URL multiple times with POST can mean sending duplicate information to the web server. This is why some websites warn against extra clicking, or refreshing, or navigating with Back and Forward buttons. It can lead to duplicate forum posts, duplicate user registrations, or duplicate purchases. Not good!
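
To make the duplicate problem concrete, here is a small simulation. It assumes a plain PHP array standing in for a database table and a made-up function name, handleNaivePost(); a real site would run an INSERT instead.

```php
<?php
// Hypothetical stand-in for a database table: a plain array of rows.
function handleNaivePost(array $post, array $table): array
{
    // No redirect and no duplicate check: every POST adds a row.
    if (isset($post['shout'])) {
        $table[] = $post['shout'];
    }
    return $table;
}

$table = [];
$post  = ['shout' => 'Lenore'];

$table = handleNaivePost($post, $table);  // the original submission
$table = handleNaivePost($post, $table);  // user hits Refresh; the browser re-sends the same POST

// $table now holds two identical rows.
```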

Diagram courtesy of Quilokos.

The fix is to start with a POST to send information to the web server, but end up on a GET: a safe, no-surprises GET. So instead of immediately receiving a confirmation page in response to a POST, the web client receives a redirect response which, in turn, causes the web client to issue a GET to see the confirmation page. (Yes, this is a bit convoluted.)

Diagram courtesy of Quilokos.

This general fix is called the "Post Redirect Get" pattern (or PRG pattern). What tripped me up was how to implement the PRG pattern in PHP. I found parts of a solution here and there, but not a (relatively) simple example all in one place.

A (Relatively) Simple Example All In One Place

Create a file named "echochamber.php" and paste in the following contents (minus the line numbers):

 1  <?php
 2      session_start();
 3
 4      $echoedShout = "";
 5
 6      if(count($_POST) > 0) {                     // a form was just submitted
 7          $_SESSION['shout'] = $_POST['shout'];   // stash the input for the next request
 8
 9          header("HTTP/1.1 303 See Other");
10          header("Location: http://$_SERVER[HTTP_HOST]/echochamber.php");
11          die();                                  // send only the redirect, no page body
12      }
13      else if (isset($_SESSION['shout'])){        // a stashed value is waiting
14          $echoedShout = $_SESSION['shout'];
15
16          /*
17              Put database-affecting code here.
18          */
19
20          session_unset();
21          session_destroy();
22      }
23  ?>
24
25  <!DOCTYPE html>
26  <html>
27  <head><title>PRG Pattern Demonstration</title>
28  </head>
29  <body>
30      <?php echo "<p>" . htmlspecialchars($echoedShout) . "</p>"; ?>
31      <form action="echochamber.php" method="POST">
32          <input type="text" name="shout" value="" />
33      </form>
34  </body>
35  </html>

Note: Unlike a pure HTML/CSS/JavaScript test page, this file can't just be saved on your local computer and opened with a web browser. It needs to go on a web server configured to run PHP. You can either go through the hassle of configuring your local computer, or use a hosting service (I like nearlyfreespeech.net).

Oh, You Want An Explanation?

PRG can be done with three separate files: the form to fill out, a file that processes filled out forms and gives a redirect response, and a final result page that is the target of redirection. But it's often convenient for the fill-in page and the final-result page to be the same or very similar. Why not stuff everything into one file? At any rate, I'm taking the all-in-one approach in this example. It's less intuitive, but not too bad.
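
For comparison, here is a minimal sketch of the middle file in the three-file approach. The file names form.php, process.php, and result.php and the function processSubmission() are assumptions made for this illustration; the article itself only uses echochamber.php.

```php
<?php
// process.php (hypothetical file name): the middle file of the three.
// It never prints a page; it records the input and names the next stop.
function processSubmission(array $post, array &$session): string
{
    if (!isset($post['shout'])) {
        // Nothing was submitted: send the visitor back to the form.
        return '/form.php';
    }
    // Hand the value to the result page through the session.
    $session['shout'] = $post['shout'];
    return '/result.php';
}

// Wiring for when the file is served by a web server
// (skipped when the file is run from the command line).
if (PHP_SAPI !== 'cli') {
    session_start();
    $target = processSubmission($_POST, $_SESSION);
    header('HTTP/1.1 303 See Other');
    header("Location: $target");
    exit;
}
```

Keeping the decision in a function that takes plain arrays makes it possible to try the logic without a browser; result.php would then read the value back out of $_SESSION.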

First Load

Suppose a user navigates to http://www.prg-in-php-example.gov/echochamber.php (or whichever domain you're using). The two 'if' statements on lines 6 and 13 will fail. In fact, the big PHP section does nothing significant besides initializing the $echoedShout variable to an empty string. The HTML section is rendered as a simple text input box with nothing above it.


Data Entry

This hypothetical user is a Poe fan, so she types in "Lenore" and hits Enter. Her browser uses the form described on lines 31 and 32 to construct a POST request that includes a variable called "shout" with the contents "Lenore". This POST request is sent back to the web server's echochamber.php file (which happens to be the same file in this case). Execution starts again from the top.

On this second time around, the $_POST superglobal tested on line 6 has some content, i.e. the "shout" variable and its associated content "Lenore". Ignore line 7 for just a moment. Lines 9 and 10 respond to the POST request with redirect headers. The user's web browser will receive the redirect headers and start a new GET request for echochamber.php.
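
The redirect step on lines 9 and 10 could also be wrapped in a small helper, sketched below. The function names are invented for this example, and the $https flag is an added assumption for sites served over HTTPS; the listing above hard-codes http://.

```php
<?php
// Hypothetical helper: build the two header lines for a 303 redirect.
function buildSeeOtherHeaders(string $host, string $path, bool $https = false): array
{
    $scheme = $https ? 'https' : 'http';
    return [
        'HTTP/1.1 303 See Other',
        "Location: $scheme://$host$path",
    ];
}

// Hypothetical helper: send the redirect, then stop so no page body follows.
function redirectAfterPost(string $host, string $path, bool $https = false): void
{
    foreach (buildSeeOtherHeaders($host, $path, $https) as $line) {
        header($line);
    }
    exit;
}
```

Inside the POST branch, a call such as redirectAfterPost($_SERVER['HTTP_HOST'], '/echochamber.php') would take the place of lines 9 through 11.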

Problem! How will this GET request differ from the original GET request? After all, it's not redirecting users to "/echochamber.php?shout=Lenore" or anything that obvious.

The secret sauce is the $_SESSION superglobal. It provides a temporary holding place for this user's data. Line 7 puts the contents of "shout" that came in $_POST into "shout" in $_SESSION so that "Lenore" can survive a trip through a fresh GET. The same principle can work for ten, twenty, or more variables.
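
For more than one variable, the copy on line 7 could be generalized along these lines. This is only a sketch: stashPostFields() and the field name "author" are made up for illustration.

```php
<?php
// Hypothetical helper: copy a fixed list of POST fields into the session
// so that they survive the redirect. Fields not on the list are ignored.
function stashPostFields(array $post, array &$session, array $fieldNames): void
{
    foreach ($fieldNames as $name) {
        if (isset($post[$name])) {
            $session[$name] = $post[$name];
        }
    }
}
```

After session_start(), a call like stashPostFields($_POST, $_SESSION, ['shout', 'author']) would carry both fields across the redirect while leaving any unexpected form fields behind.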

Data Display, Finally

Third time around. Second GET. $_SESSION is loaded up.

$echoedShout is once again initialized to an empty string, but won't stay that way for long. This is a GET, so the 'if' statement on line 6 will fail. Line 13's 'if' will succeed because $_SESSION is holding a value for "shout". That value is copied to $echoedShout, and then the HTML renders "Lenore" in a paragraph above the text box.


Two Ways to Go Wrong

Is all of this complexity really necessary? For instance, why bother with session_unset() and session_destroy() on lines 20 and 21? The difference shows up when a user refreshes a page showing "Lenore" over the blank field.

With the session-killer functions: "Lenore" vanishes, and the user sees the original page with only a blank field and no hidden state in $_POST or $_SESSION.

Without these functions: "Lenore" remains. Any code between lines 13 and 19 will run again with the same $_SESSION values. This can cause duplicate database entries on account of $_SESSION, even if the PRG pattern is preventing duplicate entries on account of POST.
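
If the session also holds other things (a login, say), destroying all of it may be more than is wanted. One narrower option, sketched here as an assumption rather than as what the listing does, is to read the value once and remove only that key. The name takeFromSession() is invented for this example.

```php
<?php
// Hypothetical helper: read a session value exactly once, then forget it.
function takeFromSession(array &$session, string $key, string $default = ''): string
{
    if (!isset($session[$key])) {
        return $default;
    }
    $value = (string) $session[$key];
    unset($session[$key]);  // a later refresh will not see this value again
    return $value;
}
```

With this, line 14 could become $echoedShout = takeFromSession($_SESSION, 'shout'); and a refresh would still show the blank page, because the key is gone after the first read.
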

What happens if we really simplify and leave out the PRG pattern entirely? In other words, what if "echochamber.php" were only:

<!DOCTYPE html>
<html>
<head><title>PRG Pattern Demonstration</title>
</head>
<body>
    <?php echo htmlspecialchars($_POST['shout'] ?? ''); ?>
    <form action="echochamber.php" method="POST">
        <input type="text" name="shout" value="" />
    </form>
</body>
</html>

At first, it might seem like everything is hunky-dory, but hit Refresh and you'll see a browser warning asking whether to resubmit the form data.


In other words, refreshing will send a POST request with the same information used earlier (even without using $_SESSION), followed by the loss of color in everything good in the world, culminating in the user being arrested for sharing that one MP3 back in college. Web developers shouldn't subject their users (or databases) to such risks.

4 comments:

  1. Last third of this post has been significantly reworked. Once I put this pattern into practice in a real project, I still had duplicate entries because of leftover $_SESSION values. Fixed.

  2. Hi! Great post. Thanks a lot.

    I'm still kind of confused (probably my brain doesn't like redirects as much as Chrome or Firefox do).

    What would happen if lines 16-18 (i.e. "affect the database") were moved to line 8, and the whole "session affair" were left out?

    Something like:
    -----------------------------------------------------------------------

    <?php
    if(count($_POST) > 0) {
    /*
    Put database-affecting code here.
    */
    header("HTTP/1.1 303 See Other");
    header("Location: http://$_SERVER[HTTP_HOST]/echochamber.php");
    die();
    }
    ?>

    HTML the same from here on...

    -----------------------------------------------------------------------

    Wouldn't it kill the POST values with that 303, so when you hit refresh you would issue another simple GET, and not another POST?

    I find that closer to the graphic you show in your post, and much simpler.

    The first time, the count($_POST) test would fail, and the HTML would be rendered.

    When we submit the form, Lenore gets saved and the 303 is sent back to the browser, which in turn issues a GET to echochamber.php, and we're back where we began.

    Am I nearly correct, or did I suffer some brain damage while reading your post? :P

    Again, thanks a lot for your writing, and as a plus, for making it fun.

  3. wowww!!! very nice idea and explanation

    Congratulations & greetz!!

  4. Perfect! Nice publication. Thanks for the code and the explanation. :)
