Writing a PHP polyglot

Introduction

Sometimes it is very useful to run a PHP program from the command line: when you want to try out a small feature, when batch language just doesn’t do it, or when some of PHP’s powerful library functions (like fopen()) come in handy. However, running a small program like that is always a bit clumsy. First of all, PHP is usually not in your path; and wouldn’t it be nice if you could just type the name of the program and run it?
Take this simple PHP script:

<?php
print date('r', $argv[1]) . "\n";
?>

If you ever store dates as a time() value, i.e. the number of seconds since 1 January 1970, you know how useful this is. You can call it like this:

c:\php\php.exe -q time2str.php 1234567890

and it will answer with

Sat, 14 Feb 2009 00:31:30 +0100

Indeed, Valentine’s Day in 2009.
But the call you have to make is pretty cumbersome. So you write a batch file, time2str.bat, that looks like this:

@c:\php\php.exe -q time2str.php %1

and now you can just type

time2str 1234567890

to get your result. You stuff both files somewhere in your path, and when you remember about it a few days later, you type:

time2str 1234567890
Could not open input file: time2str.php

Hmm. Of course you are running from another directory now. So now you should hardcode time2str.php’s directory. Or time2str.bat should be smart enough to find out its own directory, and then find the PHP file in it. This batch file starts to become a nuisance.

The Unix shell can do it

In Unix and Linux, there is a good solution for this. Any script, be it a shell script, Perl, PHP or any other language, can be started from the shell; and any script can have a very special first line that looks e.g. like this:

#! /usr/bin/perl -w

The #! means as much as: ‘use the following command to run this script’, and the name of the program to use follows it. Now the system knows what to do with this file, and starts that program instead of trying to interpret the file itself.
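
To see the #! line at work, here is a minimal sketch for a Unix-like system. It assumes a POSIX sh at /bin/sh and the mktemp utility; the function name demo_shebang and the file name hello are made up for this illustration. It uses sh rather than Perl or PHP as the interpreter so that it runs without either of them installed.

```shell
#!/bin/sh
# Hypothetical demo: create a small script with a #! line and run it by name.
demo_shebang() {
    dir=$(mktemp -d) || return 1
    # The first line of the new file names the interpreter to start.
    printf '%s\n' '#!/bin/sh' 'echo "argument: $1"' > "$dir/hello"
    chmod +x "$dir/hello"
    # No interpreter is mentioned on the command line; the #! line supplies it.
    "$dir/hello" "$1"
    rm -rf "$dir"
}

demo_shebang 1234567890
```

The script is started by its own name, and the argument is passed along to it, which is exactly the behaviour we would like to have on Windows.
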
Now that would be nice on Windows! Unfortunately, it doesn’t exist there. What does exist is the ‘executable extension’: if you double-click on a file called something.doc, the application handling DOC files is started. What not everyone knows is that this also works at the command line: if you type something.doc there, the same application is started. But to enable this, you have to change global settings on your system, and in the case of PHP there is a clear drawback: if you now double-click a PHP file, it will be executed in a DOS window that disappears immediately after the program ends. It makes more sense to associate PHP files with your favourite editor!

… but a polyglot can, too

The solution presented here was inspired by Perl, for which a similar trick exists to make Perl scripts execute on Windows. The idea is to turn the PHP file into a batch file that runs PHP on itself. But that is easier said than done…
Here’s an outline of the task. We want a PHP program to run from the command line. To achieve this, we turn it into a batch file. That batch file contains both instructions in batch file language to run PHP, and the actual PHP program that is being executed. The two problems are that PHP should completely ignore the batch instructions, whereas the batch interpreter, CMD, should ignore the PHP code. A program that runs in two languages is usually called a polyglot; strictly speaking, a polyglot should do the same in both languages, but this is a more practical example.

Let’s just start somewhere

Here’s the first take. Let’s create a batch file called time2str.bat that contains the following:

@c:\php\php.exe -q %~f0 %*
@goto :EOF
<?php
 print date('r', $argv[1]) . "\n";
?>

The first line runs PHP (or more precisely, c:\php\php.exe) on the file indicated by %~f0. This is batch-speak to point to the full path of %0, and %0 is the command itself. So %~f0 expands to the full path of the batch file time2str.bat itself. %* stands for all command line parameters except %0. They will be passed to PHP, which will handle them.
The second line skips to the end of the file. The label :EOF is always implicitly present in batch files. This makes sure the rest of the file, in this case the PHP code, is completely skipped by CMD. A helpful feature of CMD in this case is that it will not read the skipped lines at all. The batch file will run without error messages, even though these lines are not valid batch syntax.
The @-characters in the first two lines make sure that those lines are not echoed by CMD before they are executed. The same can be achieved by starting the batch file with @echo off, but this is shorter.
Then follows the PHP code, but the batch interpreter blindly ignores all that; it just jumps to the end of the file and exits. So half of the task is done!
If you run this, you will see the following:

@c:\php\php.exe -q %~f0 %*
@goto :EOF
Sat, 14 Feb 2009 00:31:30 +0100

Here the burden of the polyglot programmer shows: who produced this output? PHP or CMD? In this case, it is pretty easy to sort out: it was PHP. In PHP, everything that appears outside <?php and ?> will be printed literally. So the first two lines of our PHP script are the first two lines of our output. Then the rest is executed, and the correct answer is printed: Valentine’s Day.

‘Shut up, PHP!’

So now the task at hand is this: make sure the file starts with <?php. How about:

<?php /*
@c:\php\php.exe -q %~f0 %*
@goto :EOF
*/ ?>
<?php
 print date('r', $argv[1]) . "\n";
?>

What is done here is twofold: first of all, we start the file with a PHP opening tag, to make sure the batch lines are not printed; and secondly, we place comment brackets around the batch lines, to make sure PHP doesn’t try to execute them! So we now have hidden the two lines of batch instructions from PHP. Let’s try; do you feel what’s coming?

/*0<?php
The filename, directory name, or volume label syntax is incorrect.
Sat, 14 Feb 2009 00:31:30 +0100

Whoops. What is this? And again: who produced this output?

‘… and you too, CMD!’

This time, it’s the batch interpreter that is complaining. First of all, it echoes the first command it encounters in the file:

/*0<?php

What it really wanted to say was this:

/* 0< ?php

which is the long version of:

/* < ?php

What you see here is a redirected input command, much like e.g.

sort < records.txt

In CMD, you are allowed to reverse the order of the parts, and write

< records.txt sort

To be helpful, when echoing the command, CMD will put it in the right order. In our polyglot, we see that CMD actually tries to execute the command /* and uses ?php as input.
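
CMD’s behaviour can only be tried on Windows, but POSIX shells happen to accept the same two spellings of a redirected command, so the idea is easy to check there. A minimal sketch, assuming a POSIX sh with mktemp and sort available; the function name and the sample words are made up for this illustration, and a temporary file stands in for records.txt.

```shell
#!/bin/sh
# Hypothetical demo: an input redirection may be written before the command.
sorted_both_ways() {
    file=$(mktemp) || return 1
    printf '%s\n' pear apple fig > "$file"
    sort < "$file"      # the usual order: command first, redirection last
    < "$file" sort      # reversed order: redirection first, same command
    rm -f "$file"
}

sorted_both_ways
```

Both calls print the same sorted list, because the redirection is not an argument of the command; the interpreter takes it out before it decides which command to run.
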
After executing this command, CMD produces an error message:

The filename, directory name, or volume label syntax is incorrect.

That’s the second line of the output. The reason for this error message is that the filename ?php is not valid for redirecting: it contains a wildcard (?), and CMD cannot handle that.
The last line is our output from PHP, which is just how we want it. So now the PHP problems are solved, but we have two new, batch-related problems: the first line is now echoed by CMD, and produces an error when executing because the syntax is incorrect. How do we hide this line from CMD?
Polyglot programming is a bit like running in circles; every time you solve a problem in one language, you create a new problem in the other language. Luckily, the problems are becoming smaller in the process.

Do nothing in two languages

The fact that CMD reversed the order of the parts is the key. In batch files, you can make sure lines are not executed at all by starting them with a :-character. This is different from a rem-statement; the latter is executed, and echoed if echo is still on. In fact, : is a very powerful character, as you can see in the next version:

<?php/* :
@c:\php\php.exe -q %~f0 %*
@goto :EOF
*/ ?>
<?php
 print date('r', $argv[1]) . "\n";
?>

In this version, two things are changed in the first line: first of all, the PHP comment opening is placed directly after the PHP opening tag; and secondly, the :-character is added. Now here is a surprise; CMD will be smart enough to reverse the order of the parts, but after that, it will find out it has nothing to do! CMD will execute the following line:

: < ?php/*

But since lines starting with a : are completely skipped, CMD doesn’t even look at the filename anymore. By concatenating the PHP opening tag and the comment opening, we created a longer, even more illegal filename ?php/*, which is ignored completely. But in the file, the :-character appears inside the PHP comment, so PHP doesn’t care about it at all.
And this is it! If we run this batch file, we get:

Sat, 14 Feb 2009 00:31:30 +0100

And that is exactly the output of PHP we want.
Voilà! Four lines of polyglot that you can just prepend to every PHP program to make it run from the command line. Three lines more than Unix, but it makes for a much more interesting read.

Appendix

Below is the file with simple syntax highlighting, first the batch version, then the PHP version. Special words and characters are purple, comments are green, and functions and variables are blue. The grey part in the batch file is mostly invalid syntax, but is never reached.
One more thing: since everything after the : in the first line is seen as a comment in both PHP and batch language, you can really write a comment there. I inserted some sort of small signature there.

Batch

<?php/* : http://whitescreen.nicolaas.net/php_polyglot.php
@c:\php\php.exe -q %~f0 %*
@goto :EOF
*/ ?>
<?php
    print date('r', $argv[1]) . "\n";
?>

PHP

<?php/* : http://whitescreen.nicolaas.net/php_polyglot.php
@c:\php\php.exe -q %~f0 %*
@goto :EOF
*/ ?>
<?php
    print date('r', $argv[1]) . "\n";
?>
