Windows shebangs

In Writing a PHP polyglot, I wrote about the Unix ‘shebang’ mechanism. Any script, be it a shell script, Perl, PHP or any other language, is started from the shell, and any script can have a very special first line that looks, for example, like this:

#! /usr/bin/perl -w

The #! means as much as: ‘use the following command to run this script’. The name of the program to use follows. Now the system knows what to do with the file, and runs that program on it instead of letting the shell try to interpret the file itself.

Indeed, the rest of the article was about an amusing way to get something like that to work; read it if you want to know weird stuff about batch language.

A more general approach

Although writing a polyglot is fun, it hardly counts as a good solution. There is a much easier solution in Windows, via so-called ‘file type associations’. It can be done from your command line, and goes something like this: first, you tell Windows that a file ending in .pl is a Perl file:

c:\> assoc .pl=Perl

And then you explain what Windows should do with a file of that new type:

c:\> ftype Perl="C:\Perl\bin\perl.exe" "%1" %*

And now you run your Perl script:

c:\> pi.pl
3.14159265358979

There you go. Windows will remember from now on what to do with Perl files, and you don’t need any special code. Exactly the same thing could be done for PHP files, or for any other file type your system supports.

(What strikes me as odd is that assoc is used to tell the system what file type a file with a certain extension is, and ftype then associates this file type to a command. But that’s probably just me.)
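
To make the two-step lookup concrete, here is a toy model of it in Python. It is only a sketch: the two dictionaries stand in for whatever assoc and ftype store, the function name is made up for this illustration, and the real substitution rules of Windows are more involved than a plain text replace.

```python
import os

# Hypothetical stand-ins for the settings made with assoc and ftype above.
ASSOC = {".pl": "Perl"}                               # extension -> file type
FTYPE = {"Perl": r'"C:\Perl\bin\perl.exe" "%1" %*'}   # file type -> command template


def command_for(script, args):
    """Return the command line this toy model would run for `script`."""
    ext = os.path.splitext(script)[1].lower()
    file_type = ASSOC[ext]        # step 1: the question assoc answers
    template = FTYPE[file_type]   # step 2: the question ftype answers
    # %1 is the script itself, %* stands for the remaining arguments.
    line = template.replace("%1", script).replace("%*", " ".join(args))
    return line.rstrip()


print(command_for("pi.pl", []))
```

Seen this way, the split is less odd than it looks: several extensions can share one file type, and the command only has to be defined once.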

This might actually be better than a shebang

This mechanism has an interesting advantage over the polyglot way, or even the shebang concept: these are system settings. That means that on a correctly configured system, it will always work. In the PHP polyglot, the full path name to PHP was specified, because it is not on your path by default. In a shebang, a full path is actually required. But what if your system doesn’t have PHP in /usr/bin/php, but in /home/dnicolaas/myownphp/php? Then the script doesn’t work and needs editing. Of course it could work if you could specify a command without a full path, but changing all versions of Unix is outside the scope of this article.

A clever workaround?

The problem was recognised by Unix experts, and in some scripts you will see this instead:

#! /usr/bin/env perl -w

Here, the full path to perl isn’t specified. So what is /usr/bin/env? It is not a program that knows how to interpret a language specified in its second parameter; it just runs its arguments as a command in a copy of the environment. In other words, it runs

perl -w

directly, without an extra shell in between. And since env looks up perl in the path, this gets around the requirement that perl be specified with a full path. It exchanges it for the requirement that /usr/bin/env exists, of course, but it usually does. Except that on some systems it doesn’t, so this workaround is not perfect.
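
What env does for us here can be sketched in a few lines of Python. This is a rough model under stated assumptions, not how env is implemented: the function name run_like_env is made up, and the real env replaces itself with the command rather than waiting for a child process.

```python
import os
import shutil
import subprocess
import sys


def run_like_env(argv):
    """Look up argv[0] on the path and run it with the remaining arguments."""
    program = shutil.which(argv[0])   # plain path lookup, no shell involved
    if program is None:
        raise FileNotFoundError(argv[0])
    # Hand the command an unchanged copy of the current environment.
    result = subprocess.run([program] + argv[1:], env=dict(os.environ))
    return result.returncode


# Demo: run the current Python interpreter through the lookup.
run_like_env([sys.executable, "-c", "print('found via the path')"])
```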

… or maybe not?

There are two things that are not so nice about the file type associations: they require you to give your scripts a unique extension, and they even require you to type that extension when you want to run the program. When I want to know the value of π, I don’t need to know which programming language I used; I’d rather type

pi

than

pi.pl

Another disadvantage is that if I get a script from a Unix user (or even a whole collection of scripts), I have to rename them before I can use them. The first problem can be solved in Windows by setting the PATHEXT variable:

set PATHEXT=%PATHEXT%;.pl

This turns .pl into a so-called executable extension; Windows now knows that files with the .pl extension can be typed on the command line without the extension, much like files ending in .com, .exe and .bat.
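
As a rough picture of what that means, the following Python sketch lists the file names that would be tried for a command typed without an extension. It follows the description in this article only; the function name is made up, and the real lookup also walks the current directory and every directory on the path, which is left out here.

```python
def candidates(name, pathext):
    """List the file names tried for `name`, one per executable extension."""
    return [name + ext.lower() for ext in pathext.split(";") if ext]


print(candidates("pi", ".COM;.EXE;.BAT;.pl"))
# prints ['pi.com', 'pi.exe', 'pi.bat', 'pi.pl']
```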

The second problem cannot be solved so easily. Renaming files acquired from somewhere else is never nice: it is work, it requires thinking when upgrading to a new version, and it might break installers. Is there anything we can do about that?

Running extensionless scripts

So let’s assume a Unix programmer gave us a little Perl script that reads like this:

#! /usr/bin/perl
use Math::Trig;

print "The trip around a planet with radius $ARGV[0] is ";
print pi * 2 * $ARGV[0];
print "\n";

It’s called circ, and you can invoke it as follows:

$ circ 6371
The trip around a planet with radius 6371 is 40030.1735920411

At least, that is what your Unix friend said. But all you get is this:

c:\> circ 6371
'circ' is not recognized as an internal or external command,
operable program or batch file.

But couldn’t we use ftype and assoc to associate the empty extension?

c:\> assoc .=Perl
.=Perl

c:\> ftype Perl="C:\Perl\bin\perl.exe" "%1" %*
Perl="C:\Perl\bin\perl.exe" "%1" %*

c:\> circ 6371
'circ' is not recognized as an internal or external command,
operable program or batch file.

Why didn’t this work? Because we didn’t specify the empty extension as an executable extension. Windows faithfully adds all executable extensions to ‘circ’ to make it run, but it doesn’t add the empty extension if you don’t tell it to:

c:\> set PATHEXT=%PATHEXT%;.
c:\> circ 6371
The trip around a planet with radius 6371 is 40030.1735920411

So there we are! Now we can run any Perl script that we get from our Unix friends!

Another extensionless script

So let’s assume a Unix programmer gave us a little PHP script that reads like this:

#! /usr/bin/php
<?php
print "The volume of a planet with radius $argv[1] is ";
print pi() * 4 * pow($argv[1],3) / 3;
print "\n";
?>

And it’s called volu, and meant to be run like this:

$ volu 6371
The volume of a planet with radius 6371 is 1.08320691685E+012

Ouch. That won’t work, will it? Since we associated the empty extension with Perl, Windows will insist on running Perl on this script. And it isn’t Perl.

Perl tries, I must admit. But Perl doesn’t know we’re running on Windows, so Perl tries to run the program specified in the shebang line:

c:\> volu 6371
Can't exec /usr/bin/php at volu line 1.

But hey! That may be the solution. What if we associated the empty extension with a program that reads shebang lines, and runs the Windows equivalent of the command specified in the shebang line?

A Windows shebang

So let’s try this:

c:\> assoc .=shebangfile
.=shebangfile

c:\> ftype shebangfile=shebang.bat "%1" %*
shebangfile=shebang.bat "%1" %*

c:\> set PATHEXT=%PATHEXT%;.

Now we have arranged that every extensionless file will be run with shebang.bat, so this:

c:\> volu 6371

will turn into this:

shebang.bat c:\volu 6371

Now the only thing we need to do is create shebang.bat and put it in your path (your c:\WINDOWS directory is fine). What shebang.bat does is the same thing Unix does: it reads the first line of the file, extracts the command from it, and runs it with the whole command line as arguments. Since we don’t want to know where on Windows Perl or PHP are installed, and we don’t mind being a little better than the original shebang solution, shebang.bat just extracts the last bit (the command name) from the path and runs that, effectively assuming it’s on your path. So the above command translates to:

php c:\volu 6371

which works like a charm.

Here’s shebang.bat:

@echo off
rem shebang.bat - Unix shell behaviour from windows.
rem Use
rem     assoc .=shebangfile
rem     ftype shebangfile=shebang.bat "%1" %*
rem     set pathext=%pathext%;.
rem and put this somewhere in your path.
rem Author: Dion Nicolaas <dion@nicolaas.net>
rem http://whitescreen.nicolaas.net/programming/windows-shebangs
rem 
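rem Keep variables local and start clean, so values left over from an
rem earlier run in the same CMD window cannot leak into this one
setlocal
set first=
set line=
rem Get the first line of the file
set /p line=<%1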
rem Remove all quotes from the string, they shouldn't be there anyway
set line=%line:"=%
rem turn each part of the path into a quoted string, separated by spaces
set line="%line:/=" "%"
rem set first to the first part ("#!"), last to the last part (e.g. perl -w)
for %%i in (%line%) do call :firstlast %%i
rem if it was a shebang line, set command accordingly, else use "type"
set command=type
if "%first%"=="#!" set command=%last%
if "%first%"=="#! " set command=%last%
rem Run command on the command line
%command% %*
goto :EOF
:firstlast
rem Get first and last token, unquote in the process (which will also strip 
rem spaces). In Unix scripts 'perl -w' is much more likely than "the language
rem processor" (long file name with spaces)
if "%first%"=="" set first=%~1
set last=%~1

There’s nothing very special about this batch file. It’s just that batch language is not very good at text processing.

There is one special case that wasn’t mentioned before: what if a file doesn’t have a shebang line? On Unix, a script without a shebang line is assumed to be a shell script, but only if its permissions show it is an executable file. On Windows that would be a bit dangerous, because we just designated every extensionless file executable: that includes README and TODO and other files that are probably just text. So if a file doesn’t start with ‘#!’, we just run it through ‘type’, which effectively displays it on the screen.
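
For comparison, here is the same decision written in a language that is better at text processing. It is a sketch of the idea, not a line-by-line translation of shebang.bat: the function names are made up, and it only strips the path from the interpreter itself.

```python
def shebang_command(first_line):
    """Return the command words from a shebang line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    words = first_line[2:].split()
    if not words:
        return None
    # Keep only the last path component: /usr/bin/perl becomes perl.
    words[0] = words[0].rsplit("/", 1)[-1]
    return words


def windows_command(script, args, first_line):
    """Build the command to run; fall back to 'type' without a shebang."""
    command = shebang_command(first_line) or ["type"]
    return command + [script] + args


print(" ".join(windows_command(r"c:\volu", ["6371"], "#! /usr/bin/php")))
print(" ".join(windows_command("README", [], "Read this first")))
```

The first call prints php c:\volu 6371, the second prints type README.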

Epilogue

One more thing: changing PATHEXT from your command line is temporary. If you want this to work permanently, you need to edit your environment differently: click ‘Start’ / ‘Control Panel’ / double-click ‘System’ / ‘Advanced’ tab / button ‘Environment Variables’. Then find ‘PATHEXT’ in the ‘System Variables’, ‘Edit’ it, click a few OKs and open a new CMD window. Your old, already open windows will NOT magically get the new version of PATHEXT, but all your new ones will.

This version of shebang.bat only supports shebang lines like this:

#!/usr/local/bin/php
#! /usr/local/bin/perl -w

When run with a file that starts with

#! /usr/bin/env perl

it will fail:

'env' is not recognized as an internal or external command,
operable program or batch file.

But env.bat is very easy to implement on Windows. Just store it somewhere on your path as well:

@%*

This looks like cursing, but it actually means: run %* (all parameters to this batch file), but don’t echo the command line itself (the @ suppresses that echo, not the output of the program). This might not catch all subtleties of the Unix ‘env’, but for our purpose it will do just fine.