Magento Quickies

All the Non-Trivial Magento Trivia

N98-magerun: Understanding PHAR Files

Throughout this series we’ve skirted around one important issue: the opaque nature of PHP’s phar format.

Many groups claim the term “open source” in some way or another. There’s open source the license, which dictates how and where code can be used and shared. Then there’s open source as in “you can see and read the literal source code”, which may or may not come with an open-source license. Many (of the better) commercial Magento extensions fall into the latter category: they’re distributed as plain text files, but the extension vendor still asserts copyright over the files.

While n98-magerun is open source both in its licensing and its literal source code, PHP’s phar format muddies the waters a bit. A phar is, by default, not readable source code.

Fortunately, it’s relatively easy to peek inside any phar and see the source files it was created from, which is what we’ll show you in today’s article.

What is a phar?

A phar is a PHP Archive file. If you’re familiar with the Java programming language, phar files are (were?) an attempt to bring Java’s jar concept into PHP. Or, as the official documentation puts it:

What is phar? Phar archives are best characterized as a convenient way to group several files into a single file. As such, a phar archive provides a way to distribute a complete PHP application in a single file and run it from that file without the need to extract it to disk. Additionally, phar archives can be executed by PHP as easily as any other file, both on the commandline and from a web server. Phar is kind of like a thumb drive for PHP applications.

That’s all nice conceptually, but what does it mean? If you open up the n98-magerun.phar file with a text editor, you’ll see something like this

#!/usr/bin/env php
<?php

Phar::mapPhar('n98-magerun.phar');

$application = require_once 'phar://n98-magerun.phar/src/bootstrap.php';
$application->setPharMode(true);
$application->run();

__HALT_COMPILER(); ?>

[gobs of binary-ish looking data]

So, there’s some standard PHP near the top, but then it’s a jumble of mixed binary and ASCII data all the way down. What are we looking at?

Internally, a phar can store its files in PHP’s native phar format, in the zip format, or in the tar format. In addition, the entire archive, as well as individual files in the archive, may be compressed with gzip or bzip2. That’s what all the partially muddled lines are: an archive of every file in the phar.
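If you’d rather ask PHP than squint at the blob, the Phar class can report both the container format and the compression applied to each entry. The following is a minimal sketch, not anything n98-magerun ships: the function names summarizePhar and compressionName are made up for illustration, and the path you pass in is up to you.

```php
<?php
// Sketch: summarize how a phar stores its files.
// Usage (path is a placeholder): php phar-info.php /path/to/n98-magerun.phar

// Hypothetical helper: name the compression used for one archive entry.
function compressionName(PharFileInfo $file): string
{
    if ($file->isCompressed(Phar::GZ)) {
        return 'gzip';
    }
    if ($file->isCompressed(Phar::BZ2)) {
        return 'bzip2';
    }
    return 'none';
}

// Hypothetical helper: report the container format and count entries by compression.
function summarizePhar(string $path): array
{
    $phar = new Phar($path);

    // Container format: the native phar layout, tar, or zip
    $format = 'phar';
    if ($phar->isFileFormat(Phar::TAR)) {
        $format = 'tar';
    } elseif ($phar->isFileFormat(Phar::ZIP)) {
        $format = 'zip';
    }

    // Iterating a Phar recursively yields one PharFileInfo per stored file
    $counts = ['none' => 0, 'gzip' => 0, 'bzip2' => 0];
    foreach (new RecursiveIteratorIterator($phar) as $file) {
        $counts[compressionName($file)]++;
    }

    return ['format' => $format, 'compression' => $counts];
}

if (PHP_SAPI === 'cli' && isset($argv[1]) && is_file($argv[1])) {
    print_r(summarizePhar($argv[1]));
}
```

Reading a phar this way doesn’t modify it, so it should work even with PHP’s default phar.readonly setting left on.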

So what about the stuff near the top?

#!/usr/bin/env php
<?php

Phar::mapPhar('n98-magerun.phar');

$application = require_once 'phar://n98-magerun.phar/src/bootstrap.php';
$application->setPharMode(true);
$application->run();

__HALT_COMPILER();

This is the archive’s stub file. Unlike Java and its jars, PHP has no history or concept of a class having a main function to run. If you want your phar to be an executable program (as opposed to a portable library), you need to use a stub file.

The phar stub file is a small piece of PHP code used to initialize your application. The n98-magerun stub file can be found in _cli_stub.php.

#File: _cli_stub.php
#!/usr/bin/env php
<?php

Phar::mapPhar('n98-magerun.phar');

$application = require_once 'phar://n98-magerun.phar/src/bootstrap.php';
$application->setPharMode(true);
$application->run();

__HALT_COMPILER();    

As you can see, it’s identical to what ends up in the n98-magerun.phar file. The mechanics of phar creation are beyond the scope of this article, but if you want to get started with your own archives, take a look at the n98-magerun phing build.xml file, particularly the pharpackage task.
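You can also pull the stub out of a built archive with PHP itself, which makes it easy to diff against _cli_stub.php. This is only a sketch: readPharStub is a name invented here, and the path is whatever phar you want to inspect.

```php
<?php
// Sketch: print the stub stored in a phar without opening it in an editor.
// Usage (path is a placeholder): php phar-stub.php /path/to/n98-magerun.phar

// Hypothetical helper: return the stub code of the archive at $path.
function readPharStub(string $path): string
{
    $phar = new Phar($path);

    // getStub() returns the PHP loader code stored at the front of the archive
    return $phar->getStub();
}

if (PHP_SAPI === 'cli' && isset($argv[1]) && is_file($argv[1])) {
    echo readPharStub($argv[1]);
}
```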

Opening up a phar

Having looked inside the n98-magerun.phar file, we now know that a phar is a tiny bit of PHP code (the stub), combined with a messy binary-ish blob. Depending on your particular security needs, trusting the code that’s in that binary-ish blob may or may not fly in your organization.

Fortunately, this isn’t unreadable executable machine code. As we mentioned earlier, the blob is an archive of ordinary files, and as with zip or tar archives, those files can be easily extracted. The code to do so is built right into PHP. Just create a simple CLI script with the following content

#File: unphar.php
<?php
// Open the archive, then copy every file it contains into the target folder
$phar = new Phar('/path/to/n98-magerun.phar');
$phar->extractTo('/path/to/extract');

This instantiates a Phar object (PHP’s internal representation of a phar archive), and then uses its extractTo method to extract the files to a specific folder. If you run the above script, and then take a look at the folder you extracted to

$ ls -l /path/to/extract
-rw-rw-rw-  1 username  staff  1085 Jun 11 14:27 MIT-LICENSE.txt
-rw-rw-rw-  1 username  staff  3807 Jun 11 14:27 config.yaml
drwxr-xr-x  6 username  staff   204 Jun 11 14:27 res
drwxr-xr-x  4 username  staff   136 Jun 11 14:27 src
drwxr-xr-x  9 username  staff   306 Jun 11 14:27 vendor

you’ll see a directory structure that contains all the files that were included in the phar.
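If you only want to look around, you don’t strictly need to extract anything: the same phar:// URLs the stub uses work with PHP’s ordinary file functions. Here’s a rough sketch of listing and reading files in place. The function name pharFileUrls is invented for this example. The phar stream wrapper decompresses entries as it reads them, though I haven’t verified how it behaves against the particular n98-magerun build discussed here.

```php
<?php
// Sketch: browse a phar in place through the phar:// stream wrapper.
// Usage (path is a placeholder): php phar-ls.php /path/to/n98-magerun.phar

// Hypothetical helper: return a phar:// URL for every file stored in the archive.
function pharFileUrls(string $path): array
{
    $urls = [];
    foreach (new RecursiveIteratorIterator(new Phar($path)) as $file) {
        // For entries inside a phar, getPathname() is a phar:// URL
        $urls[] = $file->getPathname();
    }
    sort($urls);
    return $urls;
}

if (PHP_SAPI === 'cli' && isset($argv[1]) && is_file($argv[1])) {
    foreach (pharFileUrls($argv[1]) as $url) {
        // Any file function that accepts stream URLs can read these
        echo $url, ' (', strlen(file_get_contents($url)), " bytes)\n";
    }
}
```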

Decompressing

There’s one last step you may need to take before you can start examining the contents of your phar archive. If you use cat to view the contents of a file, you’ll be in for a surprise.

$ cat /path/to/extract/config.yaml
????P?Ј ZL7?{?R???I?2f?>??<?y?-?3d?h?߂CT.?A~??+?T4v?ѐ44h?9?d?a0L!?# ?h?
...

For some reason, when PHP unarchives n98-magerun.phar, it fails to decompress the files. You can confirm this with OS X’s file command, which identifies files of unknown types.

$ cd /path/to/extract/
$ file config.yaml 
config.yaml: bzip2 compressed data, block size = 400k

I’ve outsourced some detective work on this, but haven’t been able to get to the bottom of it. The Phar class has a decompressFiles method, and my assumption was that it would decompress the files in the archive. However, calling it tripped up with the following error

PHP Fatal error:  Uncaught exception 'BadMethodCallException' with message
'unable to write contents of file    
"vendor/fzaninotto/faker/src/Faker/Provider/sr_Latn_RS/Person.php" to new phar 

If anyone’s successfully un-phared and decompressed these files strictly via PHP, please let me know. In the meantime, you’ll want to decompress each individual file manually before performing your code review. I did it by writing this quick PHP command line script

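#File: unbzphar-file.php
<?php
namespace Pulsestorm\Cli\Unbzphar;
function main($argv)
{
    // Drop the script name; everything left is a file to decompress
    $script = array_shift($argv);

    echo "starting loop\n";
    foreach($argv as $file)
    {
        echo "decompressing file $file with bunzip2 --stdout\n";
        // Quote the path so spaces and shell characters survive
        $arg = escapeshellarg($file);
        $contents = `bunzip2 --stdout $arg`;
        if($contents === null || $contents === false)
        {
            // bunzip2 produced nothing (not bzip2 data?), so leave the file alone
            echo "skipping $file\n";
            continue;
        }
        file_put_contents($file, $contents);
    }
    echo "ending loop\n";
}
main($argv);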

and then using find and xargs to run it on every file in the decompressed archive folder

$ find /path/to/extract -type f | xargs php /path/to/unbzphar-file.php 

The unbzphar-file.php script runs each passed in filename through bunzip2 --stdout (which decompresses it), and then immediately rewrites the file with the returned content.
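If bunzip2 isn’t on your system, the same in-place rewrite can be sketched with PHP’s own bzdecompress function. This assumes the bz2 extension is loaded, and the function names looksLikeBzip2 and bunzipFileInPlace are made up for the example. It checks for bzip2’s leading “BZh” bytes first, so files that were never compressed are left untouched.

```php
<?php
// Sketch: decompress bzip2 files in place without shelling out to bunzip2.
// Assumes the bz2 extension is loaded; function names are illustrative only.
// Usage: php unbz-native.php file1 [file2 ...]

// Hypothetical helper: bzip2 streams start with the magic bytes "BZh".
function looksLikeBzip2(string $data): bool
{
    return strncmp($data, 'BZh', 3) === 0;
}

// Hypothetical helper: returns true only when the file was rewritten.
function bunzipFileInPlace(string $file): bool
{
    if (!is_file($file)) {
        return false;
    }

    $data = file_get_contents($file);
    if ($data === false || !looksLikeBzip2($data)) {
        return false; // unreadable or not compressed: leave the file alone
    }

    // bzdecompress() returns a string on success, false or an error code otherwise
    $plain = bzdecompress($data);
    if (!is_string($plain)) {
        return false;
    }

    return file_put_contents($file, $plain) !== false;
}

if (PHP_SAPI === 'cli' && isset($argv) && count($argv) > 1) {
    foreach (array_slice($argv, 1) as $file) {
        echo (bunzipFileInPlace($file) ? 'decompressed ' : 'skipped '), $file, "\n";
    }
}
```

It takes file names as arguments just like unbzphar-file.php, so it can sit on the receiving end of the same pipeline.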

The find /path/to/extract -type f command finds everything in a directory that’s a file (as opposed to a sub-directory).

Then, we pipe the output of this command through xargs. The xargs program reads an (almost) unlimited number of lines from standard input, and then passes them as individual arguments to a specific unix command (file names containing spaces need find’s -print0 and xargs’s -0 options to survive the trip). In our case, that specific unix command is

php /path/to/unbzphar-file.php 

and voilà! All our files are decompressed.
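For anyone working somewhere find and xargs aren’t available, the file-gathering half of that pipeline can be approximated in PHP. This is just a sketch with an invented function name, findFiles. It returns every regular file under a directory, and you could feed that list to whatever decompression routine you prefer.

```php
<?php
// Sketch: collect every regular file under a directory, like `find DIR -type f`.
// Usage (path is a placeholder): php find-files.php /path/to/extract

// Hypothetical helper: return a sorted list of file paths below $dir.
function findFiles(string $dir): array
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        // SKIP_DOTS leaves out the "." and ".." entries
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $info) {
        if ($info->isFile()) {
            $files[] = $info->getPathname();
        }
    }
    sort($files);
    return $files;
}

if (PHP_SAPI === 'cli' && isset($argv[1]) && is_dir($argv[1])) {
    foreach (findFiles($argv[1]) as $file) {
        echo $file, "\n";
    }
}
```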

You’re now ready to fully examine any phar’s contents, and ensure you know exactly what you’re getting into.
