PHP Hacking
I don't feel like changing my patches with every new release of PHP. To give something back to the community, and to support my laziness, I am publishing a few patches for the PHP language. Some are really useful, some are quite okay, and others only help in a particular case. Read through the new feature set and form your own opinion.
New syntax
Integer division
In Visual Basic, a backslash is the operator for integer division. I find this really useful because you don't have to cast anything. In a previous patch for the 5.2 tree I also used the backslash. In the provided patch for 5.3, however, I use a double backslash for this task because the single one is used for namespaces. I also added a short assignment form of the operator.
<?php
$x = 5 \\ 2; // 2
$x\= 2; // 1
?>
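For comparison, stock PHP 7 and later ships an intdiv() function that covers the same ground without a new operator. A small sketch of the example above in unpatched PHP, with a cast-based fallback for older versions:

```php
<?php
// intdiv() is part of stock PHP since 7.0; no patch needed.
$x = intdiv(5, 2);   // 2
$x = intdiv($x, 2);  // 1

// On older versions a cast gives the same result;
// both variants truncate toward zero.
$y = (int) (5 / 2);  // 2
```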
Short array
The alert reader will already have seen this syntax in some of my PHP snippets. Programmers are a lazy folk, and a short form for arrays is really handy. I started by using { and } as array delimiters, as in C/C++, but moved to [] to keep the possibility of handling objects or even JSON later.
$arr = [1, 2, [5 => "foo", 3.14159], 9];
Binary numbers
And again a syntax theft, this time from C#. There you can define numbers in a way similar to hexadecimal numbers such as 0x90. With this patch you can define binary numbers with a 0b prefix like this: 0b01001. I don't know whether this feature is useful for everyday work because, as you know, there are after all only 10 people who understand binary. But I use bit sets very often, and this is a good and fast way to write them.
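Stock PHP gained the same 0b prefix in version 5.4, so a sketch like the following runs without the patch on newer versions. The flag names are made up for illustration:

```php
<?php
// Hypothetical permission flags, one bit each.
const FLAG_READ  = 0b001;
const FLAG_WRITE = 0b010;
const FLAG_EXEC  = 0b100;

$perms = FLAG_READ | FLAG_EXEC;              // 0b101

var_dump(0b01001);                           // int(9)
var_dump(($perms & FLAG_WRITE) !== 0);       // bool(false)
var_dump(($perms & FLAG_EXEC) !== 0);        // bool(true)

// Without the literal you would go through bindec() and a string.
var_dump(bindec('01001'));                   // int(9)
```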
Negative string offsets
What happens if you write $str[-5]? Right, you get a warning and the expression returns null. But why should we give that away? We could use negative string offsets in the same way as positive ones, with the difference that we start at the end of the string. So [0] is the first character, [1] the second and so on, while [-1] is the last character, [-2] the second to last and so on. This is really intuitive and avoids nasty strlen() fiddling.
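As a sketch of what this buys you: stock PHP supports negative string offsets from version 7.1 on, so the first two lines below run unpatched there. The remaining lines show the strlen() and substr() detours the patch is meant to avoid:

```php
<?php
$str = "hacking";

// Negative offsets count from the end (stock PHP 7.1 and later).
$last       = $str[-1];                 // "g"
$secondLast = $str[-2];                 // "n"

// The same without negative offsets.
$lastOld = $str[strlen($str) - 1];      // "g"
$lastSub = substr($str, -1);            // "g"
```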
Isset without null check
I always thought that the internal isset() construct was only good for checking whether something exists (as the name suggests). I should have looked at the manual; RTFM, I know. But this enlightenment came after I had built my application on the guess that isset() works like array_key_exists() on arrays. So I removed the extra check for NULL values.
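The difference in unpatched PHP is easy to demonstrate. In this small sketch the array and its keys are invented for illustration:

```php
<?php
// Hypothetical row with a key that exists but holds NULL.
$row = ['id' => 7, 'deleted_at' => null];

var_dump(isset($row['deleted_at']));             // bool(false): key exists, value is NULL
var_dump(array_key_exists('deleted_at', $row));  // bool(true)
var_dump(isset($row['missing']));                // bool(false): key does not exist
var_dump(array_key_exists('missing', $row));     // bool(false)
```

With the patch applied, the first call would report true as well, since the NULL check is gone.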
MySQLi typed
Facebook is a great thing and has cool employees. One of them was Brian Shire, who wrote a little patch for PHP that casts result values natively to their appropriate types. But what is this good for? Working with native numbers is much faster than working with strings, for example when you want to compare $row['ID'] with 15. I already wrote about working on numbers in PHP, and there is a big difference. Sure, one may say that most data is only written to the output buffer and never used as a number. But when I look back at my previous developments, I mostly use arithmetic operations on numbers returned from the database. There are a few identifiers that are passed through as is, but even more numbers to calculate with. So I decided to use Brian's approach for every function. I wrote a slightly different patch and use it for MySQLi only. If you think you can ignore the overhead of comparing a number with a string, have you ever thought about the storage size? Applications grow bigger and bigger, and if you want to cache things, every byte you can save is gold, especially if you use a binary serializer like igbinary.
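To make the comparison and storage argument concrete, here is a small sketch with two hand-written rows standing in for a typed and an untyped database result. It uses PHP's built-in serialize() rather than igbinary, so the exact byte counts are only an illustration:

```php
<?php
// Hand-written stand-ins for fetched rows (no database involved).
$typed   = ['ID' => 15];     // what native type casting would return
$untyped = ['ID' => '15'];   // what you get when everything is a string

var_dump($typed['ID'] === 15);    // bool(true)
var_dump($untyped['ID'] === 15);  // bool(false): strict comparison fails
var_dump($untyped['ID'] == 15);   // bool(true): works, but needs a conversion

// The string form is also larger once serialized for a cache.
echo serialize(15), "\n";         // i:15;
echo serialize('15'), "\n";       // s:2:"15";
$intSize = strlen(serialize(15));    // 5
$strSize = strlen(serialize('15'));  // 9
```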
MySQLi and mysqlnd
PHP 5.3 has a lot of new features, like namespaces and other such ugly stuff. What is really useful, though, is that you no longer have to use libmysql (yes, it's still possible, but hopefully not for long). I wrote a patch for mysqlnd and also MySQLi (I do not support PDO and MySQL at this time). This patch offers a few better default values, adds two new functions and fixes a little bug. But let's start at the beginning.
There are two new functions, mysqli_return() and mysqli_matched_rows(). The first is a little like mysql_result(), but it also frees the result (if you want it to). Normally you write snippets like this:
<?php $row = $res->fetch_assoc(); $res->free(); return $row['foo']; ?>
But what if you want to return only a scalar from a function? You would store the row in a variable, free the result and return the array element, as in the example above. With the new function it is possible to write this:
<?php return mysqli_return($res); ?>
...and everything is done for you. There is also an OOP API. The other function I added is mysqli_matched_rows(). On SELECTs it returns the same as mysqli_affected_rows(), but on any DML operation you get the number of matched rows, not the number of changed rows. This saves a COUNT query if, for example, you need the number of rows your UPDATE matched.
The mentioned default changes are that mysqli_fetch_all() returns an associative array by default and that the native type casting mentioned in "MySQLi typed" is turned on right after connecting to the server.
As you may know, the native type casting I talked about is already implemented in PHP 5.3. I hope that MySQL implements Andrey's binary protocol patch soon. But mysqlnd has a little bug that returns some integers as strings. The patch fixes this.
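Without the patch, something close to mysqli_return() can be approximated in userland. The sketch below is only a guess at the behaviour: fetch_scalar() is a hypothetical name, and it assumes the wanted scalar is the first column of the next row. It relies on the stock fetch_row() and free() methods of a mysqli_result:

```php
<?php
/**
 * Hypothetical userland stand-in for the patched mysqli_return().
 * $res is expected to be a mysqli_result (any object offering
 * fetch_row() and free() works).
 * Assumption: the scalar is the first column of the next row.
 */
function fetch_scalar($res, $free = true)
{
    $row = $res->fetch_row();   // numeric array, or null when no row is left
    if ($free) {
        $res->free();           // release the result set, as the patch does
    }
    return is_array($row) ? $row[0] : null;
}
```

A function could then end with a single line such as return fetch_scalar($res); instead of the three-step snippet above.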
Time call optimization
While running strace(1) on PHP I saw a lot of time(0) calls. For every time() in PHP there is also a time(0) call to the kernel, and even more for internal handlers. I thought that this was optimized away by the SAPI layer, but the SAPI method for using a cached time(0) is implemented very sparsely. So I removed all time(NULL) and time(0) calls from the source and also implemented such a SAPI method for CGI/FCGI. You may know that there is normally no way of getting the time via CGI/FCGI, so I also patched lighttpd and nginx to send the time as RAW_TIME. I started with a 4-byte binary string for this, but for public use I pass it as a normal numeric string.
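The patch works inside the engine, but the same idea is available to userland code in stock PHP: $_SERVER['REQUEST_TIME'] holds the timestamp of the start of the request, so scripts that only need "now, roughly" can avoid further time() calls. A minimal sketch, where request_time() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: reuse the request start time instead of
// asking the kernel again on every call.
function request_time(): int
{
    return isset($_SERVER['REQUEST_TIME'])
        ? (int) $_SERVER['REQUEST_TIME']
        : time();   // fallback if the SAPI did not provide it
}

$expires = request_time() + 3600;   // e.g. a cache lifetime of one hour
```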
PHP 5.3 Patch
lighttpd Patch
nginx Patch
str_split with sequence handling
str_split() is a good choice if you want to split a string after a certain number of bytes and get the pieces back as an array. But what if you need the first 2, 6, 10 and the last 2 characters? You would start hacking dozens of strlen() + substr() calls into one another, or you would reach for regexes. It is much easier with my little patch. Normally str_split() has an optional parameter for the length. With the patch this second optional parameter becomes a mixed variable: you can pass an integer as before, or an array containing a sequence of offsets. The only thing you have to keep in mind is that the sequence has to grow from left to right, as in the earlier example. The sequence can start with 0 or a greater value, and negative integers are also allowed. Internally a negative value is added to the total string length, so that you can address an offset from the end of the string. To make things clearer, here is a little code snippet:
<?php
var_dump(str_split("0123456789abcdefwant to extract this?abcd", [16, -4]));
/*
array(3) {
[0]=>
string(16) "0123456789abcdef"
[1]=>
string(21) "want to extract this?"
[2]=>
string(4) "abcd"
}
*/
var_dump(str_split("hello how are you?", [0, 1, 6, 7, 9, -2, -1]));
/*
array(7) {
[0]=>
string(1) "h"
[1]=>
string(5) "ello "
[2]=>
string(1) "h"
[3]=>
string(2) "ow"
[4]=>
string(7) " are yo"
[5]=>
string(1) "u"
[6]=>
string(1) "?"
}
*/
?>
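For readers without the patch, the behaviour shown in the two dumps above can be approximated in userland. This is only a sketch derived from those examples: str_split_seq() is a hypothetical name, and throwing an exception for a sequence that does not grow is my own assumption, not something the patch is known to do.

```php
<?php
// Hypothetical userland approximation of the patched str_split().
function str_split_seq(string $str, array $offsets): array
{
    $len  = strlen($str);
    $cuts = [0];                                  // implicit start of the string

    foreach ($offsets as $offset) {
        // A negative offset is added to the string length.
        $pos  = $offset < 0 ? $len + $offset : $offset;
        $last = $cuts[count($cuts) - 1];

        if ($pos < $last || $pos > $len) {
            throw new InvalidArgumentException(
                'offsets must grow from left to right and stay inside the string'
            );
        }
        if ($pos > $last) {                       // skip repeats such as a leading 0
            $cuts[] = $pos;
        }
    }
    if ($cuts[count($cuts) - 1] < $len) {
        $cuts[] = $len;                           // implicit end of the string
    }

    // Slice the string between each pair of neighbouring cut points.
    $parts = [];
    for ($i = 1; $i < count($cuts); $i++) {
        $parts[] = substr($str, $cuts[$i - 1], $cuts[$i] - $cuts[$i - 1]);
    }
    return $parts;
}

print_r(str_split_seq("0123456789abcdefwant to extract this?abcd", [16, -4]));
```

The call at the end yields the same three pieces as the first dump: the 16-byte prefix, the middle part and the 4-byte suffix.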
Microtime default parameter
A very useless default parameter is the one of microtime(). I can remember that with PHP 4 everyone used an explode + add solution around microtime(). With PHP 5 it became possible to return the time as a double, but this is not the default. With this little change microtime() returns a double by default.
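For reference, here are both styles side by side in unpatched PHP. With the patch, the plain call microtime() would behave like microtime(true) in this sketch:

```php
<?php
// PHP 4 style: microtime() returns the string "usec sec".
list($usec, $sec) = explode(' ', microtime());
$old = (float) $usec + (float) $sec;

// PHP 5 style: ask for a float explicitly.
$new = microtime(true);

printf("%.4f vs %.4f\n", $old, $new);
```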
Session private cache header
This patch is also very simple: it adds a private flag to the Cache-Control header to say explicitly that the content really must not be stored on any proxy or other shared cache. This is only a paranoid precaution.
Disable include warnings
A really annoying problem is that include warnings spam the logfile. You could add an @ in front of the include statement, but this also silences every warning/error raised by the included file. This patch adds the new ini directive ignore_include_warning so that include warnings can be disabled with ini_set() or globally.
UTF-8 and ENT_QUOTES as default
As most web applications work with UTF-8, it is a good idea to make UTF-8 the default. The same is true of ENT_QUOTES. Okay, I must admit that this patch is also a little product of laziness, because I hate writing ENT_QUOTES, "UTF-8" every time, so this was the last time.
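Without the patch, the usual workaround is a tiny wrapper around htmlspecialchars(). A minimal sketch, where h() is just a hypothetical short name:

```php
<?php
// Hypothetical shortcut so ENT_QUOTES and UTF-8 are typed only once.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo h("Tom's <b>"), "\n";     // Tom&#039;s &lt;b&gt;
echo h('say "hi"'), "\n";      // say &quot;hi&quot;
```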
Right date format for RSS constant
This patch and the corresponding text are of the sort "long text, very little point", but the topic is too minor to deserve an article of its own.
It started when I wondered about the differing implementations of some HTTP date headers like Last-Modified and Date. In PHP the format is implemented with a numeric timezone indicator like this:
#define DATE_FORMAT_RFC1123 "D, d M Y H:i:s O"
...which produces the following output: Sun, 25 Oct 2009 18:48:26 +0100
But most webservers send the time values as Sun, 25 Oct 2009 18:48:26 ECT
Which one is correct? Should I rewrite my whole codebase? Should I hack the webserver? No! After reading RFCs 822, 850 and 1123 I found the answer:
There is a strong trend towards the use of numeric timezone
indicators, and implementations SHOULD use numeric timezones
instead of timezone names. However, all implementations MUST
accept either notation.
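Both notations are easy to produce and to parse in stock PHP. The following sketch uses the Unix epoch as a fixed timestamp so that the output is reproducible; the 'T' format character stands in for the named-zone style that most webservers send:

```php
<?php
$ts = 0;   // Unix epoch, chosen only to get a reproducible result

$numeric = gmdate(DATE_RFC1123, $ts);          // numeric timezone indicator
$named   = gmdate('D, d M Y H:i:s T', $ts);    // timezone name

echo $numeric, "\n";   // Thu, 01 Jan 1970 00:00:00 +0000
echo $named, "\n";     // Thu, 01 Jan 1970 00:00:00 GMT

// A consumer has to accept either notation; strtotime() does.
var_dump(strtotime($numeric) === strtotime($named));   // bool(true)
```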
This is good information, but I am here to share patches. And I really want to share the most important patch of this collection: it fixes the format of the RSS constant DATE_RSS. Since PHP implements most of these RFC dates with numeric timezone indicators, the fix is pretty useless at the moment, but maybe at some point in the future you will be thankful. Feel cheated? Sorry ;-) As I said, the outcome of this patch is not really big, but I wanted to share it.
End
Okay, let's end here. I'd be glad if some of these patches became part of the PHP language. There are a few others, but I will write about them another time.
This article could have been a little longer because I wanted to publish my ifset() patch here, but then I found ifsetor(), which is the same thing with a nastier name. So I hope that the rejection of this language construct is reconsidered and that it gets implemented in 5.3+, like the goto construct (by the way, I'm pro goto).
Another idea is to make strlen() a language construct. Many so-called PHP optimization tips advise against using strlen() because function calls in PHP are really slow. Returning the length stored in the underlying zval struct directly in the engine would boost the performance of this "function". What do you think?
PHP 5.2 All in one Patch
PHP 5.3 All in one Patch
Oh... does anyone know a better way of creating such patches? I wrote this little helper script for the task, but I'm not sure whether it is the best way.