One feature of PHP rarely seen in production code is the Iterator interface. Iterators are not unique to PHP (Java and C++ have them too), but they are a powerful mechanism for making code easier to use. A very useful property of PHP Iterators is that they can be implemented to iterate over any kind of array or object. One particularly useful application is iterating over the result of a SQL query against a database. This provides a fast and memory-efficient way to load many objects.
Essentially, with an Iterator, you can use a foreach loop instead of a while loop to load objects. In addition, at each iteration the iterator can load additional data or perform side processing automatically.
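To make the foreach-versus-while point concrete, here is a minimal sketch. The WordList class and its contents are invented for illustration and are not part of the article's code; the typed properties and return types assume PHP 8.0 or later and can be dropped on older versions.

```php
<?php
// Hypothetical example class, not part of the article's code base.
class WordList implements Iterator
{
    private array $words;
    private int $position = 0;

    public function __construct(array $words)
    {
        // Re-index so positions always run 0..n-1.
        $this->words = array_values($words);
    }

    public function rewind(): void { $this->position = 0; }
    public function valid(): bool { return $this->position < count($this->words); }
    public function current(): mixed { return $this->words[$this->position]; }
    public function key(): mixed { return $this->position; }
    public function next(): void { $this->position++; }
}

$list = new WordList(array('alpha', 'beta', 'gamma'));

// The manual while-style traversal that foreach replaces.
$manual = array();
$list->rewind();
while ($list->valid()) {
    $manual[] = $list->current();
    $list->next();
}

// The same traversal, driven by foreach calling the five methods for us.
$viaForeach = array();
foreach ($list as $index => $word) {
    $viaForeach[$index] = $word;
}

echo implode(', ', $viaForeach), PHP_EOL;
```

Both loops visit the same elements in the same order; foreach simply calls rewind(), valid(), current(), key(), and next() on your behalf.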
Setting Up The Database Environment
You will need to set up your environment to follow along with this article. First, create a new database, `iterator_test`, with two tables, `user` and `user_comment`:
    CREATE DATABASE `iterator_test` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

    CREATE TABLE `user` (
        `id` SMALLINT( 3 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
        `email_address` VARCHAR( 255 ) NOT NULL ,
        `firstname` VARCHAR( 64 ) NOT NULL ,
        `lastname` VARCHAR( 64 ) NOT NULL ,
        `age` TINYINT( 1 ) NOT NULL
    ) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_unicode_ci;

    CREATE TABLE `user_comment` (
        `id` SMALLINT( 3 ) NOT NULL AUTO_INCREMENT PRIMARY KEY,
        `from_id` SMALLINT( 3 ) NOT NULL ,
        `to_id` SMALLINT( 3 ) NOT NULL ,
        `comment` VARCHAR( 255 ) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL
    ) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_unicode_ci;

    INSERT INTO `user` (`id`, `email_address`, `firstname`, `lastname`, `age`) VALUES
        (1, 'vmc@leftnode.com', 'vic', 'cherubini', 25),
        (2, 'joe@google.com', 'joe', 'perry', 43);

    INSERT INTO `user_comment` (`id`, `from_id`, `to_id`, `comment`) VALUES
        (1, 1, 2, 'this is a comment'),
        (2, 1, 2, 'this is also a comment!'),
        (3, 2, 1, 'howdy, thanks for contacting me!');
The table `user_comment` holds the comments that the user identified by `from_id` makes to the user identified by `to_id`. Both columns point to the `id` field of the `user` table.
Classes and Models
Two classes are needed to manage User and User_Comment objects. Both extend a base class, Model, that defines the methods and variables they have in common.
    <?php
    abstract class Model {
        ///< The ID of the object from the datastore.
        protected $_id = 0;

        ///< The data as a key-value array of the object.
        protected $_model = array();

        /**
         * The default constructor of the object.
         * @author vmc
         * @param $object_id The optional ID to load from the datastore.
         * @retval Object Returns a new Model object.
         */
        public function __construct($object_id=0) {
            $this->_load(intval($object_id));
        }

        /**
         * Destructor.
         * @author vmc
         * @retval NULL Returns NULL.
         */
        public function __destruct() {
            $this->_model = array();
        }

        /**
         * The magic method to get an element from the $_model array.
         * @author vmc
         * @param $name The name of the element to get.
         * @retval mixed Returns the value from the model, NULL if not found.
         */
        public function __get($name) {
            if ( true === isset($this->_model[$name]) ) {
                return $this->_model[$name];
            }
            return NULL;
        }

        /**
         * Sets a value in the model.
         * @author vmc
         * @param $name The name of the key to set.
         * @param $v The value of the key to set.
         * @retval bool Returns true.
         */
        public function __set($name, $v) {
            $this->_model[$name] = $v;
            return true;
        }

        /**
         * Returns the ID of the object.
         * @author vmc
         * @retval int Returns the ID of the object.
         */
        public function getId() {
            return $this->_id;
        }

        /**
         * Returns an array of elements from the model. By default, returns
         * all of them, however, the inclusion or exclusion of them can be defined.
         * @author vmc
         * @param $list An optional array of elements to include or exclude from the result.
         * @param $ignore False to include the elements, true to exclude them.
         * @retval array Returns the selected elements of the model.
         */
        public function getArray($list=array(), $ignore=false) {
            $return = $this->_model;
            if ( count($list) > 0 ) {
                if ( false === $ignore ) {
                    // array_flip() stands in for the framework helper
                    // asfw_make_values_keys(): it turns the listed names into keys.
                    $return = array_intersect_key($this->_model, array_flip($list));
                } else {
                    foreach ( $list as $v ) {
                        unset($return[$v]);
                    }
                }
            }
            return $return;
        }

        /**
         * Sets all of the model data from an array. If 'id' is a field
         * of the array, it is extracted out first, and set as the $_id.
         * @author vmc
         * @param $obj The object to load from.
         * @retval Object Returns $this for chaining.
         */
        public function loadFromArray($obj) {
            $this->_id = 0;
            if ( true === isset($obj['id']) ) {
                $this->_id = $obj['id'];
                unset($obj['id']);
            }
            $this->_model = (array)$obj;
            return $this;
        }

        /**
         * Writes all of the data to the datastore. If the object is already loaded,
         * ie, the ID is greater than 0, the data is updated, otherwise, it is inserted.
         * @author vmc
         * @retval int Returns the ID of the object.
         */
        public function write() {
            if ( $this->_id > 0 ) {
                $this->_update();
            } else {
                $this->_insert();
            }
            return $this->_id;
        }

        /**
         * Loads all of the data from the datastore for this model object.
         * @author vmc
         * @param $object_id The ID of the model to load.
         * @retval int Returns the ID of the loaded model.
         */
        protected function _load($object_id) {
            // The table name is derived from the class name, e.g. User_Comment => user_comment.
            $table_name = strtolower(get_class($this));
            $object_id = intval($object_id);
            if ( $object_id > 0 ) {
                $result_model = LN::getDb()->select()
                    ->from($table_name)
                    ->where('id = ?', $object_id)
                    ->query();
                if ( 1 == $result_model->numRows() ) {
                    $model = $result_model->fetch();
                    unset($model['id']);
                    $this->_model = $model;
                    $this->_id = $object_id;
                }
            }
            return $this->_id;
        }

        /**
         * Inserts the new model data into the datastore.
         * @author vmc
         * @retval int Returns $_id. Note that the new ID is not read back
         *             from the datastore here, so it is still 0 after an insert.
         */
        protected function _insert() {
            // The original referenced an undefined $this->_table; the table
            // name is derived the same way as in _load().
            LN::getDb()->insert()
                ->into(strtolower(get_class($this)))
                ->values($this->_model)
                ->query();
            return $this->_id;
        }

        /**
         * Updates the model data in the datastore.
         * @author vmc
         * @retval int Returns the ID of the updated model.
         */
        protected function _update() {
            LN::getDb()->update()
                ->table(strtolower(get_class($this)))
                ->set($this->_model)
                ->where('id = ?', $this->_id)
                ->query();
            return $this->_id;
        }
    }
Fortunately, the Model class takes care of most of the methods and variables for User and User_Comment. The User class needs a simple way to get a list of comments quickly, and getCommentList() provides it. The optional argument to the method allows the programmer to bypass the lazily loaded list and force a reload from the database.
    <?php
    class User extends Model {
        ///< The Db_Iterator object of comments.
        private $_commentList = NULL;

        /**
         * Returns a list of comments that the user has made.
         * @author vmc
         * @param $force Optional variable to force loading of the comment list even if not null.
         * @return Object Returns Db_Iterator object to cycle through the results.
         */
        public function getCommentList($force=false) {
            if ( NULL === $this->_commentList || true === $force ) {
                $result_comment = LN::getDb()->select()
                    ->from('user_comment')
                    ->where('from_id = ?', $this->_id)
                    ->query();
                $this->_commentList = new Db_Iterator($result_comment, new User_Comment());

                /**
                 * Calling $result_comment->free() here will cause the application to break.
                 * Objects are passed by handle, so the iterator holds the very same
                 * result object; freeing it here leaves the iterator with no results
                 * to fetch.
                 *
                 * $result_comment->free();
                 *
                 * However, an unset($result_comment) only removes this locally scoped
                 * variable. The iterator keeps its own handle, so the result set
                 * stays alive.
                 *
                 * unset($result_comment);
                 *
                 */
            }
            $this->_commentList->rewind();
            return $this->_commentList;
        }
    }
    <?php
    class User_Comment extends Model {
        /**
         * Allows the object to be printed directly.
         * @author vmc
         * @return string Returns a string representation of the object.
         */
        public function __toString() {
            $str = 'User #' . $this->_model['from_id'] . ' made the comment, "';
            $str .= $this->_model['comment'] . '", to User #' . $this->_model['to_id'] . '.';
            return $str;
        }
    }
Because User_Comment has no dependencies, it remains a nearly empty class. A single __toString() method was added to allow easy object display. User, however, needs to load a list of User_Comment objects. This is where the implementation becomes memory efficient. Over the course of a User's existence on a website, they could create hundreds or thousands of comments. Loading all of these at once as an array of objects is inefficient: an initial loop must walk every result just to build the array, and the array itself would be large and take up unnecessary memory. A much better implementation is to store a pointer to the initial result set and only loop through the comments when necessary.
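The difference between the two approaches can be sketched without a database. The example below is only an illustration under stated assumptions: CommentStub is a hypothetical stand-in for a model class, the rows are generated in a loop rather than fetched, and the byte counts depend on the PHP version. It also ignores whatever memory the database driver itself uses to hold the result set.

```php
<?php
// Hypothetical stand-in for a comment model; not the article's User_Comment class.
class CommentStub
{
    public $id = 0;
    public $text = '';
}

$rowCount = 20000;

// Eager: build one object per row and keep them all in an array.
$before = memory_get_usage();
$all = array();
for ($i = 1; $i <= $rowCount; $i++) {
    $c = new CommentStub();
    $c->id = $i;
    $c->text = 'comment #' . $i;
    $all[] = $c;
}
$eagerBytes = memory_get_usage() - $before;
unset($all, $c);

// Lazy: reuse a single object and overwrite its fields for each row,
// which is what an iterator that refills one model object does.
$before = memory_get_usage();
$one = new CommentStub();
for ($i = 1; $i <= $rowCount; $i++) {
    $one->id = $i;
    $one->text = 'comment #' . $i;
}
$lazyBytes = memory_get_usage() - $before;

printf("eager: %d bytes, lazy: %d bytes\n", $eagerBytes, $lazyBytes);
```

The eager figure grows with the number of rows, while the lazy figure stays roughly constant because only one object exists at any time.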
Introducing the Db_Iterator Object
The getCommentList() method in the User class returns a Db_Iterator instance that holds the result set for the list of comments created by that User. getCommentList() uses lazy loading, a technique that loads the comments only when they are needed (i.e., the first time getCommentList() is called, rather than on object construction). Looking at the Db_Iterator object (with kind thanks to the original implementer, Trevor Andreas), one can see that it stores the result object as a private member variable.
    <?php
    /**
     * @author vmc
     */
    class Db_Iterator implements Iterator {
        ///< The result set from the database.
        private $_result = NULL;

        ///< The object/model that is built on each iteration.
        private $_object = NULL;

        ///< The current row.
        private $_key = 0;

        /**
         * The default constructor to build a new Iterator.
         * @author tandreas
         * @param $result The result object.
         * @param $object An empty object.
         * @retval Object Returns a new Db_Iterator object.
         */
        public function __construct(Artisan_Db_Result $result, $object) {
            $this->_result = $result;
            $this->_object = $object;
            $this->_key = 0;
        }

        /**
         * Destructs the iterator and frees the result set if it's not null.
         * @author vmc
         * @retval NULL Returns NULL.
         */
        public function __destruct() {
            $this->_object = NULL;
            if ( NULL !== $this->_result ) {
                $this->_result->free();
                $this->_result = NULL;
            }
        }

        /**
         * Sets the data of the underlying object from an array.
         * @author vmc
         * @param $data Sets the data of the object through loadFromArray().
         * @retval Object Returns the object for easy write access.
         */
        public function set($data) {
            if ( true === is_array($data) ) {
                $this->_object->loadFromArray($data);
            }
            return $this->_object;
        }

        /**
         * Returns the current element.
         * @author tandreas
         * @retval Object Returns the current element of the iteration list.
         */
        public function current() {
            return $this->_load($this->_key);
        }

        /**
         * Returns the last row.
         * @author tandreas
         * @retval Object Returns the last row.
         */
        public function last() {
            $num_rows = $this->_result->numRows()-1;
            return $this->_load($num_rows);
        }

        /**
         * Returns the key of the current element.
         * @author tandreas
         * @retval int Returns the integer key of the current element.
         */
        public function key() {
            return $this->_key;
        }

        /**
         * Moves to the next element.
         * @author tandreas
         * @retval int Returns the next key's value.
         */
        public function next() {
            $this->_key++;
            return $this->_key;
        }

        /**
         * Rewinds to the first row of the result.
         * @author tandreas
         * @retval boolean Returns true.
         */
        public function rewind() {
            $this->_key = 0;
            $this->_result->row(0);
            return true;
        }

        /**
         * Determines if the next() or current() calls are valid.
         * @author tandreas
         * @author vmc
         * @retval bool Returns true if they are valid, false otherwise.
         */
        public function valid() {
            // "<" rather than "!=" so a key past the end can never count as valid.
            return ( $this->_key < $this->_result->numRows() );
        }

        /**
         * Loads up the specified object during iteration.
         * @author vmc
         * @param $i The key/index to load from.
         * @retval Object Returns the built object.
         */
        private function _load($i) {
            $this->_result->row($i);
            $row = $this->_result->fetch();
            if ( true === is_array($row) ) {
                $this->_object->loadFromArray($row);
            }
            return $this->_object;
        }
    }
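The lazy loading that getCommentList() relies on can also be shown in isolation. This is a minimal sketch with invented names: LazyOwner, loadItems(), and the loadCount counter are hypothetical, and the "query" is simulated with an ArrayIterator so the example runs without a database.

```php
<?php
// Hypothetical class illustrating lazy loading; the "query" is simulated.
class LazyOwner
{
    private $items = null;   // Not loaded until first requested.
    public $loadCount = 0;   // Counts how often the expensive load ran.

    public function getItems($force = false)
    {
        if (null === $this->items || true === $force) {
            $this->items = $this->loadItems();
        }
        return $this->items;
    }

    // Stand-in for a database query.
    private function loadItems()
    {
        $this->loadCount++;
        return new ArrayIterator(array('first', 'second'));
    }
}

$owner = new LazyOwner();
$countAfterConstruct = $owner->loadCount;  // Construction alone loads nothing.

$owner->getItems();
$owner->getItems();                        // Served from the cached list.
$countAfterTwoCalls = $owner->loadCount;

$owner->getItems(true);                    // Forced reload.
$countAfterForce = $owner->loadCount;

echo $countAfterConstruct, ' ', $countAfterTwoCalls, ' ', $countAfterForce, PHP_EOL;
// prints "0 1 2"
```

The load runs once no matter how often the list is requested, and again only when the caller explicitly forces it.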
When initially building this, the largest problem we faced was avoiding a new query on each iteration to load that iteration's object. To us, that defeated the entire reason for the class, as it required a database call on each iteration when a single one could be made instead. We therefore added the set() method and the corresponding loadFromArray() method. On each iteration, _load() calls loadFromArray() to copy the data from the row into the object. In turn, loadFromArray() separates the primary key of the object, id, from the rest of the data, which becomes the model. This is a very fast way to load an object without another query.
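The per-iteration refill can be sketched without the framework. Record and RowIterator below are hypothetical, trimmed-down stand-ins for the article's Model and Db_Iterator; the rows come from a plain array instead of a database result, and the syntax assumes PHP 8.0 or later.

```php
<?php
// Hypothetical stand-in for a model: splits the primary key from the rest.
class Record
{
    private $id = 0;
    private $data = array();

    public function loadFromArray(array $row)
    {
        $this->id = 0;
        if (isset($row['id'])) {
            $this->id = (int) $row['id'];
            unset($row['id']);
        }
        $this->data = $row;
        return $this;
    }

    public function getId() { return $this->id; }
    public function getData() { return $this->data; }
}

// Hypothetical stand-in for the iterator: one shared object, refilled per row.
class RowIterator implements Iterator
{
    private array $rows;
    private Record $object;
    private int $key = 0;

    public function __construct(array $rows, Record $object)
    {
        $this->rows = array_values($rows);
        $this->object = $object;
    }

    public function rewind(): void { $this->key = 0; }
    public function valid(): bool { return $this->key < count($this->rows); }
    public function key(): mixed { return $this->key; }
    public function next(): void { $this->key++; }

    // Refill the single shared object from the current row; no extra query.
    public function current(): mixed
    {
        return $this->object->loadFromArray($this->rows[$this->key]);
    }
}

$rows = array(
    array('id' => 1, 'comment' => 'this is a comment'),
    array('id' => 2, 'comment' => 'this is also a comment!'),
);

$ids = array();
$instances = array();
foreach (new RowIterator($rows, new Record()) as $record) {
    $ids[] = $record->getId();
    $instances[] = spl_object_id($record);  // Same ID means same object.
}

echo implode(',', $ids), PHP_EOL;
```

One consequence of this design is worth keeping in mind: every iteration hands back the same object, so a reference kept from an earlier iteration will show the latest row's data. Clone the object if a row needs to outlive its iteration.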
In addition to loadFromArray(), set() can be used to add new elements to the list. Because set() returns the object originally passed into the Db_Iterator instance, one can immediately call write() on the returned object to write it to the datastore. Furthermore, if the array passed to set() (such as $comment_data in the test script) has a valid id entry, the object will be updated rather than inserted. This creates a very easy interface for adding and updating objects from a single point of entry.
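The insert-or-update decision can be illustrated with an in-memory stand-in. MemoryModel, its static $table array, and the $nextId counter are all hypothetical and exist only so the sketch runs without a database; they are not the article's API.

```php
<?php
// Hypothetical in-memory model; a static array plays the role of the table.
class MemoryModel
{
    public static $table = array();  // id => row data
    private static $nextId = 1;      // Simulated AUTO_INCREMENT counter.
    private $id = 0;
    private $data = array();

    public function loadFromArray(array $row)
    {
        $this->id = 0;
        if (isset($row['id'])) {
            $this->id = (int) $row['id'];
            unset($row['id']);
        }
        $this->data = $row;
        return $this;
    }

    // Update when an ID is present, insert otherwise.
    public function write()
    {
        if ($this->id <= 0) {
            $this->id = self::$nextId++;
        }
        self::$table[$this->id] = $this->data;
        return $this->id;
    }
}

$model = new MemoryModel();

// No 'id' key: the row is inserted and receives a new ID.
$newId = $model->loadFromArray(array('comment' => 'first draft'))->write();

// With a valid 'id' key: the existing row is overwritten.
$sameId = $model->loadFromArray(array('id' => $newId, 'comment' => 'edited'))->write();

echo $newId, ' ', $sameId, ' ', count(MemoryModel::$table), PHP_EOL;
// prints "1 1 1"
```

The caller uses the same two calls in both cases; only the presence of an id in the data decides which path is taken.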
Implementation
Testing this implementation is simple and intuitive.
    [...] 'localhost',
            'username' => 'username',
            'password' => 'password',
            'database' => 'iterator_test',
            'debug' => false
        )
    );
    LN::init($config_db);

    /**
     * Create a new User object, and load up all of the comments
     * user ID #1 has made.
     */
    $user = new User(1);
    $comment_list = $user->getCommentList();

    /**
     * Loop through all of the comments. $comment is an object of type
     * User_Comment.
     */
    foreach ( $comment_list as $comment ) {
        echo $comment . '';
    }

    /**
     * Insert a new comment. $comment_data could easily come from a form submission.
     * If there was a field name 'id' in the array below, and it had a valid ID,
     * the comment would be updated rather than inserted when write() is called.
     */
    echo '';
    echo 'Adding a new comment....';
    $comment_data = array(
        'from_id' => 1,
        'to_id' => 2,
        'comment' => 'thanks for the reply, @joeperry!'
    );
    $comment_list = $user->getCommentList()->set($comment_data)->write();

    /**
     * Force a reload of the comments. true must be passed to the method.
     */
    echo '';
    echo 'Reloading the comments.... ';
    $comment_list = $user->getCommentList(true);
    foreach ( $comment_list as $comment ) {
        echo $comment . '';
    }

    echo ' <hr />';
    echo round((memory_get_peak_usage()/(1024*1024)), 4) . 'MB';
    LN::cleanup();
Final Questions
Using this solution raises a question: what happens on the datastore's end when a result set is held in memory during page execution? Fortunately, not much. The Db_Iterator class owns the result set. In PHP 5 and later, objects are passed by handle, so the result object handed to the Db_Iterator constructor is the same one the query returned; no copy is made, and the iterator's destructor frees it when the iterator is destroyed.
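How PHP treats a result object that has been handed to another object can be checked with a short sketch. In PHP 5 and later, objects are passed around as handles, so both variables refer to one object. FakeResult and Holder are hypothetical stand-ins for a database result and for an iterator that stores it.

```php
<?php
// Hypothetical stand-in for a database result object.
class FakeResult
{
    private $rows = array('row 1', 'row 2');

    public function free() { $this->rows = array(); }
    public function numRows() { return count($this->rows); }
}

// Hypothetical stand-in for an iterator that stores the result.
class Holder
{
    private $result;

    public function __construct(FakeResult $result) { $this->result = $result; }
    public function numRows() { return $this->result->numRows(); }
}

$result = new FakeResult();
$holder = new Holder($result);

// unset() removes only a variable; the holder still has its own handle
// to the same object, so the rows remain available.
$local = $result;
unset($local);
$rowsAfterUnset = $holder->numRows();

// free() acts on the shared object itself, so the holder sees it emptied.
$result->free();
$rowsAfterFree = $holder->numRows();

echo $rowsAfterUnset, ' ', $rowsAfterFree, PHP_EOL;
// prints "2 0"
```

This is why freeing the result outside the iterator breaks iteration, while merely unsetting the outer variable does not.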
Conclusion
The implementation shown here is deliberately simple. Further applications exist for the iterator, such as filtering results and easy pagination. Using these methods, developers can quickly and easily manage their objects from a central object. This technique was used extensively in the Prospect Vista project. As in this article, a single User object controls many different iterators of the child objects that the User has created. For example, one User type can create Comments, Contacts, Fans, Payments, Statistics, Statuses, and Videos. With this code, loading a list of Contacts is as simple as: $user->getContactList().
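Filtering and pagination can be sketched with PHP's standard SPL iterators, which wrap any other iterator. In this example an ArrayIterator stands in for a Db_Iterator, and the fourth row is invented for illustration; how well this composes with the article's own classes is an assumption, not something the original code demonstrates.

```php
<?php
// An ArrayIterator stands in for the article's Db_Iterator.
$comments = new ArrayIterator(array(
    array('from_id' => 1, 'comment' => 'this is a comment'),
    array('from_id' => 1, 'comment' => 'this is also a comment!'),
    array('from_id' => 2, 'comment' => 'howdy, thanks for contacting me!'),
    array('from_id' => 1, 'comment' => 'one more'),
));

// Filtering: keep only the rows written by user 1.
$fromUserOne = new CallbackFilterIterator($comments, function ($row) {
    return 1 === $row['from_id'];
});

// Pagination: skip the first page of two rows and take the next two.
$pageSize = 2;
$pageTwo = new LimitIterator($fromUserOne, $pageSize, $pageSize);

$texts = array();
foreach ($pageTwo as $row) {
    $texts[] = $row['comment'];
}

echo implode(' | ', $texts), PHP_EOL;
// prints "one more"
```

For large tables, pagination is usually better pushed into the SQL query itself with LIMIT and OFFSET, so that skipped rows are never fetched; the wrapper approach is mainly convenient for small result sets that are already loaded.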
Iterators are a very powerful and surprisingly simple way to restructure your application. One can cut down on the number of SQL queries executed and the total memory used, and clean up the design of one's classes with little effort. The code in this article was adapted from Artisan System, Leftnode's PHP5 framework.