Introducing utilities for Iterator::Simple
I recently uploaded my first module to CPAN. Iterator::Simple::Util implements most of the functions from List::Util and List::MoreUtils for the Iterator::Simple framework.
Iterators provide a simple interface for traversing (iterating over) collections. This interface typically consists of two methods: one that tests whether or not the iterator is exhausted, and another that returns the next element. The main power of iterators is the abstraction they provide over a collection. If you write code that takes an iterator as input, it will work whether the data is an in-memory array, records retrieved lazily from a database, or lines parsed from a file. For an extensive introduction to the subject, check out chapter 4 of Higher Order Perl by Mark Jason Dominus (available free for download).
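To make the two-method interface concrete, here is a minimal sketch in plain Perl. The SimpleIter package and its is_exhausted and next method names are hypothetical, chosen for illustration; this is not the API of Iterator, Iterator::Simple or any other CPAN module. The total function only talks to the interface, so it works unchanged whether the elements come from an array or are read lazily from a filehandle.

```perl
use strict;
use warnings;

# Hypothetical iterator class: wraps a closure that returns ( $item ), or an
# empty list when done, and keeps one element of lookahead so that
# is_exhausted can be answered before next is called.
package SimpleIter;

sub new {
    my ( $class, $fetch ) = @_;
    my $self = bless { fetch => $fetch }, $class;
    $self->_advance;
    return $self;
}

sub _advance {
    my $self = shift;
    my @got  = $self->{fetch}->();
    $self->{done} = !@got;       # an empty list means the source is used up
    $self->{buf}  = $got[0];     # may legitimately be undef
}

sub is_exhausted { $_[0]{done} }

sub next {
    my $self = shift;
    die "iterator exhausted\n" if $self->{done};
    my $item = $self->{buf};
    $self->_advance;
    return $item;
}

package main;

# Iterator over an in-memory array.
sub from_array {
    my @items = @_;
    my $i     = 0;
    return SimpleIter->new( sub { $i < @items ? ( $items[ $i++ ] ) : () } );
}

# Iterator over the lines of a filehandle, read one line at a time.
sub from_fh {
    my $fh = shift;
    return SimpleIter->new( sub {
        my $line = <$fh>;
        return () unless defined $line;
        chomp $line;
        return ($line);
    } );
}

# Consumer code that relies only on the two-method interface.
sub total {
    my $it  = shift;
    my $sum = 0;
    $sum += $it->next until $it->is_exhausted;
    return $sum;
}

my $text = "3\n4\n5\n";
open my $fh, '<', \$text or die "open: $!";

print total( from_array( 1, 2, 3 ) ), "\n";    # 6
print total( from_fh($fh) ), "\n";             # 12
```

Because exhaustion is reported by a separate method rather than by a special return value, this style of iterator can also carry undef elements; that distinction matters for the two CPAN implementations discussed next.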
There are two main implementations of iterators on CPAN.
Iterator aims to be the definitive implementation; it uses exceptions to signal that the iterator is exhausted, which allows it to work with collections containing undef values.
Iterator::Simple is, as the name suggests, a simpler implementation; it signals that an iterator is exhausted by returning undef, which of course means you cannot use it to iterate over collections that might contain undef as a value. I prefer the Iterator::Simple interface, so I tend to use this module when I know the data I'm working with does not contain undefined values.
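The trade-off is easy to demonstrate. The sketch below does not use Iterator::Simple itself; iter_list is a hypothetical closure-based iterator in the same style, returning undef once the list is used up. A consumer that loops until it sees undef cannot tell an undef element from the end of the data.

```perl
use strict;
use warnings;

# Hypothetical closure iterator in the Iterator::Simple style: each call
# returns the next element, or undef once the list is used up.
sub iter_list {
    my @items = @_;
    my $i     = 0;
    return sub { $i < @items ? $items[ $i++ ] : undef };
}

# Typical consumer: keep pulling until the iterator returns undef.
sub drain {
    my $it = shift;
    my @out;
    while ( defined( my $x = $it->() ) ) {
        push @out, $x;
    }
    return @out;
}

my @clean = drain( iter_list( 1, 2, 3 ) );
my @holey = drain( iter_list( 1, undef, 3 ) );

print scalar(@clean), "\n";    # 3
print scalar(@holey), "\n";    # 1: the undef element was taken as "exhausted"
```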
Iterator::Simple implements a number of utility functions for working with iterators: filter, flatten, chain, zip, enumerate, slice, head and skip (corresponding functions for Iterator can be found in the Iterator::Util module), but neither module provides the wealth of functions you will find for working with lists in the List::Util and List::MoreUtils modules. Enter Iterator::Simple::Util.
This module implements all of the familiar list utilities: ireduce, isum, imax, imin, imax_by, imin_by, imaxstr, iminstr, imaxstr_by, iminstr_by, iany, inone, inotall, ifirstval, ilastval, ibefore, iafter, ibefore_incl, iafter_incl, and inatatime.
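Conceptually, aggregates such as isum and imax are folds that pull one element at a time from the iterator, so memory use does not grow with the size of the input. The sketch below illustrates that idea with a hypothetical reduce_iter helper over a hypothetical undef-terminated closure iterator; it borrows the $a/$b convention of List::Util's reduce, but it is not Iterator::Simple::Util's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical undef-terminated closure iterator over a list.
sub iter_list {
    my @items = @_;
    my $i     = 0;
    return sub { $i < @items ? $items[ $i++ ] : undef };
}

# Hypothetical fold in the style of List::Util::reduce: $a holds the
# accumulator and $b the next element; one element is pulled per step.
sub reduce_iter (&$) {
    my ( $code, $it ) = @_;
    my $acc = $it->();
    return undef unless defined $acc;    # empty iterator
    while ( defined( my $next = $it->() ) ) {
        local ( $a, $b ) = ( $acc, $next );
        $acc = $code->();
    }
    return $acc;
}

my $sum = reduce_iter { $a + $b } iter_list( 10000, 12000, 15000 );
my $max = reduce_iter { $a > $b ? $a : $b } iter_list( 10000, 15000, 12000 );

print "$sum $max\n";    # 37000 15000
```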
Examples
Here are some simple examples to get you started. Suppose we are working with the following data. This is a small dataset, but iterators allow us to work with datasets bigger than will fit in memory: if the data were read from a file or database, most iterator functions would load only one record into memory at a time.
my @data = (
{ region => 1, household => 1, salary => 10000 },
{ region => 1, household => 2, salary => 10000 },
{ region => 1, household => 3, salary => 12000 },
{ region => 2, household => 4, salary => 10000 },
{ region => 2, household => 5, salary => 12000 },
{ region => 3, household => 6, salary => 15000 },
{ region => 3, household => 7, salary => 12000 },
{ region => 4, household => 8, salary => 12000 }
);
We construct an iterator like so:
use Iterator::Simple qw( iter );
my $it = iter \@data;
Use imap to extract the salary field; imap returns an iterator that we can pass as an argument to a utility function, in this case imax:
use Iterator::Simple qw( iter imap );
use Iterator::Simple::Util qw( imax );
my $max_salary = imax imap { $_->{salary} } iter \@data;
# 15000
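The point of passing the result of imap straight to imax is that nothing is materialised in between. The following plain-Perl sketch shows the laziness with a hypothetical map_iter helper (not Iterator::Simple's imap) and a call counter: the mapping block runs only when an element is actually requested.

```perl
use strict;
use warnings;

# Hypothetical undef-terminated closure iterator over a list.
sub iter_list {
    my @items = @_;
    my $i     = 0;
    return sub { $i < @items ? $items[ $i++ ] : undef };
}

# Hypothetical lazy map: returns a new iterator that applies $fn (with the
# element in $_) only when its next element is requested.
sub map_iter (&$) {
    my ( $fn, $it ) = @_;
    return sub {
        my $x = $it->();
        return undef unless defined $x;
        local $_ = $x;
        return $fn->();
    };
}

my $calls    = 0;
my $salaries = map_iter { $calls++; $_->{salary} }
    iter_list( { salary => 10000 }, { salary => 15000 }, { salary => 12000 } );

print "$calls\n";    # 0: nothing has been computed yet

# Drain the mapped iterator, keeping only the running maximum.
my $max;
while ( defined( my $s = $salaries->() ) ) {
    $max = $s if !defined $max || $s > $max;
}
print "$calls $max\n";    # 3 15000
```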
Sometimes we want to extract the entire record with the maximum salary. That's where imax_by comes into play:
use Iterator::Simple qw( iter );
use Iterator::Simple::Util qw( imax_by );
my $top_earner = imax_by { $_->{salary} } iter \@data;
# { household => 6, region => 3, salary => 15000 }
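The difference between imax and imax_by is what comes back: the largest key, or the element that produced it. Below is a plain-Perl sketch of the "_by" idea; max_by_iter is a hypothetical name, it consumes an undef-terminated closure rather than an Iterator::Simple object, and keeping the first maximum on ties is an assumption of this sketch, not documented behaviour of the module.

```perl
use strict;
use warnings;

# Hypothetical undef-terminated closure iterator over a list.
sub iter_list {
    my @items = @_;
    my $i     = 0;
    return sub { $i < @items ? $items[ $i++ ] : undef };
}

# Hypothetical: return the element whose key (computed by $key_fn with the
# element in $_) is largest. Only the best element and its key are kept,
# so memory use is constant. On ties the first maximum wins.
sub max_by_iter (&$) {
    my ( $key_fn, $it ) = @_;
    my ( $best, $best_key );
    while ( defined( my $elem = $it->() ) ) {
        my $key = do { local $_ = $elem; $key_fn->() };
        if ( !defined $best_key || $key > $best_key ) {
            ( $best, $best_key ) = ( $elem, $key );
        }
    }
    return $best;
}

my @data = (
    { region => 1, household => 1, salary => 10000 },
    { region => 1, household => 2, salary => 10000 },
    { region => 1, household => 3, salary => 12000 },
    { region => 2, household => 4, salary => 10000 },
    { region => 2, household => 5, salary => 12000 },
    { region => 3, household => 6, salary => 15000 },
    { region => 3, household => 7, salary => 12000 },
    { region => 4, household => 8, salary => 12000 },
);

my $top = max_by_iter { $_->{salary} } iter_list(@data);
print $top->{household}, "\n";    # 6
```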
The igroup function is my attempt to implement a common pattern of processing subgroups of a sorted dataset. For example, suppose you want to extract the record with the maximum salary in each region. Our dataset is already sorted by region, so we just need to tell igroup to group by region. igroup returns an iterator; each element returned by the iterator is in turn an iterator that will return all records in the matching group.
use Iterator::Simple qw( iter );
use Iterator::Simple::Util qw( igroup imax_by );
my $by_region = igroup { $a->{region} == $b->{region} } iter \@data;
my @region_max;
while( my $it = $by_region->next ) {
push @region_max, imax_by { $_->{salary} } $it;
}
# @region_max contains 4 elements (one for each region):
# { region => 1, household => 3, salary => 12000 },
# { region => 2, household => 5, salary => 12000 },
# { region => 3, household => 6, salary => 15000 },
# { region => 4, household => 8, salary => 12000 }
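The requirement that the data be sorted points to the underlying pattern: walk the records once and start a new group whenever the comparison between neighbouring records fails. Here is a plain-Perl sketch of that pattern. group_consecutive is a hypothetical helper, not part of Iterator::Simple::Util, and unlike igroup it collects each group into an array instead of returning a lazy sub-iterator.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into runs of consecutive elements for
# which $same->( $prev, $curr ) is true. Input must already be sorted by key.
sub group_consecutive {
    my ( $same, @items ) = @_;
    my @groups;
    for my $item (@items) {
        if ( @groups && $same->( $groups[-1][-1], $item ) ) {
            push @{ $groups[-1] }, $item;    # same group as the previous record
        }
        else {
            push @groups, [$item];           # comparison failed: new group
        }
    }
    return @groups;
}

my @data = (
    { region => 1, household => 1, salary => 10000 },
    { region => 1, household => 2, salary => 10000 },
    { region => 1, household => 3, salary => 12000 },
    { region => 2, household => 4, salary => 10000 },
    { region => 2, household => 5, salary => 12000 },
    { region => 3, household => 6, salary => 15000 },
    { region => 3, household => 7, salary => 12000 },
    { region => 4, household => 8, salary => 12000 },
);

my @groups = group_consecutive( sub { $_[0]{region} == $_[1]{region} }, @data );

# For each region, keep the record with the highest salary (first on ties).
my @region_max;
for my $g (@groups) {
    my $best = $g->[0];
    for my $rec (@$g) {
        $best = $rec if $rec->{salary} > $best->{salary};
    }
    push @region_max, $best;
}

print join( ' ', map { $_->{household} } @region_max ), "\n";    # 3 5 6 8
```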