This post follows in part from a previous post, which gave an introduction to some of the basic Rust setup. This post covers iterators, structs, functions, and the Option type, and it can be followed without having read the earlier one.
In the post on reading bytes we created a program which prints out a single byte as an unsigned 8-bit integer. The completed code from that tutorial is shown below:
use std::fs::File;
use std::io::Read;
fn main() {
let mut file = File::open("dummy.txt").expect("No such file found: ");
let mut byte = [0u8];
file.read(&mut byte).unwrap();
println!("{:?}", byte);
}
What we will do in this post is convert this into an iterator, which will allow us to read in a file one byte at a time. The motivation for this is that sometimes we only need the first part of a file, for example, when we’re reading in file header information. Or maybe we’re working in a constrained environment, like with embedded systems, and have very little memory. Reading in the whole file is not an option if we’re searching for just one chunk of it. In cases like these we can use an iterator to read only the portion we need.
#### Getting more
After reading the introduction you may have already thought of a few ways to get a set number of bytes from a file. For instance, you could use a for loop like the following code:
use std::fs::File;
use std::io::Read;
fn main() {
let mut file = File::open("dummy.txt").expect("No such file found: ");
let mut byte = [0u8];
for i in 0..10 {
file.read(&mut byte).unwrap();
println!("Byte {}: {:?}", i, byte);
}
}
The Rust for loop makes me think of a C and Python hybrid, with its use of in and braces, though it has its own style with the .. range syntax, which looks almost like an ellipsis. Unlike Python you don’t have to call the range function, and unlike C you don’t have to declare a type, write a condition, or increment a counter. The variable i is an integer, and 0..10 starts at 0 and goes up to, but does not include, 10, the same half-open convention as Python’s range(0, 10).
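To make the range bounds concrete, here is a small sketch. The helper names exclusive and inclusive are made up for illustration; the point is only that .. leaves out the upper bound while ..= includes it.

```rust
// Hypothetical helpers: collect the values each kind of range produces.
fn exclusive(start: i32, end: i32) -> Vec<i32> {
    let mut seen = Vec::new();
    for i in start..end {
        seen.push(i); // `end` itself is never reached
    }
    seen
}

fn inclusive(start: i32, end: i32) -> Vec<i32> {
    let mut seen = Vec::new();
    for i in start..=end {
        seen.push(i); // `..=` also yields `end`
    }
    seen
}

fn main() {
    println!("{:?}", exclusive(0, 3)); // prints [0, 1, 2]
    println!("{:?}", inclusive(0, 3)); // prints [0, 1, 2, 3]
}
```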
You can also use for-each loops when you want to pull out one element at a time, like below:
fn main() {
let b = [1, 2, 3, 4, 5];
for element in &b {
println!("{}", element);
}
}
Notice the reference in the loop. This has to do with Rust’s ownership rules, something I’ll take a closer look at in a later post. On older compilers (before Rust 1.53, when arrays could not yet be iterated by value) removing the & is an error, and it’s worth trying to see how helpful the compiler can be: it tells you to borrow the array, like we already did using an ampersand, or use iter. On newer compilers the loop compiles without the &, but it then takes the array by value rather than borrowing it. Using iter would look like this:
fn main() {
let b = [1, 2, 3, 4, 5];
for element in b.iter() {
println!("{}", element);
}
}
I actually prefer this syntax, as I feel it’s more explicit. iter, as you may have guessed, stands for iterator. In this case Rust helps us turn the array into an iterator we can pull numbers from.
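To show what “pulling numbers from” an iterator means, the sketch below calls next by hand instead of letting a for loop do it. The helper pull_all is a made-up name, and the Option values (Some and None) it matches on are explained later in this post.

```rust
// Hypothetical helper: pull values out of the iterator by hand until it runs dry.
fn pull_all(values: &[i32]) -> Vec<i32> {
    let mut it = values.iter(); // iter() borrows the data and yields &i32
    let mut pulled = Vec::new();
    loop {
        match it.next() {
            Some(value) => pulled.push(*value), // copy the number out of the reference
            None => break,                      // nothing left to pull
        }
    }
    pulled
}

fn main() {
    let b = [1, 2, 3, 4, 5];
    println!("{:?}", pull_all(&b));
}
```

A for loop over b.iter() does essentially the same thing: it keeps calling next until it gets None.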
Even though loops are helpful in this case, we will not be using them. So, while we could use a for loop or while loop to accomplish what we want, that would mean we don’t get to explore much of Rust, nor create our own iterators. But before we can do that we need to learn about structs.
Structs, like classes in an object-oriented language, are something you’ll find everywhere in Rust. They allow us to group related items together, and then implement functions we can use on those items. Technically speaking, they allow us to create data types. Before we jump into using structs with our project, let’s look at a toy example.
Below is a program which defines a struct for 2D points, or more specifically, defines the type Point2D:
struct Point2D {
x: f32,
y: f32
}
fn main() {
}
We give a field name, a colon, and then the type of that field, which in our case is a 32-bit floating-point number. Each field in the struct is separated by a comma. Below is how we would use it.
struct Point2D {
x: f32,
y: f32,
}
fn main() {
let pt1 = Point2D {x: 3.0, y: 1.0};
println!("({}, {})", pt1.x, pt1.y);
}
As with any other variable we use let when creating a Point2D, and allow the compiler to determine the data type for us. We could, alternatively, provide the type as let pt1:Point2D if we wanted to. In this case the compiler can infer we’re creating an instance of Point2D, so there’s no need to be explicit.
Now it’s time to add some functionality to our struct by using impl:
struct Point2D {
x: f32,
y: f32,
}
impl Point2D {
pub fn display(&self) {
println!("({}, {})", self.x, self.y);
}
}
fn main() {
let pt1:Point2D = Point2D {x: 3.0, y: 1.0};
let pt2:Point2D = Point2D {x: 5.0, y: 8.0};
pt1.display();
pt2.display();
}
One thing which is different from most OOP languages is that the struct and the implementation of the functions which operate on its fields are separate. In fact, you can copy the entire impl Point2D section, creating two impl blocks, rename the function in the copy from display to display_again, and the Point2D type will now have two functions. This means you can take a struct you have already defined and add functionality to it with another impl block. (For types defined in someone else’s crate, you would have to go through a trait instead.)
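As a sketch of the two-block idea: the functions below return a String rather than printing directly, which is an assumption made here so the result can be checked as well as printed. Otherwise the names follow the example above.

```rust
struct Point2D {
    x: f32,
    y: f32,
}

// First impl block.
impl Point2D {
    pub fn display(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

// A second, separate impl block for the same type.
impl Point2D {
    pub fn display_again(&self) -> String {
        format!("again: ({}, {})", self.x, self.y)
    }
}

fn main() {
    let pt1 = Point2D { x: 3.5, y: 1.5 };
    // Both functions are available on the same instance.
    println!("{}", pt1.display());
    println!("{}", pt1.display_again());
}
```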
In this case, the function display is defined as pub, standing for public, with the default being private. Private means that it can only be called from code in the same module where it is defined (and that module’s child modules), not just from other functions associated with the struct.
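Privacy in Rust is scoped to modules, which is easiest to see with a module in play. The following is a minimal sketch; the module name geometry and the functions describe, format_pair, and describe_twice are invented for illustration.

```rust
mod geometry {
    pub struct Point2D {
        pub x: f32,
        pub y: f32,
    }

    impl Point2D {
        // Public: callable from outside the `geometry` module.
        pub fn describe(&self) -> String {
            self.format_pair()
        }

        // Private (the default): callable only from inside `geometry`.
        fn format_pair(&self) -> String {
            format!("({}, {})", self.x, self.y)
        }
    }

    // Not a Point2D method, but it lives in the same module,
    // so calling the private function is allowed.
    pub fn describe_twice(pt: &Point2D) -> String {
        format!("{} {}", pt.format_pair(), pt.format_pair())
    }
}

fn main() {
    let pt = geometry::Point2D { x: 3.5, y: 1.5 };
    println!("{}", pt.describe());
    println!("{}", geometry::describe_twice(&pt));
    // pt.format_pair(); // would not compile here: format_pair is private to `geometry`
}
```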
You’ll also notice that it takes a variable called self. If you’re not familiar with self, or this, or OOP in general, it’s a way for that function to reference the object which made the call. In our case, that’s pt1. When we call pt1.display() the function needs to know whether it’s looking at pt1 or pt2. When we make that function call, the context (pt1) is automatically sent along to the function for us.
Now that we have a passing understanding of structs, let’s learn about iterators. You may have heard about iterators, and you may have even used them before. For those of you who have no idea what an iterator is, and why it might be handy, please indulge me as I take a quick trip down memory lane.
When I was about 12 years old, my family moved across the US. My mom drove us over 1800 miles in a crappy old Suburban, and upon arriving at our destination in Colorado, we stayed on the outskirts of the city in a motel that had rows of connected faux log cabins and a football field’s worth of RV parking.
The best part about the place was the onsite convenience store had an attached pizza place with an arcade, and that arcade had the original Teenage Mutant Ninja Turtles. Given that there were four of us kids, it was a perfect distraction.
I’m not sure if our mom was so tired from the three day trip that it was easier to fork over quarters when we needed them, or if she just wanted us to have a good time, but once we started playing, every time one of us would run out of lives we’d run back to ask for another quarter, which she’d happily dole out.
She could have given us all the quarters at once, but instead she gave us only what we needed to continue, like an iterator. Iterators don’t waste time (CPU cycles) getting everything ready and handing it all over to you; they just give you the next thing you need, right when you need it.
In the toy example we’re building we’ll be getting the next number in a counter, but we could just as easily be getting the next Fibonacci value, next prime number, next record from a database or next byte from a file.
Before we can build the iterator we need a struct (type). Our type is going to hold the value for a counter (an integer (i32) in this case), and our iterator is going to help us do the counting.
struct Counter {
current: i32,
}
fn main() {
}
The next thing we’re going to do is implement a trait for our Counter type. Let’s see what that looks like first, and then we’ll discuss how it works. Note, the example below will not compile as is.
struct Counter {
current: i32,
}
impl Iterator for Counter {
}
fn main() {
}
Iterator is a trait that already exists in Rust’s standard library (as std::iter::Iterator), and you can read about it in the standard library documentation. What it does is lay out the functions that we MUST implement, and those which are optional or already provided.
Rust traits are similar to abstract classes in C++ or interfaces in Java, if you’re familiar with either of those. If not, you can think of it as a way to enforce behavior, or consistency, so that when Rust programmers see that trait, they know what to expect from the type that implemented it.
In the documentation there’s a Required methods section, which tells us we must have the function next. It also tells us that Iterator has an associated type called Item, which is the type of value the next function will return (wrapped in an Option). Take a look at the code below and see if you can figure out what’s going on before you read on.
struct Counter {
current: i32,
}
impl Iterator for Counter {
type Item = i32;
fn next(&mut self) -> Option<i32> {
self.current = self.current + 1;
Some(self.current)
}
}
fn main() {
}
There’s a lot of new stuff here, so let’s walk through it.
First, we have type Item = i32; which tells the Iterator that it’s going to be returning an integer value each time we call next.
Then we have the next function itself, which takes in a &mut self, which I hope makes sense, since self would be a Counter instance. Because we’re modifying the current value within the counter it has to be mutable.
After that we have Option, a type whose value is either Some (wrapping a value) or None. It’s similar to Result, but for cases like an Iterator where getting no value (None) isn’t an error that should end the program.
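Here is a small sketch of Option outside of an iterator. The functions first_even and describe are hypothetical names used only for this illustration: one produces an Option, the other uses match to handle both possibilities.

```rust
// Hypothetical helper: look for the first even number in a slice.
// There may not be one, so the return type is Option<i32>.
fn first_even(values: &[i32]) -> Option<i32> {
    for value in values.iter() {
        if *value % 2 == 0 {
            return Some(*value);
        }
    }
    None // nothing matched, and that is not an error
}

// Handle both cases with match instead of calling unwrap.
fn describe(result: Option<i32>) -> String {
    match result {
        Some(n) => format!("found {}", n),
        None => String::from("nothing found"),
    }
}

fn main() {
    println!("{}", describe(first_even(&[1, 3, 4, 6])));
    println!("{}", describe(first_even(&[1, 3, 5])));
}
```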
In the same way we wrapped our Result return values in either Ok or Err, we wrap our return value here in Some. Since our counter will go on forever (or until it overflows the i32 type) we’ll just return Some(self.current) and ignore the possibility of None until later.
Wait, you might be asking, how are we returning anything without a return statement? In Rust, the last expression in a function is returned if it has no trailing semicolon. If this syntax bothers you, you can write return Some(self.current); instead and Rust won’t complain.
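The two styles side by side, as a quick sketch (the function names are invented for the comparison):

```rust
// The last expression, with no trailing semicolon, is the return value.
fn next_value_implicit(current: i32) -> i32 {
    current + 1
}

// The same function written with an explicit return statement.
fn next_value_explicit(current: i32) -> i32 {
    return current + 1;
}

fn main() {
    println!("{}", next_value_implicit(10));
    println!("{}", next_value_explicit(10));
}
```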
The last thing we need to do is actually use the iterator by calling the next function. To do that we need a Counter instance which defines the current value we’ll be counting from. You can see in the code below that I’ve created one called my_counter and initialized the counter’s current value to 10.
struct Counter {
current: i32,
}
impl Iterator for Counter {
type Item = i32;
fn next(&mut self) -> Option<i32> {
self.current = self.current + 1;
Some(self.current)
}
}
fn main() {
let mut my_counter = Counter {
current: 10,
};
println!("{}", my_counter.next().unwrap());
println!("{}", my_counter.next().unwrap());
}
Since the next function returns an Option type, which wraps the return value in Some or returns None, we unwrap what we get back before we can print it. Every time we call next the current value is incremented.
To summarize, the instance of the Counter type (my_counter) holds onto the current value of the counter. Because Counter implements the Iterator trait, every Counter instance, including my_counter, has the next function available. We can then call that function to update the variable current.
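The counter above never returns None. As a sketch of what that case could look like, here is a variation with an upper limit. BoundedCounter and its limit field are assumptions added for this example, not part of the code above.

```rust
// Hypothetical variation on Counter: stops once `limit` is reached.
struct BoundedCounter {
    current: i32,
    limit: i32,
}

impl Iterator for BoundedCounter {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        if self.current >= self.limit {
            return None; // signal that the iterator is finished
        }
        self.current = self.current + 1;
        Some(self.current)
    }
}

fn main() {
    let mut my_counter = BoundedCounter { current: 10, limit: 12 };
    println!("{:?}", my_counter.next()); // prints Some(11)
    println!("{:?}", my_counter.next()); // prints Some(12)
    println!("{:?}", my_counter.next()); // prints None
}
```

Printing with {:?} instead of calling unwrap avoids a panic on the final None.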
You may be asking yourself at this point why we didn’t just do something like the following:
struct Counter {
current: i32,
}
impl Counter {
fn next(&mut self) -> i32 {
self.current = self.current + 1;
self.current
}
}
fn main() {
let mut my_counter = Counter {
current: 10,
};
println!("{}", my_counter.next());
}
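This actually works just fine, and is a perfectly acceptable solution to our toy problem. It lacks a few things though: it can’t be used in a for loop, it doesn’t get any of the methods the Iterator trait already provides (such as take), and other Rust programmers won’t recognize it as an iterator.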
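One thing the hand-rolled version misses out on is everything the Iterator trait provides once next is implemented. The sketch below uses the trait-based Counter in a for loop together with take, a method that comes with Iterator. The helper first_n is a made-up name.

```rust
struct Counter {
    current: i32,
}

impl Iterator for Counter {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        self.current = self.current + 1;
        Some(self.current)
    }
}

// Hypothetical helper: gather the first `n` values after `start`.
fn first_n(start: i32, n: usize) -> Vec<i32> {
    let counter = Counter { current: start };
    let mut values = Vec::new();
    // take(n) stops asking for values after n of them, so the
    // otherwise endless counter does not loop forever.
    for value in counter.take(n) {
        values.push(value);
    }
    values
}

fn main() {
    println!("{:?}", first_n(10, 3)); // prints [11, 12, 13]
}
```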
With all that said, we’re now going to make use of our newfound power of structs and Iterators to read a file one byte at a time.
In the last lesson, and at the start of this one, we gave the code for reading a single byte from a file:
use std::fs::File;
use std::io::Read;
fn main() {
let mut file = File::open("dummy.txt").expect("No such file found: ");
let mut byte = [0u8];
file.read(&mut byte).unwrap();
println!("{:?}", byte);
}
For the most part all we need to do is find a way to work this into the same format as the Counter iterator above. At this point, I recommend you go off and try and do it yourself before going on.
We’ll call our type ByteReader. It will contain both the current byte we’re looking at, as well as the file handle.
use std::fs::File;
use std::io::Read;
struct ByteReader {
byte: [u8; 1],
file: File,
}
fn main() {
let mut file = File::open("dummy.txt").expect("No such file found: ");
let mut byte = [0u8];
file.read(&mut byte).unwrap();
println!("{:?}", byte);
}
Remember that we need to define our byte as an array because the read function expects a buffer of bytes to fill (a mutable slice, &mut [u8]), and a one-element array can be borrowed as one. As we did previously, you could fill more than one byte at a time, but [u8; 1] creates space for only one unsigned 8-bit integer.
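To see what a larger buffer does, here is a sketch with room for four bytes. So that it runs without dummy.txt, the hypothetical helper read_four accepts anything that implements Read (an in-memory byte slice does, and so does File); that generic parameter is an assumption of this example.

```rust
use std::io::Read;

// Hypothetical helper: try to fill a four-byte buffer from any reader.
// Returns the buffer and how many bytes read reported filling.
fn read_four<R: Read>(mut source: R) -> ([u8; 4], usize) {
    let mut buffer = [0u8; 4];
    let count = source.read(&mut buffer).unwrap();
    (buffer, count)
}

fn main() {
    // A byte slice stands in for the file here.
    let data: &[u8] = b"Rust!";
    let (buffer, count) = read_four(data);
    println!("read {} bytes: {:?}", count, buffer);
}
```

The count matters: if the source holds fewer bytes than the buffer, the remaining slots keep their initial zeros.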
Now let’s add our Iterator, which will return a u8, as opposed to the i32 we used earlier, since we’re working with bytes. The next function requires that we return an Option type with a u8, so we’ll give it a temporary one as Some(0) in order to get it to compile.
Everything in main will stay the same for the time being, and though the program should compile, it will give you warnings that ByteReader is never used, since main doesn’t create one yet.
use std::fs::File;
use std::io::Read;
struct ByteReader {
byte: [u8; 1],
file: File,
}
impl Iterator for ByteReader {
type Item = u8;
fn next(&mut self) -> Option<u8> {
Some(0)
}
}
fn main() {
let mut file = File::open("dummy.txt").expect("No such file found: ");
let mut byte = [0u8];
file.read(&mut byte).unwrap();
println!("{:?}", byte);
}
In order to use our Iterator we need to create an instance of ByteReader, and that instance will contain a value for byte and a file handle to an opened file. That file handle will keep the file open for as long as the instance exists. We’ll go ahead and create an instance of ByteReader called my_byte_reader.
use std::fs::File;
use std::io::Read;
struct ByteReader {
byte: [u8; 1],
file: File,
}
impl Iterator for ByteReader {
type Item = u8;
fn next(&mut self) -> Option<u8> {
Some(0)
}
}
fn main() {
let mut my_byte_reader = ByteReader {
byte: [0; 1],
file: File::open("dummy.txt").expect("No such file found: ")
};
println!("{}", my_byte_reader.next().unwrap());
}
The default value for our byte will be zero, and it will be the only element in our one element array. We also open the file just as we did when we weren’t using an iterator. Finally, we call the function next and unwrap the Option type, just like we did with our Counter type from earlier.
There’s one more thing left to do, and that’s read from our file. If you remember, reading from a File type returns a Result, which is either an Ok or an Err. But if you look up at our function next, it returns an Option. To make things work we’ll need to use match to turn any Err into None and any Ok into Some.
use std::fs::File;
use std::io::Read;
struct ByteReader {
byte: [u8; 1],
file: File,
}
impl Iterator for ByteReader {
type Item = u8;
fn next(&mut self) -> Option<u8> {
match self.file.read(&mut self.byte) {
Err(_) => None,
Ok(_) => Some(self.byte[0])
}
}
}
fn main() {
let mut my_byte_reader = ByteReader {
byte: [0; 1],
file: File::open("dummy.txt").expect("No such file found: ")
};
println!("{}", my_byte_reader.next().unwrap());
}
What is being returned from next is the value produced by match, hence the lack of a semicolon. Reading the file is nearly the same as how we did it without the iterator, except that here the file and the byte are prefixed with self., because they belong to the my_byte_reader instance.
You’ll notice something different with this match. With both Err and Ok there are no variables in the parentheses, just an underscore. The underscore is something we’ve put at the start of a variable name that is unused (e.g. _variable). Here, on its own, it just means ignore whatever value we have, since we’re not using it. It’s helpful in situations where Rust expects a variable and would give a warning if you didn’t use it.
Since read puts the value in the variable byte, that’s all we need to return back to the user. We have to wrap it in Some, because the function next is returning an Option and that means it’s either Some or None.
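The Result-to-Option conversion can be tried in isolation. The sketch below uses parsing text into a u8 as the fallible step, since it needs no file; parse_byte and parse_byte_short are hypothetical names. The second function uses Result’s ok method, which performs the same conversion in one call.

```rust
// Hypothetical helper: turn a Result into an Option with match,
// ignoring the error details with an underscore.
fn parse_byte(text: &str) -> Option<u8> {
    match text.parse::<u8>() {
        Ok(value) => Some(value),
        Err(_) => None,
    }
}

// The same conversion using the ok method on Result.
fn parse_byte_short(text: &str) -> Option<u8> {
    text.parse::<u8>().ok()
}

fn main() {
    println!("{:?}", parse_byte("42"));        // prints Some(42)
    println!("{:?}", parse_byte("300"));       // prints None (too large for a u8)
    println!("{:?}", parse_byte_short("abc")); // prints None
}
```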
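And with that, you should have a working iterator that pulls bytes from a file. One caveat: at the end of the file read returns Ok(0) rather than an Err, so as written next never returns None and will keep handing back the last byte it read. That’s fine for pulling the first few bytes, but a loop over the whole file would need to treat Ok(0) as None.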
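As a sketch of one way to round this off, the version below treats a read of zero bytes (which is what read reports at the end of the input) as None, so the iterator stops at the end of the file. Two things here are assumptions rather than part of the code above: the file field is swapped for a generic source that accepts anything implementing Read (a File works, and so does an in-memory byte slice, which lets the example run without dummy.txt), and first_bytes is a made-up helper.

```rust
use std::io::Read;

// Assumption: generic over any reader instead of holding a File directly.
struct ByteReader<R: Read> {
    byte: [u8; 1],
    source: R,
}

impl<R: Read> Iterator for ByteReader<R> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        match self.source.read(&mut self.byte) {
            Ok(0) => None,               // zero bytes read: end of input
            Ok(_) => Some(self.byte[0]), // one byte was placed in the buffer
            Err(_) => None,              // treat a read error as the end as well
        }
    }
}

// Hypothetical helper: collect at most `n` bytes from the start of a reader,
// for example to inspect a file header.
fn first_bytes<R: Read>(source: R, n: usize) -> Vec<u8> {
    let reader = ByteReader { byte: [0; 1], source };
    let mut bytes = Vec::new();
    for byte in reader.take(n) {
        bytes.push(byte);
    }
    bytes
}

fn main() {
    // A byte slice stands in for the file; File::open("dummy.txt") would fit too.
    let data: &[u8] = b"GIF89a";
    println!("{:?}", first_bytes(data, 3));
}
```

Because the iterator now ends, asking for more bytes than the input holds simply returns what is there. The standard library also offers a bytes method on anything that implements Read, which yields each byte wrapped in a Result.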
In this post we looked at functions, loops, structs, iterators, and the match statement. With the exception of iterators, these are things you will see in almost every substantial Rust program.