This post is a short tutorial on the Rust programming language through the lens of reading files, working with Result types, and macros. It was originally written to help some of my former students learn the basics of reading files while also getting better acquainted with Rust. The content is meant for high school students who want to branch out into another language, but who also want to understand the how and why of all the different pieces.
In this post we’ll be reading out a single byte from a file. What you’ll see in this tutorial is by no means the best or only way of doing things.
Here we’ll get Rust installed and create a project.
There are already a number of Rust tutorials that help users get their environment set up. The official one is fine, and you can find it here. Personally, I use VS Code as my editor, along with a few Rust plugins, but any editor will work. All you need is something to type in that saves to plain text.
The first thing you’re going to do is create a new project using Cargo. Cargo is Rust’s build tool and package manager: it creates, builds, and runs projects, and it installs dependencies. We won’t use any dependencies with this project, but we will use Cargo to create and run our project.
Open a terminal window (command line in Windows) and navigate to a folder where you want to create the project. Type the command below:
$ cargo new project_name
If nothing happened, or you got an error, please refer to the installation instructions for your OS to make sure you’ve installed everything correctly.
As long as everything goes smoothly, you’ll end up with a folder with the structure below, where - is a folder and * is a file. You can ignore the .git folder and the .gitignore file unless you plan on putting your work on GitHub or GitLab.
- project_name
    - src
        * main.rs
    - .git
        - [various git related folders and files]
    * Cargo.toml
    * .gitignore
The only things you really need to pay attention to are the main.rs and Cargo.toml files. An .rs file extension is a Rust source file, while the .toml file is for project configuration. If you open these two files you will see the following, minus the comments I’ve added at the top:
# [Cargo.toml]
[package]
name = "project_name"
version = "0.1.0"
authors = ["kilroy <tsbach@gmail.com>"]
edition = "2018"
# See more keys and their definitions at
# https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
The Cargo.toml file is a configuration file, similar to package.json if you’re used to Node.js. It provides information about the project in the [package] section, and lets you add dependencies (meaning additional libraries) under [dependencies]. There is more you can do with it, but we’ll stick with the basics for now.
At this point the main.rs file should be your typical hello world program:
// [main.rs]
fn main() {
    println!("Hello, world!");
}
In main.rs you will find a single function called main (fn is the keyword that declares a function). When the program runs, it prints “Hello, world!” to the screen. It does this using println!, which is actually a macro, not a function. We’ll get to what a macro is shortly, but for now, run the program with the following command from the root of the project folder (the same place the Cargo.toml file is located):
cargo run
This should compile and print out “Hello, world!”
I mentioned that println! is a macro, but what is a macro? A macro is a way to write a single instruction that Rust replaces with a bunch of other code BEFORE it compiles. As an example, you could write a macro called print100! that you call as print100!("HI"). When you compile your program, the macro is expanded into 100 print statements (or a loop that does the same thing) inside your code. Only after that is the code compiled.
Ok, you might be thinking, why not just make it a function that loops 100 times? In this case you could, but println! has a good reason to be a macro: Rust doesn’t allow a function to take an arbitrary number of parameters. In languages like Python you can pass as many parameters as you’d like to the print function:
print("Hi", "there")
print(1, 2, 3, 4, 5)
To give Rust this same flexibility, println! was made into a macro, which expands into ordinary code that produces the equivalent output. Using println! in Rust looks like this:
println!("{} {}", "hi", 100);
We use the braces {} as a placeholder for a value, and then add one argument for each pair of braces. In the example above there are two pairs of braces, and thus two arguments after the format string; if the counts don’t match, the program won’t compile. Before the program is compiled, this statement is replaced with a call to an internal function called _print, which receives the format string and both values bundled together, and which in turn uses another function called print_to. All this is to keep the Rust compiler happy while hiding most of the ugly bits from you.
You can create your own macros as well, but that’s not something we’ll be touching on here.
Let’s get on with the show and look at opening files. The first thing you want to do is create a dummy text file to play with and put it in the root folder of your project, as shown below:
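To see that variable-argument behavior for yourself, the sketch below uses format!, a sibling macro from the standard library that takes the same format string and arguments as println! but hands back a String instead of printing it. The helper function names are made up for this example.

```rust
// Each hypothetical helper calls format! with a different number of arguments.
// A plain Rust function couldn't accept all of these argument counts,
// but a macro can, because it is expanded before the code is compiled.
fn one_value() -> String {
    format!("{}", "hi")
}

fn two_values() -> String {
    format!("{} {}", "hi", 100)
}

fn three_values() -> String {
    format!("{}-{}-{}", 1, 2, 3)
}

fn main() {
    println!("{}", one_value());    // hi
    println!("{}", two_values());   // hi 100
    println!("{}", three_values()); // 1-2-3
}
```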
- project_name
    - src
        * main.rs
    - .git
        - [various git related folders and files]
    * Cargo.toml
    * .gitignore
    * dummy.txt
Add a few random lines of text inside or copy what I have below:
There once was a runner named Dwight
Who could speed even faster than light.
He set out one day
In a relative way
And returned on the previous night.
Now, let’s open the file by modifying our Hello, world! program as shown below:
use std::fs::File;
fn main() {
    let file = File::open("dummy.txt");
}
The Rust compiler is really helpful and will warn you that you haven’t used the file variable, suggesting that you rename it to _file if that’s intentional. If you do that, the warning disappears.
Hopefully this code is pretty straightforward. We start off by importing File with the use keyword. File comes from the standard library (std), specifically from its fs (file system) module. Within that module are numerous structs, of which File is one. You can check out the docs here to read more about the File struct.
If you’re not familiar with structs, for now think of them as a collection of related variables, along with the functions that work on them, used for a specific task. The more technical answer is that a struct is a data type that groups related values (called fields), and functions can be attached to it in an impl block. Structs are something you’ll see everywhere in Rust. If it doesn’t click at the moment, don’t worry; just keep going and it will eventually.
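Here is a small, made-up struct to make that description concrete. Counter and its functions are hypothetical names for this example. The pattern is the same one File uses, just with a lot less going on: fields live inside the struct, and functions attached to it live in an impl block.

```rust
// A hypothetical struct that groups one piece of data (a count)
// with the functions that work on it.
struct Counter {
    count: u32,
}

impl Counter {
    // An associated function that builds a new Counter,
    // called as Counter::new(), similar in spirit to File::open(...).
    fn new() -> Counter {
        Counter { count: 0 }
    }

    // A method: it receives the instance itself as `self`.
    // `&mut self` means the method is allowed to change the instance.
    fn increment(&mut self) {
        self.count += 1;
    }
}

fn main() {
    let mut counter = Counter::new();
    counter.increment();
    counter.increment();
    println!("{}", counter.count); // 2
}
```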
Before we move on and do stuff with the file, I want to show you something about the file handle (a handle is a variable connected to a file). Let’s print it out.
use std::fs::File;
fn main() {
    let file = File::open("dummy.txt");
    println!("{:?}", file);
}
I didn’t go into any detail about the braces {} earlier, but now that we’ve thrown in a colon and a question mark, I should mention there is a lot you can do with them, especially when it comes to formatting. We’ll actually use formatting in another tutorial, which you may want to check out after this one. In this case {:?} says to use debug formatting. It’s used when you want to look at the contents of something like a struct instance, which plain {} printing may not support. You can learn more about these curly braces here.
When you run the program that opens dummy.txt and prints file, you get output like the following, where “…” is your path to the file:
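As a sketch of the difference, the code below compares {} and {:?} on a couple of values. The Point struct is made up for this example. The #[derive(Debug)] line asks the compiler to generate the debug-printing code for it, which File already has built in.

```rust
// A hypothetical struct. Without #[derive(Debug)], {:?} would not compile for it.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Returns the same values formatted in different ways.
fn describe() -> (String, String, String) {
    let word = "hi";
    let p = Point { x: 1, y: 2 };
    (
        format!("{}", word),   // normal formatting: hi
        format!("{:?}", word), // debug formatting keeps the quotes: "hi"
        format!("{:?}", p),    // debug formatting of a struct: Point { x: 1, y: 2 }
    )
}

fn main() {
    let (plain, debug_word, debug_point) = describe();
    println!("{}", plain);
    println!("{}", debug_word);
    println!("{}", debug_point);
}
```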
Ok(File { fd: 3, path: ".../dummy.txt", read: true, write: false })
This is your first look at Rust’s Result type. When the file can be opened, File::open returns a File instance, but it’s wrapped in something called Ok. This Ok tells us that the file was opened successfully. But what happens if it can’t find the file? If you change the file name in your code to something other than dummy.txt, you will get output like the following:
Err(Os { code: 2, kind: NotFound, message: "No such file or directory" })
When a function that returns a Result fails, it returns an Err, which wraps a value describing the error. To summarize, File::open returns a Result. The Result will either be successful and hold Ok, or unsuccessful and hold Err.
When you have a Result, you’ll want to do something different depending on whether it’s a success or an error. In our case, if there’s an error we’ll exit the program with an error message; if it succeeds, we’ll take the file handle out of the Ok and assign it to the file variable. To do that we’ll use match, Ok, Err, and the panic! macro.
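If you’d rather check which case you got without printing the whole thing, Result has helper methods such as is_ok and is_err. The sketch below assumes it is run from the project root, where dummy.txt lives, and that there is no file called missing.txt (a made-up name). The can_open function is also hypothetical.

```rust
use std::fs::File;

// Hypothetical helper: true if the file could be opened, false otherwise.
fn can_open(path: &str) -> bool {
    let result = File::open(path); // result is a Result<File, std::io::Error>
    result.is_ok()
}

fn main() {
    println!("dummy.txt: {}", can_open("dummy.txt"));     // true if the file exists
    println!("missing.txt: {}", can_open("missing.txt")); // false if it doesn't
}
```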
use std::fs::File;
fn main() {
    let file = match File::open("dummy.txt") {
        Ok(f) => f,
        Err(e) => panic!("No such file found: {}", e),
    };
    println!("{:?}", file);
}
While this is quite verbose, we’ll see a way to trim it down shortly. The point of showing you the verbose form is so you can see what’s going on under the hood.
In the code, the match expression looks at what File::open returns and tries to match it to either Ok, containing something denoted by f, or Err, containing something denoted by e. The names f and e don’t matter, and you can change them to whatever you want. They are just placeholders that hold either the value File::open returned or the error.
Now, to test the error handling, change dummy.txt in the code to a file name that doesn’t exist and run it to see what error we get.
thread 'main' panicked at 'No such file found: No such file or directory (os error 2)', src/main.rs:7:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
As you can see, your error message is there, with the error e filled in after the colon.
To summarize, one place we can use match is when a function returns a Result, or, to put it simply, when something may or may not give an error. The result can be either Ok or Err, and we can decide what to do in each case as we see fit. A common response to Ok is to use the value inside it, and a common response to Err is to panic and end the program.
I mentioned I’d show you a simpler way to do things, but in truth there are a few ways you might do it, and each one is slightly different. Let’s first look at unwrap.
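The Err arm doesn’t have to panic, though. As one possible variation, the sketch below uses match to hand back an Option instead, so the caller decides what to do. It also checks e.kind() to tell a missing file apart from other errors. The function name try_open is made up for this example.

```rust
use std::fs::File;
use std::io::ErrorKind;

// Hypothetical helper: Some(file) on success, None on any error,
// with a friendlier message for the common "not found" case.
fn try_open(path: &str) -> Option<File> {
    match File::open(path) {
        Ok(f) => Some(f),
        Err(e) => {
            if e.kind() == ErrorKind::NotFound {
                println!("{} does not exist", path);
            } else {
                println!("Could not open {}: {}", path, e);
            }
            None
        }
    }
}

fn main() {
    match try_open("dummy.txt") {
        Some(file) => println!("{:?}", file),
        None => println!("Carrying on without a file."),
    }
}
```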
use std::fs::File;
fn main() {
    let file = File::open("dummy.txt").unwrap();
    println!("{:?}", file);
}
This does almost the same thing as the match expression from the last section, though it lacks our personalized error message if the file is not found. That can be added by using expect, which works like this:
use std::fs::File;
fn main() {
    // expect works like unwrap, but panics with our message if there is an error
    let file = File::open("dummy.txt").expect("No such file found");
    println!("{:?}", file);
}
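This produces nearly the same output as the program using match, and is in my opinion easier to read. The one difference is that expect adds the colon itself and prints the error in its debug form, so you’ll see something like No such file found: Os { code: 2, kind: NotFound, message: "No such file or directory" }.
We’ve opened the file, and we’ve seen how to handle errors, so, finally, it’s time for the bytes. The code for reading in a single byte is: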
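If you want exactly the message the match version printed, one option is unwrap_or_else. It runs a closure (a small inline function, written like |e| ...) only when there’s an error. This is a sketch of that alternative rather than something the rest of the tutorial depends on. The open_or_panic name is made up.

```rust
use std::fs::File;

// Hypothetical helper: behaves like the match version.
// The closure passed to unwrap_or_else only runs if File::open returned Err,
// and panic! inside it uses normal {} formatting for the error.
fn open_or_panic(path: &str) -> File {
    File::open(path).unwrap_or_else(|e| panic!("No such file found: {}", e))
}

fn main() {
    let file = open_or_panic("dummy.txt");
    println!("{:?}", file);
}
```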
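use std::fs::File;
use std::io::Read; // Read is a trait; importing it makes file.read available
fn main() {
    let mut file = File::open("dummy.txt").expect("No such file found");
    let mut byte = [0u8]; // a one-element array to hold the byte we read
    file.read(&mut byte).unwrap();
    println!("{:?}", byte);
}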
We’ll go through how each part of this works in the following sections.
There’s a lot to unpack in the above code, starting with the mut keyword, which stands for mutable. If you’re not familiar with mutability in programming, it’s the idea that once you define something you can still change it. In Rust, variables are immutable by default, meaning they can’t be changed. Try out the following program:
fn main() {
    let a = 10;
    a = 20;
}
You will get an error saying you can’t assign twice to an immutable variable. Why Rust works this way involves more advanced concepts and is a bit beyond what we need to cover here. For now, keep in mind that all variables in Rust are immutable by default, and you need the mut keyword to make them mutable.
But why does the file variable need to be mutable now, when it wasn’t before? That has to do with the read function. It is declared to take &mut self as its first parameter, which means it is allowed to change the thing it’s called on. That fits what reading does: each read moves the file’s current position forward, so the next read starts where the last one stopped. This isn’t something you need to know to use read, but for those curious, when you write instance_name.function(…), the instance is passed in as that self parameter. You can think of self the same way as Python’s self or the this keyword in other languages. To learn more, read this Stack Overflow post.
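For comparison, here is a version with mut added, which compiles. The reassign function is a made-up wrapper so the result can be returned and checked.

```rust
// Hypothetical helper: returns the value of a variable that was changed after it was created.
fn reassign() -> i32 {
    let mut a = 10; // `mut` allows a to be changed later
    a += 10;        // this line would be a compile error without `mut`
    a
}

fn main() {
    println!("{}", reassign()); // 20
}
```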
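To see that moving position in action, the sketch below calls read twice on the same file. Each call moves the position forward, so the second call gets the second byte. The function name first_two_bytes is made up for this example. If your dummy.txt starts with the limerick above, this should print 84 104, the codes for T and h.

```rust
use std::fs::File;
use std::io::Read;

// Hypothetical helper: reads the first two bytes of a file, one at a time.
fn first_two_bytes(path: &str) -> (u8, u8) {
    let mut file = File::open(path).expect("No such file found");
    let mut byte = [0u8];

    file.read(&mut byte).unwrap(); // position moves from 0 to 1
    let first = byte[0];

    file.read(&mut byte).unwrap(); // position moves from 1 to 2
    let second = byte[0];

    (first, second)
}

fn main() {
    let (a, b) = first_two_bytes("dummy.txt");
    println!("{} {}", a, b);
}
```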
Let’s now take a look at this line:
let mut byte = [0u8];
This creates an array that contains one unsigned byte-sized integer with the value 0. 0u8 is an integer literal: the 0 is the value, and the u8 suffix gives its type, an unsigned 8-bit integer (a byte). A simpler way of seeing this is that we take a value and mash it together with a type, so Rust knows how much memory to set aside for each array element.
Here are all of Rust’s primitive integer types:
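The type is what decides how much room each element takes. The sketch below uses std::mem::size_of_val to measure a few arrays. It also shows the [value; count] form for making an array with several copies of the same value. The sizes function is a made-up wrapper.

```rust
use std::mem::size_of_val;

// Hypothetical helper: returns the size in bytes of a few small arrays.
fn sizes() -> (usize, usize, usize) {
    let one_byte = [0u8];      // one u8: 1 byte
    let four_bytes = [0u8; 4]; // four u8 values: 4 bytes
    let one_u32 = [0u32];      // one u32: 4 bytes
    (size_of_val(&one_byte), size_of_val(&four_bytes), size_of_val(&one_u32))
}

fn main() {
    let (a, b, c) = sizes();
    println!("{} {} {}", a, b, c); // 1 4 4
}
```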
| Length | Signed | Unsigned |
|---|---|---|
| 8-bit | i8 | u8 |
| 16-bit | i16 | u16 |
| 32-bit | i32 | u32 |
| 64-bit | i64 | u64 |
| 128-bit | i128 | u128 |
| arch | isize | usize |
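The number at the end denotes how many bits each one has. isize and usize (the arch row) match the pointer size of your machine’s architecture, which is 64 bits on most modern computers. Since we’re dealing with bytes, we’ve chosen u8, an unsigned 8-bit integer. There are also floating-point types, f32 and f64, one of which shows up below. Here are some examples of using literals with these types: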
println!("{}", 10u8);
println!("{}", 33.4f64);
println!("{}", 100u32);
println!("{}", -6000i32);
I’d recommend trying them out for yourself.
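Each type can only hold a certain range of values, and the standard library exposes those limits as constants such as u8::MAX. This sketch prints a few of them. It shows why a single byte can never go above 255. The limits function is a made-up wrapper.

```rust
// Hypothetical helper: returns the smallest and largest values for a few integer types.
fn limits() -> (u8, u8, i8, i8, u16) {
    (u8::MIN, u8::MAX, i8::MIN, i8::MAX, u16::MAX)
}

fn main() {
    let (u8_min, u8_max, i8_min, i8_max, u16_max) = limits();
    println!("u8:  {} to {}", u8_min, u8_max); // u8:  0 to 255
    println!("i8:  {} to {}", i8_min, i8_max); // i8:  -128 to 127
    println!("u16: 0 to {}", u16_max);         // u16: 0 to 65535
}
```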
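You may be wondering why byte needs to be an array instead of a plain u8. That’s because read doesn’t take a single number. It takes a mutable slice of bytes (&mut [u8]) to fill, and it reads at most as many bytes as that slice can hold. A one-element array is the simplest way to give it room for exactly one byte.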
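Because read fills whatever buffer it’s given, a bigger array reads more than one byte per call. read also returns how many bytes it actually put into the buffer, which can be fewer than the buffer’s length near the end of the file. This sketch assumes a buffer of five bytes. The helper name first_five_bytes is made up.

```rust
use std::fs::File;
use std::io::Read;

// Hypothetical helper: reads up to 5 bytes and returns only the ones that were filled in.
fn first_five_bytes(path: &str) -> Vec<u8> {
    let mut file = File::open(path).expect("No such file found");
    let mut buffer = [0u8; 5];
    let count = file.read(&mut buffer).unwrap(); // how many bytes were actually read
    buffer[..count].to_vec()                     // keep only the filled part
}

fn main() {
    println!("{:?}", first_five_bytes("dummy.txt"));
}
```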
The last two lines we have to look at are below, but for the most part, there isn’t really anything new here aside from the ampersand (&).
file.read(&mut byte).unwrap();
println!("{:?}", byte);
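The code above will print out the integer representation of the byte, inside square brackets since byte is an array. The new piece is the &, which creates a reference. If you’re coming from C++, you’ll be used to references, and if you’re coming from C, they’re similar to pointers. A reference lets you refer to a value without copying it or taking it over. Take, for example, the following code, which prints 10 twice.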
fn main() {
    let a = 10;
    let b = &a;
    println!("{}, {}", a, b);
}
In this case b is a reference to a, which means they’re both looking at the same value in memory. This involves ownership and borrowing in Rust, which is out of the scope of this tutorial; at a later date I’ll write a tutorial that covers these concepts. For now, just know that read needs a reference so it can put the next byte from the file into your byte array. Because read changes the array, it needs a mutable reference, written &mut, rather than the plain & used above. Essentially, read receives a reference to the variable you made and writes into that variable from inside the function.
Lastly, read will either succeed or fail at reading the next byte, so it returns a Result that we need to unwrap. When printed, byte should show up as a single integer from 0 to 255 inside square brackets. For example, if your file starts with the limerick above, you should see [84], the code for the letter T.
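Here’s a small sketch of that idea using a made-up function, set_to_seven. It does to its argument roughly what read does to byte: it receives a mutable reference (&mut) and writes a new value through it.

```rust
// Hypothetical function: writes 7 into the first slot of the array it is given.
fn set_to_seven(buffer: &mut [u8; 1]) {
    buffer[0] = 7;
}

fn main() {
    let mut byte = [0u8];
    set_to_seven(&mut byte); // lend byte to the function, allowing it to change byte
    println!("{:?}", byte);  // [7]
}
```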
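One detail worth knowing is that read returns Ok(0) instead of an error when there’s nothing left to read, for example with an empty file. In that case byte simply stays [0]. The sketch below shows one way to tell the two cases apart by checking that count. first_byte is a hypothetical name.

```rust
use std::fs::File;
use std::io::Read;

// Hypothetical helper: Some(byte) if the file has at least one byte, None if it is empty.
fn first_byte(path: &str) -> Option<u8> {
    let mut file = File::open(path).expect("No such file found");
    let mut byte = [0u8];
    let count = file.read(&mut byte).unwrap();
    if count == 0 {
        None // nothing was read, so byte still holds the 0 we started with
    } else {
        Some(byte[0])
    }
}

fn main() {
    match first_byte("dummy.txt") {
        Some(b) => println!("First byte: {}", b),
        None => println!("The file is empty."),
    }
}
```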
The final program is as follows:
use std::fs::File;
use std::io::Read;
fn main() {
    let mut file = File::open("dummy.txt").expect("No such file found");
    let mut byte = [0u8];
    file.read(&mut byte).unwrap();
    println!("{:?}", byte);
}
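In this program we use the File struct and the Read trait to open a file, unwrap the Result, and then read a single unsigned byte into the first and only element of an array. A trait is a named set of methods that a type can provide, and importing Read is what makes file.read available. We then unwrap the Result from the read function and print out the value of byte.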
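As a possible next step, the number can be turned back into the character it stands for. This is an extra variation, not part of the original program. For plain ASCII text like the limerick, casting a u8 to char with as does the job. The sketch assumes the first byte is ASCII, and first_char is a made-up name.

```rust
use std::fs::File;
use std::io::Read;

// Hypothetical helper: reads the first byte and converts it to a char.
// `as char` treats the byte as a character code, which matches the text
// for ASCII files but not for characters that take several bytes in UTF-8.
fn first_char(path: &str) -> char {
    let mut file = File::open(path).expect("No such file found");
    let mut byte = [0u8];
    file.read(&mut byte).unwrap();
    byte[0] as char
}

fn main() {
    println!("{}", first_char("dummy.txt")); // T, if dummy.txt holds the limerick
}
```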