I Like Notebooks – Learn spaCy

[Jeremy Howard]

Introduction to Jupyter Notebooks

Hi there, I’m Jeremy Howard from Fast.ai and I’d like to tell you a bit about why I like Jupyter notebooks and help you maybe find some new ways that might help you really like them as well.

Why Jupyter Notebooks are often criticized

I feel like this is kind of a dangerous thing to say, I like notebooks, because every time I say it to a serious software engineer type, they tell me all the reasons that I should not like Jupyter notebooks, and they kind of act like I must just be ignorant and don’t understand the better ways to code. But actually, I’ve built a lot of good stuff in Jupyter notebooks, and I’ve been coding for, gosh, about 30 years, and over that time pretty much every day.

Jeremy’s experience with Jupyter Notebooks

I’ve used a lot of different IDEs, a lot of different editors, and Jupyter notebooks seriously make me at least twice as productive, and I have a lot more fun. I’ve built a number of popular software libraries like these ones in Jupyter notebooks, in particular Fast.ai, which is perhaps the most popular PyTorch deep learning API, other than PyTorch itself I guess, and is very widely used by many companies, many researchers, many universities, and so forth.

[00:01:16]

Fast.ai library and its documentation

One of the cool things you’ll see in the Fast.ai library is the documentation, which you see here. It’s got all these examples scattered throughout it, and nice things like links to source code and links to papers and links to other parts of the documentation. And actually you can go to the top of any page of that documentation and click the Open in Colab button, and if you do that, then suddenly you’ll see that entire documentation appear as an interactive experimental playground you can play with yourself, because, you see, all the documentation is written in Jupyter notebooks, and actually not just the documentation but all of the code itself for the library and all of the tests, and they’re actually all in

[00:02:02]

the same notebook, so if you start looking at one piece you can see everything. You can see, as you see here, the implementation of this combined cosine scheduler, you can see the examples, you can see the tests, you can see the documentation, and you can start playing with it straight away to experiment with some different values and see how it works, look at the inputs and outputs and so forth.

NbDev for notebook development

I think that’s really cool. The way that I do this is by using something called NbDev, for notebook development. NbDev is a really amazing project which I’m going to tell you a lot about in the second half of this talk, but basically what NbDev does is it lets you create Python modules directly from your notebooks. You can export changes from your editor back to your notebook if you want to change things in an editor or IDE directly, it automatically creates searchable documentation, it automatically creates PyPI and Conda installers, it will run your tests in parallel when the tests are in notebooks, it will handle continuous

[00:03:04]

integration, it will handle version control stuff, and so forth.

Examples of code written in notebooks

It’s really, really nice. I write all kinds of stuff in notebooks, and here’s an example of a little server I made, and so I made this little server and it’s a GitHub or a Git webhook server, and the nice thing is that I haven’t really done much stuff directly using Python’s built-in HTTP handler classes before, so I started experimenting with them, and I did so in a notebook, and as I experimented I took down notes for myself and I started creating examples and little tests, and this now becomes part of the documentation and the source code and the tests of the library I ended up building, which is called FastWebHook. So you can see that you can write any kind of code in notebooks and you can end up exporting it into a real library and now anybody can download FastWebHook and then

[00:04:07]

they can see not only the final result but the process I took to get there and understand my thinking, understand the APIs I’m using, understand the parts of the Python standard library I’m using, because it’s all documented in this process. So a lot of other people are now using nbdev, and one of the best comments I’ve seen is from Hamel Husain from GitHub, who said tests, docs, and code are part of the same context and are co-located.

Hamel Husain’s comment on NbDev

So this is what happens when you write with nbdev. He says there is no other programming environment that exists like this that I’m aware of. You can even make notes to yourself about why something works the way it does, very easily, while you’re writing the code, and it isn’t an afterthought. This is fundamentally why I have a problem working in anything besides nbdev, because not only does it make the code more approachable to others, but forcing you to write docs actually forces you to think about the code more.

[00:05:06]

Refactoring code for simplicity

In my personal projects that use nbdev, I often refactor my code to be simpler and better after forcing myself to explain it, and I have the exact same experience. It really makes a difference to my workflow, and a lot of this is really thanks to the underlying Jupyter notebook system, which nbdev sits on top of.

Deep Learning for Coders book

Sylvain Gugger is my co-author on Fast.ai, and he’s also my co-author on the Deep Learning for Coders book, which has been incredibly popular, including with some big names you probably know about who really like it. This whole book was written entirely in Jupyter notebooks, and then we exported it directly, with a single little script we wrote, into AsciiDoc, sent it off to O’Reilly, and they published it as this beautiful book. And a lot of people have commented on how nice this book looks, how good it feels, it’s got color, and nice little icons, and all the stuff you’d expect, a really nice index, and so forth.

[00:06:08]

So we’ve created a book that we’re really proud of, and a lot of people really like.

FastDoc for converting notebooks to books

And if you want to write a book yourself as well, you can. You can pip or conda install FastDoc, which Sylvain and I have made available. This is the exact same thing that we used to make our book, and you can run a single command, fastdoc_convert_all, and it will convert all of your notebooks into a publication-quality book, or at least the AsciiDoc source for it, which you can then send to a publisher. All you have to do is write the book.

Fast.ai course taught in Jupyter Notebooks

Here’s another example of something that we created with Jupyter Notebooks, which is a very popular course, and a course people really, really like. Nearly everybody seems to like this course, which we’re so proud of, because we spent a lot of time trying to get it right. And the whole thing, or nearly the whole thing, is actually taught in Jupyter Notebooks.

[00:07:03]

And the students then take these notebooks, and what we do is we clear out all of the prose and all of the outputs, and we ask the students to try to go through the notebook and figure out what’s going to happen next, and why we are doing this. It’s a great way to kind of force people to think about, like, oh, did I really understand this? Do I really know what’s going to happen? And then they can run it and check, and if the answer is different to what they expected, then they can experiment. It’s a really terrific way to learn, and pretty much all of our students have said that once they get into it, they really adore it, they really find it terrific.

Literate programming and exploratory programming

Overall, the key thing, I guess, that I like about Jupyter Notebooks is that they support literate programming. Literate programming is something that I have been fascinated by ever since I read about it in the early 90s. It was developed by Donald Knuth, the famous computer scientist, who describes it as a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably, certainly I find, more fun to write.

[00:08:12]

The main idea is to treat a program as a piece of literature addressed to human beings rather than to a computer, and this is certainly the way that we now write code with notebooks and NBDev, and I think we actually go beyond literate programming to what I call exploratory programming, where we’re using our notebook as a kind of a journal, like a scientist’s journal, and then when we’re done, we’ll go back and we’ll try to clean it up as best as we can, and we’ll make that part of what we publish. So, for example, here’s the

NBDev source code and its evolution

actual source code from NBDev itself, right, and at the very start, Sylvain and I didn’t know much about, like, what is the Jupyter notebook really behind the scenes, and as we started exploring and realizing it’s just JSON and printing it out, that became part of our documentation and source

[00:09:03]

code, and you can see we start to create and export functions as we go along, and that becomes part of the library. So then when somebody else comes along and wants to contribute to NBDev or to any project written with NBDev, they don’t really need a huge amount of hand-holding and helping them get involved, because they can see not just the final result, but the process to get there and the thinking and the choices that were made along the way, because they’re all part of the notebooks, and they can even see the tests and see how the tests are related, and it’s all there in one place.
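The point about a notebook being "just JSON" can be sketched with nothing but the standard library. The notebook below is a hand-written stand-in in the nbformat v4 layout, and `summarize_cells` is a hypothetical helper for illustration, not part of NBDev.

```python
import json

# A tiny hand-written notebook in the nbformat v4 layout.
# The cell contents are made up for illustration.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "Some prose."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1\n", "x"]},
    ],
})

def summarize_cells(text):
    """Return (cell_type, joined_source) pairs for each cell in notebook JSON."""
    nb = json.loads(text)
    return [(cell["cell_type"], "".join(cell["source"])) for cell in nb["cells"]]

for cell_type, source in summarize_cells(notebook_text):
    print(cell_type, repr(source))
```

Reading a real `.ipynb` file works the same way: load the file's text with `json.loads` and walk the `cells` list.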

Live coding environment

So, overall, notebooks give us a live coding environment. It’s live in the sense that you’re working directly programming against those live objects. The actual system that you’re building them in has the state, has the actual current state of all of the variables and all the systems in memory, and you can directly interact with it.

[00:10:07]

This idea goes back a really long way.

Smalltalk as a live coding environment

One of the most famous examples of a live coding environment is Smalltalk, this one here from Smalltalk 80, and as you can see here, as the code is running, things are actually happening on the screen, and anybody involved in Smalltalk in those early days will tell you that this was a critical part of why this was such a productive system and why it was such a loved system, and a lot of people say there’s never really been anything like Smalltalk again. We’re kind of almost rediscovering things from decades ago. There are other interesting examples of live coding.

Sam Aaron’s live coding performance

Here’s a great one from somebody called Sam Aaron, who actually does live coding as performance. Here is him writing music with code in real time.

[00:11:12]

So I think that’s pretty cool. Here’s something which is even cooler.

Bret Victor’s live coding environment

This is Bret Victor, a brilliant designer and a brilliant thinker, showing a real live coding environment he created which allows him to create games in a whole new way. I mean, not just games, he could use this for so many things, but here’s an amazing example using it to build a computer game.

[(Unidentified)]

So bounce off my turtle, pause the game, and now hit this button here which shows my guy’s trail. So now I can see where he’s been, and when I rewind, this trail in front of him is where he’s going to be. This is his future, and when I change the code, I change his future.

[00:12:01]

So I can find exactly the value I need, so when I hit play, he slips right in there.

[Jeremy Howard]

Aspiring to create engaging live coding experiences

So I’ve got to say, I’ve never managed to build code in a way where the people watching it went whoa and then started clapping. It’s certainly something to aspire to, and you can see how much people really love this idea of actually interacting with a live coding environment. Bret Victor’s been very inspirational.

Chris Lattner’s Playground system

One of the people he inspired was Chris Lattner. He’s the guy who created LLVM, he created Swift, and he built the amazing Playground system, where, as you can see here, as the code is running, you can actually see what it’s doing, and you can even plot it and so forth. Another great example of a popular and important and powerful live coding system. So I was so proud when actually Chris himself said he thinks that NBDev is a huge step forward for programming environments, and so for that to come from Chris was a huge validation for us, that somebody we really admire thinks that we’re absolutely going in the right direction.

[00:13:14]

Joel Grus’s coding approach

Most people, however, are not using this kind of live coding environment, despite the decades of work that’s gone into these kinds of systems and the productivity that we’ve found comes from it. Here’s how a lot of people code, and I’m going to give an example here, you’ll see why in a moment, of a very successful coder named Joel Grus.

Joel’s coding style and its limitations

This is Joel here, and he’s good enough to actually do coding which he puts out on the internet for people to watch, and I watched it to see exactly what this looks like, and what he does, like a lot of people do, is he has what’s called a line-oriented REPL down here. This is something where you can type in a line and it returns a line, and then the rest of it is a kind of a standard editor IDE. This is VS Code, which is one of the best, or maybe the best, editor around.

[00:14:03]

So what happens as he codes? You can see here that he has to kind of go back up to find something he’s done before, and it’s the wrong thing, and then he has to edit it, and then he’s got an assertion error, and now he has to go somewhere else, and then comes down here again. Now he’s getting this kind of weird situation of some state that’s come from the code, and some state that’s come from the things he’s typing, and now he’s going back up here and trying to edit this, and now bringing it back down here again, and he’s still getting the error. You know, as I watch this, I find this painful. You know, I don’t want to write code like this. I kind of feel like this picture is Joel saying, ah, this is too much. But I feel like a lot of real programmers tell me, you know, this is how you should code, and it kind of feels like they’re saying, hey, you know, we should go back and use line-oriented REPLs for everything, like editing. We used to edit with ed, the Unix editor, which was a line-oriented editor.

[00:15:03]

And as you can see, the basic approach is exactly the same as what Joel was using for manipulating Python. Now these line-oriented REPLs, you know, it’s not a great way to edit text. Very few of us use ed nowadays, and I would argue that it’s not a good way to work with any kind of code objects. It is linear. There’s a reason that we have line-oriented REPLs, and that’s because we’re used to code like this.

[(Unidentified)]

APL as a powerful programming language

If we enter maximum slash y, we get the maximum element in the vector y.

[Jeremy Howard]

So you can see here, he is typing one line at a time, and it’s printing one line at a time. By the way, this is APL, which is decades ahead of its time, and still one of the most brilliant programming languages in the world. But I would argue that we should be moving beyond the type-a-line-and-have-a-line-printed approach that was developed for this kind of coding.
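The APL expression being typed in that clip, "maximum slash y" (written `⌈/y`), is a reduction: the maximum function is applied between the elements of the vector. As a rough Python equivalent, with a made-up vector `y`:

```python
from functools import reduce

y = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up vector for illustration

# Fold max over the vector pairwise, which is what a reduction does.
largest = reduce(max, y)
print(largest)
```

In everyday Python you would simply write `max(y)`; the `reduce` form is shown only because it mirrors how the APL expression is built.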

[00:16:09]

Dead coding environments and their limitations

So these kinds of editing environments, like VS Code, and VS Code is a brilliant piece of software, I refer to as dead coding environments, because you’re not interacting with live code, and that leads to errors. You get this kind of gap between the system you’re working on and the final result you’re creating. So Joel actually wrote a fantastic book, which, despite being fantastic, has some errors in it.

Errors in Joel’s book

And the kind of errors are very interesting. This is from his errata page. They’re errors that say the code, you can’t actually run it, right? So this line of code doesn’t work, and this line of code doesn’t work. One of the really interesting ones was not only this line of code doesn’t work, but hey, you’ve got a code repo where it does work.

[00:17:02]

And so there’s this kind of like gap between the actual code you’re doing and the book that you’re writing, and then they become out of sync, and your readers end up confused because the code doesn’t run. All the code in our book runs, not because we’re particularly brilliant, but just because we ran it all in a notebook. And so all the outputs you see are the actual outputs that came out of the notebook. Now of course, one of the libraries might change, or there might be a breaking change to Python or something. There could be something which could cause it at some point in the future to break, but at the point that we wrote it, and as far as I know right now still, the code is correct and it works because, as I said, it’s the code that we actually ran. There was no, it’s not a dead coding environment. It was a live coding environment we used to create the book, and the book directly comes from and is that code.

Joel’s presentation “I Don’t Like Notebooks”

So why am I talking about Joel’s book and Joel’s coding approach? That’s because actually he, a couple of years ago, did a brilliant presentation called I Don’t Like Notebooks, and in this presentation he explained why he thought we shouldn’t be using notebooks, and actually notebooks are not the right approach to building effective software or doing effective teaching.

[00:18:19]

And the reason I feel like I need to talk about it today is because he is such a brilliant communicator and such a funny guy that this presentation has become incredibly influential, and pretty much any time I say I like notebooks, somebody will say that’s not a good idea.

Addressing Joel’s criticisms

Haven’t you seen that presentation where that guy explained why they’re terrible? So I really feel like in order to tell you why I like notebooks, I also have to tell you why Joel is wrong, which he is. I really feel like he’s wrong. I’ve got a lot of good stuff in notebooks, and as you’ll see, I think the points he made are based on misunderstandings, or at least are sometimes now out of date.

[00:19:05]

Because his slides are brilliant, I’m going to use a lot of them, and also so you can see exactly what I’m responding to. Whenever I use his slide, I’m going to show this little icon in the bottom right hand corner.

Joel’s presentation slides

You’ll see it. The next 12 slides are actually all from his presentation. I haven’t edited them, because I want to make sure you see exactly what he showed.

Joel’s self-deprecating statement

And one of the things he did say in his presentation is, I am not a notebook expert, which is great. It’s nice to be self-deprecating and to kind of have that caveat, but he still expressed very strong opinions, and people still, as I said, really think he must be right. They tell me that I am making a mistake to think that I like notebooks. So I was actually worried when he first told me that he’s planning to write the talk that he did, because I know he’s a brilliant communicator, and I know he’s really funny, and I thought, uh-oh, a lot of people are going to listen to this and say, oh, I guess we shouldn’t use notebooks, because Joel has made a compelling case that we shouldn’t.

[00:20:12]

Joel’s reference to Jeremy’s opinion

And this slide is actually from his presentation. He actually said in his presentation, hey, look at what Jeremy said. I guess he thought it was kind of funny that I told him, don’t write this presentation, and he wrote the presentation. And so now I feel like I have to come back and say, like, okay, let’s set the record straight here. So here’s what he said.

Joel’s criticisms of Jupyter Notebooks

He had a lot of strong opinions. I don’t agree with any of them, but here they are. He said notebooks discourage good habits. He said notebooks encourage bad habits. He said notebooks encourage bad processes. He said that notebooks hinder reproducible and extensible science. He said that notebooks are a recipe for poorly factored code.

[00:21:05]

He said that notebooks make it easy to teach poorly. I don’t think it’s a notebook’s fault that that guy’s getting hit over the head. I don’t do that when I teach with notebooks, by the way. He said notebooks make it hard for me to teach well. So he didn’t just state these, he gave reasons.

Key reasons for Joel’s criticisms

And here are some of the key reasons, or what I think are the key reasons, that he expressed. The first one was that notebooks have tons and tons of hidden state that’s easy to screw up and difficult to reason about.

[(Unidentified)]

Hidden state in notebooks

Which is strange.

[Jeremy Howard]

I don’t find this myself. And he made the point that notebooks, or so he says, are dangerous.

Running cells in order

I don’t know if I agree they are dangerous, but he thinks notebooks are dangerous unless you run each cell exactly once in order. It’s like, oh my goodness, how am I going to do that? Wait, Jupyter has a single button you can press to do that.

[00:22:02]

It’s actually not that hard.

Restart and run all functionality

If you really think it’s so important to run each cell in order, you have a way to do so. Personally, I think it’s actually really, really important to have this ability to go back and fiddle with things, to change things, to see what happens. I like having the ability to go back and run in order, but I also like having the ability to actually, as we discussed, manipulate the live coding environment in real time to experiment and to say, what if. That’s a critical part of this. But you do need a way to ensure that in the end the whole thing works. And Jupyter has a couple of ways to do that.

Restart and clear output functionality

There’s restart and clear output. Sorry, I shouldn’t have shown restart and clear all output. It should be restart and run all. I made a mistake there. And in the Cell menu there are also a few options, such as run cells to here or run all cells. And nbdev actually has something which runs all of your notebooks, all of your cells in order, for a whole directory.
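To make the hidden-state concern and the "restart and run all" answer concrete, here is a minimal sketch that treats each cell as a string of source code. `run_all` is a hypothetical stand-in for restarting the kernel and running every cell in order; it is not how Jupyter or nbdev implement it.

```python
def run_all(cell_sources):
    """Execute code cells top to bottom in a brand-new namespace."""
    namespace = {}
    for source in cell_sources:
        exec(source, namespace)
    return namespace

# Three made-up cells.
cells = ["x = 1", "x = x + 1", "result = x * 10"]

# A clean run from the top always gives the same answer.
fresh = run_all(cells)
print(fresh["result"])  # 20

# Fiddling in a live session: re-running the second cell changes the state,
# so the third cell now produces a different result.
live = run_all(cells)
exec(cells[1], live)
exec(cells[2], live)
print(live["result"])  # 30
```

Both behaviours are useful: the live session is where you experiment, and the clean top-to-bottom run is the check that the notebook works as written.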

[00:23:04]

NbDev’s ability to run all cells in order

That’s the main thing I use. Another concern he stated was that you can’t copy and paste code and outputs from a notebook into Slack, and he also gave the example of pull requests and issues in GitHub.

Copying and pasting code and outputs

Now, this is an example of trying to do things the same way you’ve always done them without thinking about what’s the actual problem you’re trying to solve. The actual problem you’re trying to solve is to say, here’s what I’m trying to do, please explain why this doesn’t work. Or here, have a look at this example I’m showing, or whatever.

Sharing notebooks for bug reports

And here’s how it actually looks. It’s actually way better than cutting and pasting into Slack. When we get a pull request or an issue, there’s a bug report, Colab notebook reproducing the behavior. Now I click on that and I get a whole notebook fully self-contained where I’m not just seeing this person’s claim, oh, I typed this and this happened, but I can actually try it.

[00:24:07]

And that means I can then actually try to fix it right there and then.

Fast.ai documentation and courses as notebooks

And this is particularly helpful because all of the Fast.ai documentation, all of the Fast.ai book, and all of the Fast.ai courses are also available as notebooks. So people can use that as a starting point. And I can say, like, oh, did you try the code in the book? If you have a non-working example, could you modify the book notebook to show us how yours doesn’t work, and so forth. So rather than saying, how do I copy and paste into Slack or GitHub, the question should be, how do I understand the problem that a user is having, or understand the idea that a user is telling me about? And the answer, to me, is by providing an actual live coding environment where I can see that. And it’s so easy to do with Jupyter.

[00:25:04]

ReviewNB for reviewing pull requests

Something else I really like about Jupyter is you can use something that I really enjoy, ReviewNB, to look at pull requests. And pull requests there don’t just show me the code that’s changed, which is fine. They do. It’s very nice. But they also show me the outputs that have changed, and the documentation is right next to it. So here somebody changed a test, right? And rather than thinking, oh, I wonder if those scales are any good, and then having to go back and load in their PR and run it and then have a second version of the code and run that and compare the two, in ReviewNB I can see them right next to each other. And I can say, oh yeah, this actually does look like a clearer example to me. And I can see the documentation is right next to it and I can see exactly what’s going on.

Sharing notebooks using Gist-it

There are lots of ways of sharing notebooks. Another is to press this button. This is the Gist-it button. Here’s a notebook that I created, and you can copy and paste images directly into a notebook.

[00:26:03]

So here’s one I just copied and pasted in. And if I click that button, then it automatically gives me a shareable gist URL. So I can paste that into Slack. That’s at least as easy as copying and pasting from IPython. And of course, I get the benefit that I’m copying and pasting not just text, but pictures.

Working with non-textual data

And, you know, a lot of us are working with things other than just text nowadays. We want to be able to show plots, you know, histograms of things in analytics. We want to be able to show pictures. We want to be able to show videos. We’re not just working with text all the time. And so with something like this, you can really show a much more complete example a lot of the time. It’s really nice and easy to do. Another concern, and as you can see, we’ve still got our little pictures down here, so this is still Joel’s code, sorry, Joel’s slides.

Reproducibility of notebooks

Another concern he had was that he thinks that notebooks are harder to reproduce.

[00:27:06]

And this one, he didn’t really explain why he thought that way. And I don’t fully understand the thought process here. All of the same ways that you can use for dependencies in regular Python libraries, like requirements.txt, or environment.yaml, or whatever, or setup.py, you can use exactly the same thing for notebooks. But in practice, though, you know, notebooks I really love because when you provide a notebook, you can just provide a cell at the top, which creates the environment you need.

Creating reproducible environments

So, for example, you can open any chapter of the Fast.ai practical deep learning for coders book directly on Colab by clicking on a link without any installation. And the first line, or the first cell, installs everything you need. And away you go. So really, to me, I feel like notebooks make it much easier to ensure that you have something that’s reproducible.

[00:28:06]

Seeing the programmer’s steps

And you can also see what the programmer did step by step to really make sure that what you’re seeing is what they were seeing. Look, you can certainly make bad notebooks. You can certainly provide bad reproducibility environments.

Creating bad notebooks

But I don’t think it’s anything to do with notebooks themselves. You know, it’s, to me, this is an environment that actually makes that easier to do well.

Good software engineering in notebooks

So, the other thing that Joel talked about quite a lot was this idea of good software engineering. And he made some pretty bold claims that good software engineering can’t be done or is extremely hard to do in notebooks. And he used these characters quite a lot, these smurfs. And basically, you know, he’s saying, like, you should all follow the rules of good software engineering.

[00:29:01]

Following software engineering rules

But, you know, it’s kind of like this idea that you should copy and paste code and outputs into Slack. You know, that’s how people might have done things before. But, you know, maybe the rules of software engineering in a dead coding environment or in a line-oriented REPL or whatever are not the same, particularly, you know, compared to a dynamic language in a live coding environment.

Domain expertise and environments

And also the rules for a data scientist who’s doing research, and whose focus is on speed of iteration and on rapidly eyeballing visualizations to see whether, say, their microscopy images are actually getting easier to see or harder to see, to take an example of a project I’ve been involved in a lot recently. These are going to be different to the rules, the so-called rules, of somebody who’s creating a CRUD app or an e-commerce app to send a payment to a Stripe API.

[00:30:08]

So I think we’ve got to be careful about the idea of rules and think about domains and domain expertise and environments.

Modularity in notebooks

So here’s another slide from Joel, and his concern was that notebooks are not good for modularity. And he’s giving an example here of some of his code, which he’s saying is very nicely modular. I mean, sure, but why can’t you do the exact same thing in notebooks?

Fast.ai’s modularity

And in fact, with Fast.ai, the library I told you about that we wrote entirely in notebooks, the modularity is actually so good that we have a peer-reviewed paper about the approach to modularity that we took and about the kind of decoupled API that we created. I’m sure it’s not perfect, but a lot of people have used it and have liked it, and people are studying it as an example of modularity.

[00:31:05]

It’s definitely not the case that notebooks somehow make it impossible or even difficult to create modular code.

Testability in notebooks

I’d say the same thing about testability. I don’t know if this is from Joel’s tests. I guess it probably is. Again, this is one of his slides. He’s showing here examples of tests. Tests are great.

Tests in dead coding environments

But in this kind of regular approach to coding in these dead coding environments, the tests live separately to the code that’s being tested. And it’s very easy for somebody to look at the code and not even notice the tests, or they’ll have to kind of flick back and forward between the two. And it’s not easy to connect which test is really working on which part of the code.

Tests in notebooks

Whereas in notebooks, and also with nbdev in particular, the tests live right next to the thing they’re testing.

[00:32:04]

And they’ll include prose explaining what it is they’re testing. So here we’ve created a thing called unbuffered server. I think it was in the cell above the one I took a screenshot of here. And so here I’ve created a test handler to test it. It sends a response and writes okay. And here’s something that starts a server and then checks whether it actually receives that okay or not. So it’s really nice having the tests in the notebook.
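A test of that shape can be sketched with only the Python standard library. This is not the FastWebHook code from the slide: `OkHandler` and `fetch_from_test_server` are hypothetical names, and a plain `HTTPServer` stands in for the unbuffered server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class OkHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: answers every GET with the body b'ok'."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the notebook output quiet

def fetch_from_test_server():
    """Start a server on a free local port, make one request, return the body."""
    server = HTTPServer(("127.0.0.1", 0), OkHandler)  # port 0: let the OS pick
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        server.shutdown()      # stop the serve_forever loop
        server.server_close()  # release the socket

# In a notebook, this assertion sits in the cell right below the code it tests.
assert fetch_from_test_server() == b"ok"
```

Because the assertion is an ordinary cell, anyone reading the notebook sees the handler, the test, and the surrounding explanation in one place.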

Running tests across notebooks

And then nbdev provides a way to run all the tests across lots of notebooks and report on the overall result. And that can be run in continuous integration. And nbdev gives you that actually out of the box. If you use the nbdev template, you get this kind of continuous integration testing for free. You don’t have anything to do. It just works, which I think is super cool.

[00:33:00]

Learning in notebooks

Another of Joel’s concerns from his slides is that notebooks somehow encourage a less sophisticated approach to learning.

Shift-enter execution

So you hit shift enter to execute a cell and go to the next one. Maybe people just do that without thinking. I mean, it’s possible people could do that. I would say even that is better than people just reading the text and having nothing to do. But as I described, actually what we do is we have a little script that just removes all of the prose except for headings and all of the outputs.

Removing outputs for learning

And then we give this to the student and then they can run through each one. And before they run it, we say have a think about what this is going to output and think about why and think about why we’re doing it. And then if you guessed wrong, you know, or figured wrong, you can actually experiment because you’re in a live coding environment here.
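A script of the kind described, one that drops prose and outputs but keeps headings and code, might look roughly like the sketch below. It is not the actual fast.ai script; `strip_for_study` is a hypothetical name, and the notebook is assumed to be an already-loaded dict in the usual `.ipynb` JSON layout.

```python
import copy

def strip_for_study(nb):
    "Return a copy of a notebook dict that keeps headings and code, but no prose or outputs."
    out = copy.deepcopy(nb)
    kept = []
    for cell in out["cells"]:
        src = "".join(cell["source"])  # source may be a string or a list of lines
        if cell["cell_type"] == "markdown":
            headings = [line for line in src.splitlines() if line.startswith("#")]
            if not headings:
                continue  # pure prose cell: drop it entirely
            cell["source"] = "\n".join(headings)
        elif cell["cell_type"] == "code":
            cell["source"] = src
            cell["outputs"] = []            # the student predicts these
            cell["execution_count"] = None
        kept.append(cell)
    out["cells"] = kept
    return out

# A tiny made-up notebook to show the effect.
notebook = {"cells": [
    {"cell_type": "markdown", "source": "# Tensors\nSome long explanation."},
    {"cell_type": "markdown", "source": "More prose with no heading."},
    {"cell_type": "code", "source": "1 + 1", "outputs": [{"text": "2"}], "execution_count": 3},
]}
study_copy = strip_for_study(notebook)
```

The student then gets only the headings and the runnable cells, and has to guess each output before pressing shift enter.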

Experimenting in a live coding environment

So you can actually see well where did I go wrong and what actually happens.

[00:34:02]

So I actually think this is a great way of learning. And a lot of our students have told us they think it’s a great way of learning. I don’t think I’ve ever heard anybody say that this ability to work interactively in this environment is decreasing their ability to learn.

Jupyter vs. text editors

So another thing that Joel said and gave a few examples of is that notebooks are way less helpful than his text editor, which in his case is VS Code. And he said some things are easier demonstrated. I’m going to show the opposite of his demonstrations, which is actually that Jupyter is more correct and more helpful than his IDE.

IDE completion in Jupyter

So here’s an example. Let’s get a URL, the contents of a URL, and if it returns something valid, something truthy, then we’ll return "a", otherwise we’ll return 1. So this is obviously going to return something truthy, so this should be a string, and as you can see it’s giving me IDE completion for a string.

[00:35:07]

IDE completion in VS Code

VS Code, same code, gives me completion for a number, not for a string. I’ve got bit_length, casefold, conjugate. Okay, so VS Code doesn’t know, because VS Code, it’s doing the best it can, and it’s kind of pretty brilliant at, you know, given that limitation, but it doesn’t know.

Jupyter’s dynamic introspection

Jupyter knows because you ran the code, so it actually knows what you’re working with, and because Python’s a dynamic language, it supports this kind of dynamic introspection of what is actually inside b, and what b can do. And so that’s what Jupyter can use. So Jupyter is just really, really, really helpful, because it can be really helpful.
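The difference between the two tools can be made concrete with a small sketch. The talk fetches a URL; to stay self-contained this version uses a plain flag instead, and `pick` and `completions` are hypothetical helper names. Once the code has run, `b` is a real object, so its attributes can simply be listed.

```python
def pick(flag):
    "Return a string when flag is truthy, otherwise an int (same shape as the talk's example)."
    return "a" if flag else 1

b = pick(True)  # after this cell runs, b is a concrete object, not just a declared type

def completions(obj):
    "Public attribute names of a live object, found by introspection at run time."
    return [name for name in dir(obj) if not name.startswith("_")]

names = completions(b)
print(type(b).__name__, "casefold" in names, "bit_length" in names)  # str True False
```

A static tool only sees that `pick` may return either type; the running kernel knows which one it actually got.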

VS Code’s limitations

VS Code does the best it can, but it can never be totally correct. It would literally be impossible without it actually trying to match the same stateful approach as Jupyter, because Python is dynamic, because it’s not fully typed.

[00:36:05]

And even if you do use types for something like b above, you would have to use a union type, so you still wouldn’t actually know what the type is.

Joel’s suggestions for winning him over

So then Joel said, okay, here’s what you could do to win me over, and convert me to a notebook user. He said, give me IDE style autocomplete.

IDE style autocomplete in Jupyter

Well, as we discussed, IDE style autocomplete is not the be-all and end-all. It’s actually not fully correct. Having said that, Jupyter also provides IDE style autocomplete. If you give it types, then it will figure out what you mean, and if you give it functions like open that return a file, again it will figure out what you need. So we have IDE style autocomplete.

Real-time type checking and linting

He said, give me real-time type checking and linting. Okay, here is part of the fastcore library. As you can see, it’s like a dozen lines of code, and it actually gives you real-time, actually correct type checking.

[00:37:07]

Fastcore library’s type checking

So here you can see I’m calling foo, which is taking an int and a string, and if you pass it an int where it wants a string, it’s checking, and it does in fact fail. And again, it can do this correctly only because it’s in Jupyter, only because it’s actually running the code.
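To give a feel for how a dozen lines can check types at call time, here is a minimal sketch of a run-time checking decorator. It is not the fastcore source; `check_types` is a hypothetical name, and this version only handles annotations that are plain classes (no unions or generics).

```python
import functools
import inspect

def check_types(func):
    "Check annotated arguments against their annotations every time func is called."
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            if expected is inspect.Parameter.empty:
                continue  # unannotated parameter: nothing to check
            # Only plain classes are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}, got {type(value).__name__}")
        return func(*args, **kwargs)
    return wrapper

@check_types
def foo(a: int, b: str):
    return f"{a}{b}"

foo(1, "x")        # fine
try:
    foo(1, 2)      # second argument should be a str
except TypeError as err:
    print(err)
```

Because the check happens on real values during the call, it never has to guess what a variable might contain.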

MyPy’s limitations

The approach that most people are taking to this kind of type checking is MyPy, and MyPy is not about 12 lines of code. MyPy is about 100,000 lines of code, and it’s complex code involving multiple different languages, and it’s never going to be correct. It can’t be fully correct, because it’s impossible for it to know exactly the types of all of your pieces of data, because it’s not actually running the code.

[00:38:01]

Python’s dynamic nature

And Python is dynamic. With Python, the only way to know what something actually contains is to run the code. Also, MyPy means you have to tell Python what every type is.

Manual typing in MyPy

And honestly, every other language is moving towards auto-detection of types, of figuring out what types are automatically, particularly early movers like F#, but nowadays even stuff like Java, C#, C++, you can have an auto type, and it figures it out for you. Python’s kind of moving in the opposite direction, and if you want to go the MyPy static analysis IDE approach, you’re going to have to spend a lot of time doing manual typing.

Dependency management in notebooks

Another thing Joel said he wanted to see, to win him over, is a better story around dependency management.

NBDev’s dependency management

Sure, why not? As I said, notebooks can already support all the same approaches that normal Python projects can handle.

[00:39:05]

NBDev makes it even easier. You can just add a line to your settings.ini with a list of requirements. If there are some special ones for pip and conda, you can add those. Special ones for development time only, you can add those, and away you go. That will automatically make all of those things be installed for you when you run the notebooks. So we certainly have that. He also said he’s looking for first-class... what is going on there?

Refactoring code out of notebooks

First-class support for refactoring code out of notebooks into modules. And I agree, this is absolutely critical, and this is really the key number one first thing that NBDev does.

NBDev’s ability to create Python projects

You start with some code like this, and again this is some source code of NBDev. NBDev, of course, is written in NBDev. It’s a notebook, and then it automatically creates an actual Python project.

[00:40:02]

So those all exist. Joel did not expect that to happen. He said the reality is you’re not going to provide me with all these things, and I’m not going to switch to notebooks. So so be it.

Convincing Joel to use notebooks

So hopefully I’ve convinced you that there’s no reason for you not to like notebooks, and that it’s not the case that real software developers have to use other tools. But actually notebooks really can be really great.

Focusing on NBDev

Let me explain more about how and why this happens, and to do that I’m going to focus on, in particular, NBDev. And I’ve already mentioned the basic things that NBDev does for you. Let’s look more at how that works and exactly what you need to do.

Export comments in NBDev

So here is an example of code in a notebook, and you can see here that I’ve got an export comment.

[00:41:00]

So NBDev uses a small number, like two or three different special comments, to tell it what to do. And this export says make this part of my Python project. This doesn’t have an export, so it’s not part of the Python project.
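The idea of a special comment that marks a cell for export can be sketched in a few lines. This is not nbdev’s implementation; `exported_source` is a hypothetical helper, the marker is assumed to be written as `#export` on the first line of the cell, and the cells are plain strings rather than a real notebook file.

```python
cells = [
    "#export\ndef add(a, b):\n    return a + b",
    "add(1, 2)  # scratch experiment, stays in the notebook only",
    "#export\ndef double(x):\n    return add(x, x)",
]

def exported_source(cells, marker="#export"):
    "Join the bodies of cells whose first line is the marker, dropping the marker line."
    parts = []
    for src in cells:
        first, _, rest = src.partition("\n")
        if first.strip() == marker:
            parts.append(rest)
    return "\n\n".join(parts) + "\n"

module_text = exported_source(cells)

# The generated text is ordinary Python and can be loaded like any module source.
namespace = {}
exec(module_text, namespace)
print(namespace["double"](3))  # 6
```

Cells without the marker stay in the notebook as examples and experiments and never reach the generated module.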

Splitting a class into separate cells

Now one of the things I like to do, and this is another thing that Joel talked about as being a problem for him with notebooks, he said it’s hard to do, is splitting a class into separate cells. And actually with the fast.ai libraries using NBDev and fastcore, it’s not at all difficult to do. Here’s a class, and I’ve just got the init in it here, and I can create it, right?

Patch decorator for adding methods

And then later on I just use this patch decorator to add this method to this class. And so this is actually going to impact the documentation as well. The documentation of process comment will end up down here, and the documentation of class init, notebook processor init, is going to end up up here. And so it really helps the code reader understand things step by step. Each one has tests and examples, kind of as it happens.
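One way a patch-style decorator can work is to read the annotation on the first parameter and attach the function to that class. The sketch below is a simplified stand-in, not fastcore’s actual decorator, and `count_cells` is a hypothetical method added purely for illustration.

```python
import inspect

def patch(func):
    "Attach func as a method to the class named in its first parameter's annotation."
    first_param = next(iter(inspect.signature(func).parameters.values()))
    cls = first_param.annotation
    setattr(cls, func.__name__, func)
    return func

# Cell 1: the class, with only __init__ defined.
class NotebookProcessor:
    def __init__(self, cells):
        self.cells = cells

proc = NotebookProcessor(["a", "b", "c"])

# Cell 2, later in the notebook: add a method, with its own prose and tests nearby.
@patch
def count_cells(self: NotebookProcessor):
    return len(self.cells)

print(proc.count_cells())  # 3
```

Since methods are looked up on the class at call time, even instances created before the patch see the new method.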

[00:42:04]

Reading documentation step by step

And as you read through the documentation, you can see each piece one at a time. This is, to me, a really nice way to build up more complex classes.

Settings.ini file

All of the pieces of NBDev all get built out of a single simple little settings file, settings.ini. And it’s really nice because you can provide all of the information just in one place.

Centralized configuration

So rather than having a version number over here in init.py and over there in setup.py and over here in your documentation, you have it once and it’s used everywhere. Ditto for your description, ditto for your source of your documentation, and ditto for your git repo information. It’s just there in one place and then everything will use that for you.
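The single-source idea can be illustrated by reading one settings file and deriving everything else from it. The key names below (`lib_name`, `version`, `requirements`, `pip_requirements`, `conda_requirements`, `dev_requirements`) and the package names are assumptions made for this sketch; check the nbdev documentation for the keys your version actually supports.

```python
import configparser

# A made-up settings file, inlined as a string so the example is self-contained.
SETTINGS = """
[DEFAULT]
lib_name = mylib
version = 0.0.1
description = A tiny example library
requirements = fastcore numpy
pip_requirements = some-pip-only-package
conda_requirements = some-conda-only-package
dev_requirements = pytest
"""

cfg = configparser.ConfigParser()
cfg.read_string(SETTINGS)
settings = cfg["DEFAULT"]

def reqs(key):
    "Split a space-separated requirements entry into a list (empty if the key is missing)."
    return settings.get(key, "").split()

# Every consumer reads the same values instead of keeping its own copy.
version_line = f'__version__ = "{settings["version"]}"'                 # for the package
docs_title = f'{settings["lib_name"]} {settings["version"]}'            # for the docs
pip_install = reqs("requirements") + reqs("pip_requirements")           # for pip
conda_install = reqs("requirements") + reqs("conda_requirements")       # for conda
```

Bumping the version or adding a dependency then means editing exactly one line in one file.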

Synchronization between notebooks and editors

You don’t have to put it in multiple places and think about how to maintain it and synchronize it. Talking of synchronization, not only can you start with a notebook and turn it into this code, which you can then open in your editor, in this case I’m opening it in Vim, or you can open it in VS Code, but you can also edit it there and it will sync in the opposite direction too and update your notebook.

[00:43:13]

Editing in editors and synchronizing back

And so some things are easier to do in editors, like particularly kind of, you know, search and replace across multiple files and stuff like that. Or if it’s an unfamiliar code base, it’s nice to be able to use the tags to kind of jump across between files. You can edit as you go and then synchronize back to the notebook. So then, how does the synchronization work?

Notebook to script functionality

Well, there’s two ways you can do it. You can either put this as the last cell in each notebook, notebook2script, and that will take the notebook you’re working in and all the other notebooks and convert them into modules, or at the command line you can run nbdev_build_lib.

Nbdev build command

And so I have this in every notebook that I use because it’s kind of nice to stay in the notebook environment.

[00:44:03]

Staying in the notebook environment

This is more something I tend to do as part of my release process. There’s a lot of little niceties that nbdev tries hard to make nice for you, to kind of make your code as correct and as, you know, close to best practices as possible, at least kind of our view of best practices.

Dunder all best practice

One of the best practices that we think is important is dunder all (__all__). Dunder all is the thing that Python provides for you, where you get to list what the exported symbols in your module are.

Namespace pollution

If you don’t provide dunder all, and nearly nobody that’s not an nbdev user provides it, then it exports all the symbols, or rather anything without a leading underscore. Not just the ones that you’ve actually directly typed in as your code, but everything you import also gets exported.

Automatic dunder all creation

And that very quickly can lead to namespace pollution.

[00:45:00]

But with an nbdev module, because we automatically create a dunder all for you, which includes only the things that you request to be exported, that means that you can see the imports, for example, from fastcore.transform, which is part of an nbdev library. It’s just stuff from fastcore.transform. Whereas if you look at something from allennlp.nn.util, you get copy, json, logging, defaultdict, Any, you know, this is not stuff created by allennlp.nn.util. And so, because this is built, you know, using the traditional VS Code approach, it really is too much work to manually create dunder all.

Manual dunder all creation

So the allennlp folks don’t do it, just like pretty much every other Python library. Not all of them. Tkinter, for example, which comes with Python, does define dunder all, which is nice, but I don’t know very many non-nbdev projects that do.
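The effect of dunder all on a star import is easy to reproduce. The sketch below builds two small in-memory modules, identical except that one defines `__all__`; the module names `demo_plain` and `demo_with_all` and both helper functions are made up for the illustration.

```python
import sys
import types

def make_module(name, source):
    "Build an in-memory module from source text and register it so it can be imported."
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module

def star_import(name):
    "Return the names that `from name import *` brings into an otherwise empty namespace."
    namespace = {}
    exec(f"from {name} import *", namespace)
    return sorted(k for k in namespace if k != "__builtins__")

body = "import json, logging\ndef helper(): return 1\n"
make_module("demo_plain", body)
make_module("demo_with_all", "__all__ = ['helper']\n" + body)

print(star_import("demo_plain"))     # ['helper', 'json', 'logging']
print(star_import("demo_with_all"))  # ['helper']
```

Without `__all__`, the imported `json` and `logging` modules leak into the caller’s namespace alongside the one function the module actually defines.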

[00:46:03]

Hyperlinking in documentation

So here’s another nice thing with the documentation. In the docs, you can just put your symbols in backticks. And then when you create the docs, which again, it’s automatic, and it can be part of your CI system, in fact, that is the default, you can see it actually creates hyperlinks.

NbDev’s ability to hyperlink symbols

So nbdev knows how to actually look up each of these symbols and hyperlink to them. Even things like this, which are part of, like, different libraries. So this is a really nice feature, which allows you to help out your users, so that they can see exactly what you’re talking about by jumping to other parts of the docs.

Avoiding hyperlinking of parameter names

And of course, some things shouldn’t be hyperlinked, like this is a parameter name, and so those will not end up hyperlinked.
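The linking step can be pictured as a lookup over every backticked name in the prose. This sketch is not how nbdev does it internally; `link_symbols`, the lookup table, and the example.com URLs are all placeholders.

```python
import re

# Hypothetical table of documented symbols and where their docs live.
SYMBOL_URLS = {
    "Transform": "https://example.com/docs/transform.html#Transform",
    "Pipeline": "https://example.com/docs/transform.html#Pipeline",
}

def link_symbols(markdown, urls=SYMBOL_URLS):
    "Turn `name` into a Markdown link when name is a known symbol; leave other code spans alone."
    def repl(match):
        name = match.group(1)
        if name in urls:
            return f"[`{name}`]({urls[name]})"
        return match.group(0)  # e.g. a parameter name: keep the backticks, add no link
    return re.sub(r"`([^`]+)`", repl, markdown)

text = "A `Pipeline` applies each `Transform`; `split_idx` is a parameter."
linked = link_symbols(text)
print(linked)
```

Names that are not in the table, such as a parameter, pass through unchanged, which matches the behaviour described above.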

Documentation features

So the documentation, which gets built for you, supports, you know, all the kinds of features you might imagine, a hierarchical menu to take you to any part of the documentation pages, a table of contents for each page, you can have badges, open in Colab, headings, links, all that kind of stuff.

[00:47:08]

So the documentation, you know, comes out pretty nice, I think.

Nbdev build docs command

So here’s what happens, you just run nbdev_build_docs, and it takes a second or so, it’s all done in parallel.

Parallel documentation building

Or you can have something like a GitHub action, or whatever continuous integration system you use, and call the fast.ai workflows build docs GitHub action.

GitHub action for building docs

So then you can open those docs directly as a notebook.

Index.ipynb and readme.md

And one of them is special, which is the one called index.ipynb. Index.ipynb will automatically be turned into a readme.md for you as well. So no more worrying about trying to keep your files synchronized to make sure that your home page and your readme are saying the same thing.

Synchronized information across platforms

Now we actually do that for you automatically.

[00:48:03]

We also of course make sure that it’s not only the notebook, but the home page on your documentation website, and even your PyPI and Conda descriptions will all end up showing you the same information from your index notebook.

Creating user-friendly experiences

So in this way, because we’re just saying build stuff in one place, do it once, and then we’ll make sure that everything syncs up for you, that makes it trivially easy to create really nice user experiences for your users.

Using nbdev for small projects

So for me, even when I create tiny simple little projects, I always do them in nbdev, because that way I know that I can, you know, in a minute or two, provide installable libraries and documentation just in case anybody else is interested in using my work.

FastWebhook example

And often I find, you know, even for stuff that I think is pretty niche, there’s always a few people who are interested in using it too. Here’s an example actually. FastWebhook, which I mentioned before, it’s really just written for myself.

[00:49:02]

Creating a webhook for fast.ai

For fast.ai, I wanted a webhook that would send out a tweet anytime there was a release, so I did it. I wrote it in like two hours, I guess, and then I just hit make release, and because I made it from the nbdev template, it automatically created the conda package and the PyPI package for me, and everything was all set up, which is really nice.

Version control challenges with notebooks

One of the challenges with working with notebooks on version control is you can get some really ugly diffs that won’t even load in notebooks.

Notebook level diffs

nbdev will actually ensure that those diffs are turned into what I would call a notebook level diff, which is to say it always ensures that your notebooks can be opened.

Ignoring cell output differences

If there’s a difference only in cell outputs, it just ignores them and just picks one because, you know, you can just rerun it. If there’s actually a difference between, you know, in a cell, two people have changed the same cell, then it’ll actually show you the diff tags in a notebook.

[00:50:03]

Opening diffs in Jupyter

You can open it up in Jupyter and fix it up.
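The two rules just described, ignore differences that are only in outputs and surface real source conflicts inside a cell, can be sketched as a cell-by-cell merge. This is a deliberately simplified illustration, not nbdev’s merge logic: `merge_cells` is a hypothetical function, and both versions are assumed to have the same number of cells in the same order.

```python
def merge_cells(ours, theirs):
    "Merge two versions of a notebook's cell list, cell by cell."
    merged = []
    for mine, other in zip(ours, theirs):
        if mine["source"] == other["source"]:
            # Same code, possibly different outputs: keep one side, it can be rerun.
            merged.append(mine)
        else:
            # Both sides edited the cell: keep a valid cell that shows both versions.
            conflict = "\n".join([
                "<<<<<<< ours", mine["source"],
                "=======", other["source"],
                ">>>>>>> theirs",
            ])
            merged.append({"cell_type": mine["cell_type"], "source": conflict, "outputs": []})
    return merged

ours = [
    {"cell_type": "code", "source": "x = 1", "outputs": ["1"]},
    {"cell_type": "code", "source": "y = x + 1", "outputs": []},
]
theirs = [
    {"cell_type": "code", "source": "x = 1", "outputs": ["1 (rerun)"]},
    {"cell_type": "code", "source": "y = x + 2", "outputs": []},
]
result = merge_cells(ours, theirs)
```

The result is still a well-formed list of cells, so the notebook opens in Jupyter and the conflict can be fixed there.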

Parallel test execution

All of your tests run in parallel with nbdev_test_nbs, using as many cores as you have. So this is a great way to ensure that every notebook runs from the top to the bottom and has the actual outputs that you’re expecting.
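Running every notebook top to bottom and collecting the failures can be sketched as follows. This is a toy stand-in rather than the nbdev test runner: each "notebook" is just a list of code strings, `run_notebook` is a hypothetical helper, and a thread pool is used for simplicity where a real runner would spread work across processes.

```python
from concurrent.futures import ThreadPoolExecutor

def run_notebook(item):
    "Execute one notebook's cells in order in a fresh namespace; return (name, error or None)."
    name, cells = item
    namespace = {}
    try:
        for src in cells:
            exec(src, namespace)
        return name, None
    except Exception as err:
        return name, f"{type(err).__name__}: {err}"

# Made-up notebooks: the second one contains a failing assertion.
notebooks = {
    "00_core": ["x = 1", "assert x + 1 == 2"],
    "01_broken": ["y = 2", "assert y == 3, 'y changed'"],
}

with ThreadPoolExecutor() as pool:
    results = dict(pool.map(run_notebook, notebooks.items()))

failed = {name: err for name, err in results.items() if err}
print(failed)
```

A continuous integration job only needs to check whether the dictionary of failures is empty.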

Math equation support

Lots of nice little pieces like math equation support. All the LaTeX equations work nicely. You use it in your markdown and it pops up both in your notebooks and in your documentation.

KaTeX library for math equations

We’re using KaTeX, which is a really nice fast library for that.

fastpages blogging system

And there are other things that we power as well, not just publishing libraries, but nbdev also powers fastpages, which is an increasingly popular blogging system where you can write Jupyter notebooks and it turns them into a blog.

[00:51:03]

Writing technical content in notebooks

And this is really nice for anybody who is often trying to communicate technical content involving equations and or code visualizations.

Avoiding copying and pasting

No more copying and pasting gists into Medium or copying and pasting, you know, plot outputs into files. When you can do the whole thing in a notebook, there’s nothing to think about.

Simplicity of using notebooks

It just works, which makes life very easy.

Fastdoc for creating books

And as we discussed earlier, Fastdoc takes notebooks and turns them into publication quality books.

Encouraging users to try Jupyter Notebooks

So I hope that you might give it a go and see why I like Jupyter notebooks.

Nbdev.fast.ai website

You can just go to nbdev.fast.ai, which is, of course, an nbdev-powered documentation site.

Creating an nbdev repo

And you can just click a button and it will create your nbdev repo for you, and you can get started.

[00:52:01]

Conclusion and call to action

Thanks so much for watching, and I hope that you try this out and find that you like Jupyter notebooks too.

End of video

Thanks.