
TLDR: My new Pandas book, Effective Pandas, is released. The ebook is available here, and the physical book will follow soon.

You Wrote Another Pandas Book?

Yes, I wrote my third (or fourth, if you count Machine Learning Pocket Reference) Pandas book. In 2016, I released Learning the Pandas Library, and in 2020 I released the second edition of the Pandas Cookbook. Why would I do that?

I think my books fill a need. Often that need is to write the book I would have wanted when I was learning said technology. Learning the Pandas Library is almost six years old and was one of the first books on the subject. While it is not outdated, I have taught and used Pandas extensively since then, and have developed strong opinions on the correct way to use it.

About two years ago, I had most of the book updated for a second edition. It had new information, cleaned up some areas I wasn't happy with, and added assignments. That book sat in Git and never saw the light of day. Early this year, I planned on releasing it. Instead of releasing it, I re-wrote it and created over 100 images, writing software to generate and lay out many of them.

The result is a highly opinionated book that teaches best practices for working with columns, manipulating tables, summarizing through grouping, pivoting, and cross-tabulation, and plotting, with a focus on writing clean code that your future self (and colleagues) will appreciate and understand.

I was asked a few questions in the process and thought I would answer them here.

What Does Highly-Opinionated Mean?

I named the book Effective Pandas because a lot of material out there (on a certain blogging platform particularly) claims to teach Pandas. However, much of the material is not effective. It is confusing, bad advice, or often just plain wrong.

I am a corporate trainer and have taught Pandas to thousands via small group trainings, larger virtual workshops, and a course I teach for Stanford continuing education. I see a lot of my students confused or coming up with bad solutions because they are adopting the (poor) advice easily found on the internet.

My conference "tour" this year was mostly focused on some ideas that can make your Pandas code better. Many of these talks are kicking around the internet. Here is one. My book summarizes these thoughts and provides many real-world examples.

How is it Different than the Pandas 1.x Cookbook?

The original Pandas Cookbook was written by Ted Petrou. I read a lot of Pandas content, and I read this book. I thought it was a good book. I was approached to "author" the second edition. I added a few chapters, re-wrote much of the code (using some of the ideas found in my new book), edited some parts, and released the book. In the end, this is not "my book"; it is "my take" on Ted's content. Ted has some really cool content that is worthwhile. It is a great book; however, it is not the Pandas book that I wanted.

Do You Have a Chapter on Extending Pandas?

Nope, I don't. I have never had that need, nor has a client of mine. If I had unlimited time, I would include more material.

I have heard that this is difficult, and there is not much material covering it. I can commiserate with this. I spent a lot of time on the time and date sections of the book. And while there is some material floating around the internet, best practices are hard to come by, which can be frustrating.

I also went down a rabbit hole of exploring all the combinations of grouping, pivoting, resampling, cross-tabulating with functions that return scalars, series, and dataframes. In the end, it was an interesting thought experiment. However, I don't think that exhaustive treatment was appropriate for the book (even though it is not available anywhere, except for some hand-written notes on my desk).
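To give a flavor of what such combinations look like, here is a minimal sketch covering only one corner of that space: grouping a single column and applying functions that return a scalar, a series, and a dataframe. The toy data, the column names, and the three helper functions are all hypothetical and invented for illustration. This is not taken from the book or from those notes.

```python
import pandas as pd

# Hypothetical toy data, made up for this sketch.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b", "b"],
        "score": [1, 4, 2, 7, 3],
    }
)


def score_range(s):
    """Return a scalar: one value per group."""
    return s.max() - s.min()


def low_high(s):
    """Return a Series: several labeled values per group."""
    return pd.Series({"low": s.min(), "high": s.max()})


def with_deviation(s):
    """Return a DataFrame: one row per original row in the group."""
    return pd.DataFrame({"score": s, "deviation": s - s.mean()})


grouped = df.groupby("team")["score"]

# Scalar per group: a Series indexed by team.
scalar_result = grouped.apply(score_range)

# Series per group: a Series with a two-level index (team, label).
series_result = grouped.apply(low_high)

# DataFrame per group: the pieces are stacked into one DataFrame.
frame_result = grouped.apply(with_deviation)

print(scalar_result)
print(series_result)
print(frame_result)
```

The shape of the result changes with what the function returns, which is why enumerating every combination with pivoting, resampling, and cross-tabulation grows quickly. The exact index of the dataframe case can also differ between Pandas versions.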

Do You Have a Chapter on Using Pandas with Django?

No. See above.

Why Self-publishing?

Self-publishing this book lines up with my goals of corporate training. I want to be able to easily include this with training material. I'm positive I could have multiple contracts for this book if I wanted to.

Isn't Self-publishing a Pain?

I've worked with two different publishers and self-published many books. There are pros and cons to each approach. In fact, I get asked this question so many times that I created a course, Effective Authoring, about book creation. It discusses these topics and interviews over a dozen technical authors to get their take on this and other questions. I'll answer it this way: it is a different pain than working with a publisher and you need to be able to manage some of the aspects that a publisher does for you.

What is Your Process for Writing a Book?

Here is a simple outline:

  • Seriously contemplate whether you want to use your time writing a book.
  • Outline the book.
  • Research (or use experience) to write code that demonstrates the outline.
  • Write text to explain the code.
  • Create imagery to reinforce concepts.
  • Flesh out details.
  • Get feedback (my tech reviewers were great!).
  • Release.

Again, my authoring course goes into detail about book creation. I use Emacs. I write in a (sub/super)set of reStructuredText that allows me to create PDFs and ebooks from the same content. I leverage reviewers and automation (Grammarly) to clean up and clarify my content.

How Much Time Per Chapter?

This is one of those trick questions. It doesn't take into account that I have used these tools for a long time, and things that are difficult for beginners become second nature when you do them often. Some of the chapters have content I have taught so much that I can write it from scratch with little effort. Other chapters took multiple days to review, test, and summarize.

I could crank out the text for most chapters in about a day (the book has 35 chapters). I wasn't working on this full time, but would work between trainings and during downtime. There are a bunch of images that are time-consuming to create. (I like to batch those as it seems more efficient.)

One of the images from the book. Creating content like this takes a long time, especially if you make almost 100 of them.

Then you have to do edits. Tweak code, tweak examples, clarify. It takes a bit of time. There is a lot of bouncing around, which is necessary, but the context switches slow down the process.

I Use Google to Search for Pandas Recipes, Will this Help? Is it Better than the Pandas Docs?

It is hard to answer the "will it help" question. Not all content pleases everyone. Putting your creations online can be nerve-wracking, and reading your Amazon reviews can make you want to go live off the grid and never show your face again. I've come to accept that you cannot write a book that satisfies everyone's needs.

Many of the "recipes" floating around are bad advice or wrong. If you read this book, it might help you determine if the advice you find is actually good.

Also, I spent a lot of time creating the index for this book (I feel this is one of the most important features in the physical book). The index is fifteen pages long. (The index for Learning the Pandas Library was five pages long, as is the index for Pandas 1.x Cookbook.) Some people prefer analog indices to digital ones.

Regarding the Pandas docs question: there is some good content in the Pandas documentation. But the official documentation has different goals. It needs to be a reference for every function and method. My book doesn't need to be a reference. In fact, I skip (on purpose) some of the functionality because I strongly believe that your life will be better if you don't know it exists.

Also, the Pandas docs often demonstrate functionality with random data. For me, I understand the concepts better with real data. My book is full of interesting (to me at least) data, not just random numbers.

Finally, the Pandas docs usually show single method calls. They do not show end-to-end analysis, or what code in the wild might look like. Again, different goals. I'm trying to help practitioners write better code, not document everything.

As I said earlier, if days were 50 hours long, perhaps I could include the whole kitchen sink in the book. However, my audience is not someone who wants to memorize or learn about every single option found in Pandas. My audience is someone who wants to write code to manipulate data but also wants to revisit their code in a month and clearly understand what it does.

Do You Still Like Pandas?

In addition to corporate training, I also consult around Python and Data Science. I use Pandas quite a bit. I was initially attracted to it because I had written a similar library back in 2006 for business intelligence reporting. Since then, Pandas (with all of its warts and inconsistencies) has taken the world by storm (for better or worse).

I consider knowing Pandas an essential skill for anyone working with structured data (in Python). It has its problems, and I don't shy away from them in this book. At this point, I think Pandas the API is more important than Pandas the library. What does that even mean? Well, there are more than a dozen libraries that implement the Pandas API. The API has become the de facto Python interface for data (somewhat like SQL). Many SQL people would tell you that not all SQL is the same, and it is possible to write clean SQL and poor SQL. I feel similarly about Pandas.

Do I Submit Issues?

I have submitted issues to Pandas and did so for this book. I found a regression while working on the Cookbook, but that didn't happen during this book.

At the End, Was it Worthwhile to Write a Book?

The first step of writing a book is serious contemplation about the value of your time. My first book (what is now Illustrated Guide to Python 3) needed to get out of me. However, at that time, I was employed.

At this point, I think I'm a little more strategic. I run a training and consulting shop. I consider myself an educator and really enjoy corporate training. As much as I have tried to "crack the nut" for selling corporate training, I haven't found a repeatable sales process. Most of my clients hire me after reading a book or listening to a conference talk.

I realize that there are "training shops" that offer hundreds of different trainings. If I were buying Python or Data Science training, I would be wary of them. I wouldn't want to have a professional slide reader training my team. I would rather have an expert who can discuss the pros and cons and knows what the warts are because they have witnessed them first-hand.

My books are my sales and marketing team. It has been said that you should invest a good deal of time in marketing, and I believe that has been the case with this book.

If you are considering writing a book, you should think about the best and worst outcomes. Most of the reporting from book sales you hear about is probably from the top end, so don't be disappointed if you don't achieve similar results. Writing a book in and of itself is a great way to master the material in the book. It can also be a nice introduction, "I wrote the book on that."

Check out Effective Pandas here. The physical book will follow soon.