JUMP CUT
A REVIEW OF CONTEMPORARY MEDIA

BUTLER: Since you brought up the Wayback Machine, let me ask you a couple of questions about that attempt to preserve all of the Web, because I think that was probably how I was first introduced to the Internet Archive. This is your 25th anniversary and so it's 1996 that you started.

This is my favorite quotation of all time about the impermanence of digital data [illustrating the need for Web preservation]. It's by Penn Jillette, who, as you may remember, used to write a column in PC Computing... the final page of PC Computing.

And so in 1992 he wrote a column where he was advising the woman from The Howard Stern Show, Robin Quivers, about buying a new computer. 1992. So it's pretty early in personal computer purchasing days. She said she was worried about losing things she was writing. Jillette writes, "I told her that with proper backup it was safer than paper, which is true if you're scrawling on tissue on the edge of the Grand Canyon in gale winds without a paperweight and the mighty Colorado is on fire." That's my all-time favorite computer quote.

KAHLE: Let me just look that up, if I could. I don't know if we had PC World.

BUTLER: It was PC Computing, and you have a very limited run of PC Computing on the Internet Archive. That's where I looked first. But then I found a link on the Wayback Machine to a really old Sin City website that had collected the text of all of Penn Jillette's essays.

The Wayback Machine

KAHLE: Oh, nice.

BUTLER: This is where I finally tracked it down. It was at SinCity.com, which is long gone. As you can see from the capture timeline thingamajigger. What do you call that little...?

Wayback Machine sparkline

KAHLE: The spark line.

BUTLER: As you can see, this website died somewhere around here, 2003. The Wayback Machine is where I found Jillette online.

KAHLE: Well, put it in your article.

BUTLER: Oh, yeah. Without the Wayback Machine I would have been lost. Then I went on eBay just this morning actually and bought a physical copy of that PC Computing issue so I could have that article [see below or read the PDF].

KAHLE: Fantastic!

BUTLER: The same year, well, two years before the launch of the Internet Archive, I built my first website, which was for film and TV studies, the area I'm in, and I called it ScreenSite.org. So here it is, in all its 1996 glory.

ScreenSite.org screenshot.

KAHLE: Nice looking.

BUTLER: I was so excited to be able to have actual images on the site. I recently redid ScreenSite. It still exists. I redid it one more time before I retired, and my backup of this 1996 version was on floppy disks that I could no longer access. So I went to the Wayback Machine...

KAHLE: How'd we do?

BUTLER: You're looking at the product from the Wayback Machine right now. So this whole thing is Wayback-Machine-generated. From July 1st, 1996.

One of the questions I had for you about the Wayback Machine is a little more on the technical side. In '96... a few JPEGs, all text, it was pretty simple to capture a Web page, but now you've got all this JavaScript, you've got CSS, you've got all this kind of stuff going on. It's obviously becoming more difficult to capture a Web page. Is it becoming impossible?

KAHLE: There are certainly things that are technically difficult and then there are paywalls. We have an ever-growing number of engineers trying to figure out how to do rich media sites and JavaScript and Ajax. I'd say paywalls... I love the line from [Nathan J. Robinson, Current Affairs], "The truth is paywalled, but the lies are free."

BUTLER: That all leads me to another question I have, which is a little more of a historical one, since your career, your involvement with online stuff, started back with some sort of Gopher-type thing called WAIS. [Gopher was online information software developed by the University of Minnesota before protocols for the World Wide Web came to dominate.]

KAHLE: WAIS is "wide area information servers." Before the web.

BUTLER: Exactly. So before the Web, in the dial-up days, the BBS days, where we had these commercial silos: America Online (AOL), GEnie, CompuServe. So we moved from that to the freely available Internet and now it seems like we're just going back into the freaking silos again.

KAHLE: Oh, yeah, yeah, yeah. No, you're absolutely right. Mainframes. What is AWS [Amazon Web Services] other than a mainframe? Now at a global scale. The walled gardens, the promises of safety and security, it's all reminiscent of the battles with AOL and CompuServe and Prodigy. And that everything's going to be mediated on somebody else's platform is sort of reminiscent of LexisNexis or Dialog [now owned by ProQuest]. Our mantra in the early '90s was "Everyone's a publisher." And that enabling aspect, whether it was from 'zines or early websites or WordPress sites, LiveJournal... these things where you can even host your own server, like WordPress. That was the dream. And Google was part of that era of trying to make it so that openness works and so were we.

My whole career has really been about trying to get the open world to work. I want a game with many winners. I want an environment where people can be their most. They can be at their best. That they share in an enduring way. That the good works of Penn Jillette, or Jeremy Butler from 1996, will find their rightful audience or their justified obscurity. Maybe it's just your great-grandkids who are going to be looking up your ratings from 1996, but they should be there. That they have a place in the library. It's not just those with New York book contracts. It's not just those that are famous professors. We have histories that we can relate. Everyone has something to teach. If we can build technologies that go and put people at their best. And support their best. Then we're in great shape as a society. If we make technologies that make them just yell and shout and curse. Storm around, demand whatever, we're making technologies that don't serve us well.

BUTLER: It's a lot like Julia says, "Don't be stingy," to just let stuff out there. And that was my goal when I founded ScreenSite. And I also founded the first film and TV studies LISTSERV (Screen-L) and those sorts of things.

KAHLE: I think Caralee [a writer who works for the Internet Archive] may have talked to you about doing a blog post.

BUTLER: We talked last Monday.

KAHLE: I changed the Internet Archive Blog post draft. I just changed the title and then put in a new first paragraph. See if this works for you: "Jump Cut journal is a model open journal by hosting on archive.org and now digitized from microfilm." And that's the title. And then: "Jump Cut is the model of open access journals. When the Internet Archive digitized older issues of Jump Cut from microfilm, we found that it had already been posted in textual form by the publisher. When we reached out to see if we could open up the microfilm version for free public access and download they were enthusiastic."

BUTLER: That's right.

KAHLE: "Here we wanted to share some more background on Jump Cut and why openness is important for them."

BUTLER: That's absolutely right.

KAHLE: It's just the only way to work on the Internet. It's the only way to make it work. But there are organizations that are really not going that way.

BUTLER: I know you also work with the Electronic Frontier Foundation, of which I have a bumper sticker on my car right now. Longtime supporter. I don't want to take up too much more of your time, but I have two final questions that may be too big for this discussion. But let me launch them at you and you can tell me how much you want to go into them.

The first would be whether you had any thoughts about ongoing consequences of the usage spike that the Internet Archive had due to the COVID-19 pandemic. People are talking a lot about not having academic conferences anymore and just doing everything on Zoom and things like that. Is it too early to be able to tell whether that spike is going to continue for Internet Archive?

KAHLE: We're all home schoolers now, right? We're all adjusting and have had to jerk into a new world and some of that world is better. There was a lot of just flying around, going to these random conferences that you're expected to go to, and we don't need to do that as much. I'm trying to put myself on a flight diet. Can I just fly 12 segments a year? I don't know what I was doing before, but it was a lot higher. So let's make it count.

BUTLER: Yeah, that makes sense.

KAHLE: I think it was just gorging on travel. Decreasing commuting would be great.

But what's the right blend? Heck if I know. The Internet Archive has gone remote-first, which means that the assumption is remote.

BUTLER: Oh, you mean in terms of your employees?

KAHLE: Yeah.

BUTLER: But how does that work for the scanning project, because they need to physically...

KAHLE: Yeah, the scanning is in physical places, but they're all over the place. We have 20 scanning centers in all sorts of libraries.

BUTLER: I'm constantly impressed with the quality of the scans that we're able to get now.

KAHLE: Well, thank you.

BUTLER: And it makes me wonder how you feel about those older scans. You still have the physical book, are you ever going to go back and scan or OCR [optical character recognition]?

KAHLE: We do. We reprocess and we try not to rescan if we can avoid it. But it depends on costs and availability. Hopefully it'll become easier to scan again. That's why we physically own these things: this may not be the last time this is done.

BUTLER: Right. And you look at something like the text on this page [from the Jewelers' Circular and Horological Review, 1893], so itty bitty. OCR would have just thrown up all over it 20 years ago.

Jewelers' Circular

KAHLE: But now it's gotten really good.

BUTLER: It's astonishing.

KAHLE: It is astonishing. And the open source OCR is now really good based on machine learning and actually supported by Google. They supported the open-source OCR. That's really great.

BUTLER: The last big question I have for you, and like I say, feel free to comment as little or as much as you want about this, is the Digital Millennium Copyright Act [DMCA], which was enacted two years after you guys started. Did it have an immediate impact on the Internet Archive in '98? And from the perspective of 23 years later, is it good or is it evil, the DMCA?

KAHLE: Ben Franklin-era copyright was 14 years, renewable once, and derivative works "okay!" By the time we got to 1998, it had stretched and stretched and stretched to life plus 50. It just goes on for a hundred years. There's no [copyright] registration. It's just... It got mauled. And so the "notice and takedown" approach has made the Internet that we have today. [Editor's note: The DMCA provides a "safe harbor" for sites like the Internet Archive, allowing them to present works of unknown copyright status; if a copyright holder objects to a specific item on a site, it may request it be taken down.] That and CDA 230 [Section 230 of the Communications Decency Act], those are foundational laws for having an open Internet. Otherwise, it could have been a lot more like cable television, or controlled environments like Nintendo. It's just, it's only the things that are on the store shelf.

And I grew up with that. And there is no way of getting your words seen. Maybe I was in a group picture of our soccer team in high school in the town newspaper. But that was it. And maybe they spelled my name right in the caption. But that was my only access towards getting my words out. We're so much further than that, and actually it's in large part because of some early leadership in the United States by the government to try to live in an open world. Does it mean that it's all perfect? No. Have people abused it? Yes. Should we go back to the world that I grew up in? Hell, no!

BUTLER: I guess the reason I ask if it's evil is that it does seem to have had some chilling effect on fair use and things like that... the DMCA.

KAHLE: We're trying. I think we just have to keep some of the very lucrative lobbying opportunities under control. There was a study recently done, and I don't think it's public yet, that the content industry has spent over a billion dollars in the last 10 years on lobbying Congress. Where is the public interest in that? I think some of those early laws, if they were to try to be done today, would be just lobbied out of existence.

BUTLER: I guess for the Internet Archive, the "safe harbor" provision is useful perhaps.

KAHLE: It's very useful. I was unaware of all of that in those early days, but other people were. We just try to be a library. What does a library do? It buys things. It gets donations of things. It preserves that. And it lends them out. That's what we do! Until somebody tells me that there's not going to be libraries in the future in the digital world, we're going to continue doing it. And there are people that are arguing that there shouldn't be libraries in the digital world. And I disagree. Let's make sure libraries thrive, that they're supported by the community, that they are supporting creative industries. It is the biggest form of public funding of publishing there is.

BUTLER: That's right.

KAHLE: Let's not lose that.

BUTLER: There are legions of fans, I think, like me out there that appreciate the work you've done. It's really astonishing to me.

KAHLE: Thank you for that. We live for comments like that. There is no gold at the end of the rainbow in a nonprofit.

As we shut this down... You have a video of this, right? It's being recorded.

BUTLER: Yeah.

KAHLE: Would you mind posting it to the Archive?

BUTLER: Sure. I'm surprised you would want it, but I'd be happy to do so.