The Curse of a Post-Storage-Scarcity Mental Economy

by John Holbo on July 24, 2015

I got a new iMac. Awesome! Until last week I was using my old iMac, from 2009. I buy a good one so it will last. It’s amazing how one day in front of the retina display makes me think ‘yuck!’ looking at my tired old, burnt out 2009 display. But onward, to the future!

As every fool knows, everyone has storage to spare these days, but extra space is a moral hazard. For the past two days I’ve been transferring stuff to the new iMac from my old one(s) – I’ve got a MacBook as well. I’m not an organized person, by nature. I save stuff in stupid places, nested in folders within folders, with the wrong names on the files and folders alike. So for two days I’ve been trying to clean by hand. I didn’t use Setup Assistant to pour the messy old wine into the big new skin. I’m a digital hoarder. Why delete when I can just save in some new folder that I label ‘old stuff’? My old hard disc was this apartment crammed with the data equivalent of old jars of urine and nail clippings. When I digitally exfoliate, I send all that dead skin wafting up to the Cloud. When I intellectually excrete, I don’t delete. I need a seriously better system for storing casks of cyberurine just in case I want it again.

When I buy my next Mac in 5 years, if the world hasn’t ended, how much more datajunk will I have? Terabytes and terabytes. I have folders of PDFs, not to mention media, not to mention multimedia projects I monkey around with. At some point, I am going to get to the point where my hoarding instincts overwhelm my monkey brain capacity to sort through. I’m going to be the world’s worst cross between a monkey and a creature that’s outgrown its old shell. I’ve got to start organizing better as I go, not just letting free and easy ‘I can name it anything and save it anywhere!’ rule my life.

I need a beneficent nudge from a suitable productivity tool/framework.

I am promising myself I’m going to use Adobe Bridge this time, to organize all the photos and mark them up, sort of, as I go. And I am going to do something about all those damned PDF’s I’ve saved all over the place. Often with absurdly random alphanumeric filenames. I just left so many of them in the damn Downloads folder, like I was going to go back there later and open random PDF’s with random names? (Ah, the Download folder, what a graveyard for white elephants’ worth of data!)

So, couple thoughts. You want two backups, duh. Set up Time Machine and sync everything to Dropbox. That way you are always working with the same files, across machines. And you are safe. Because if Dropbox fails you’ve got Time Machine, and if all your on-site machines fail, you’ve got Dropbox? I’ve got a terabyte of Dropbox space. Is there any reason not to be basically syncing that much of my stuff all the time? That is, I basically largely live inside my Dropbox folder? (Except for my iTunes folder itself, which I think wouldn’t play well with that?) Is that dumb for any reason?

And you scholars out there: how do you organize stacks and stacks of PDF’s, for good surveyability and access. MS-Word. Stuff. I’m thinking the rigorous way to do it would maybe be via Evernote. Maybe every time I download a scholarly PDF, I attach it to an Evernote entry, containing some metadata. Is there an easier way to keep it all organized? Obviously remembering to name your files in sensible ways and save them in sensible places, and keep up that discipline, is a good idea. I’ve never used Smart Folders or the tagging options that Mac has allowed (since Mavericks, or so). Is that stuff good to study and use and practice? Some simple tool that works a treat? Some habit, so I don’t go insane in 2020 when I buy a new iMac (which will probably be inserted in my ear, or behind my eyeball or – you know – wherever the future holds data.)

 

{ 42 comments }

1

david 07.24.15 at 5:09 am

I’d avoid complex solutions like Adobe Bridge. Aside from the difficulty of keeping the tagging consistent, Adobe may lose interest a couple years later, and then you’ll have a problem.

My own solution is to put stuff in folders by academic year, with a single subfolder for topic. Ongoing multi-year folders get moved to the next year when it starts. Then I suppress the urge to make the folder trees too complicated, and bank entirely on search to dig up archived stuff.

PDFs and other documentary detritus: for organized projects, Zotero. For disorganized PDF hoarding, I name them to (author) (year) – (paper name).pdf and then just stuff them in a year’s subfolder under /reading or /interesting or such. If they’re not OCR’d, I OCR them with Acrobat before I forget they exist. To dig them up again: search.
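A minimal sketch of that "(author) (year) – (paper name).pdf" convention, for anyone who would rather script it. The function name `paper_filename` is hypothetical, and the sketch assumes a plain hyphen as the separator and that characters such as `:` or `/` should simply be dropped from the name.

```python
import re


def paper_filename(author: str, year: int, title: str) -> str:
    """Build '<author> <year> - <title>.pdf' (hypothetical helper)."""

    def clean(text: str) -> str:
        # Replace characters that are awkward or illegal in file names.
        text = re.sub(r'[\\/:*?"<>|]', " ", text)
        # Collapse the runs of whitespace left behind.
        return re.sub(r"\s+", " ", text).strip()

    return f"{clean(author)} {year} - {clean(title)}.pdf"
```

Because the result is ordinary text in the file name, plain local search can still find it, which is the point of the scheme.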

Emails: Gmail.

To prevent a generic /downloads folder from ballooning, I find it useful to configure a browser to ask where to save files each time. You also can rename them at this stage.

The problem with more complex tagging schemes is that their value drops rapidly if the tagging is inconsistent. Which is its default state, because tagging is tedious and there are other things to do. So it’s much better to have a document retrieval format which is robust to inconsistent organization (local search; lots out there) than it is to have a complicated retrieval format. Zotero is only tolerable because it has plugins for virtually all journal websites that I use.

My own problem is organizing interesting bookmarks/blog posts. I haven’t found a solution that will easily archive a copy for full text search.

2

AnonymousThisTime 07.24.15 at 5:11 am

Take a look at DevonThink and Tinderbox, for organizing your files.

And your Dropbox/Time Machine strategy is very sound: an onsite and an offsite. Time Machine is both an onsite backup, and a history of all your past documents/apps. You can go back to any arbitrary point in time and see what was on your machine (until your backup hard drive fills up and then the oldest files are deleted, to make room for the new ones).

Some people do report Time Machine file corruption, and for that reason prefer ChronoSync. Review here: http://www.macworld.com/article/2011101/review-chronosync-4-3-5-a-multitalented-file-sync-and-backup-tool.html.

Lastly, some folks like to have a back up that’s immediately bootable, which Time Machine is not. For that reason, they have a second local backup drive, using Carbon Copy Cloner or SuperDuper.

3

The Raven 07.24.15 at 5:20 am

I find Zotero useful for tracking scholarly papers and the like.

4

Matt 07.24.15 at 5:41 am

The way I do it is to buy a new pair of external hard drives every few years when capacities increase, and periodically back up everything to them with rsync. When I run low on space or one of the drives dies, time to buy a new pair. My current pair is 3 TB units and I started when they were 80 GB. I had an “unlimited” CrashPlan account that got disabled after several days of continuous uploading, following my attempted sync of one of the backup drives. Drives are too cheap and life is too short for me to manually de-duplicate it; I’m hoping that at some point someone will invent really great all-purpose deduplication software that can keep the best-annotated and highest resolution version of the many copies of photos and videos I have from family holidays. At that point I’ll be able to discard a lot.
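The easy half of that deduplication wish, finding byte-identical copies, can be sketched in a few lines. This is only an illustration under stated assumptions: `find_duplicates` is a hypothetical name, it compares file contents by SHA-256, and it does nothing about the harder problem of choosing the best-annotated or highest-resolution version among near-duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: str) -> list[list[Path]]:
    """Return groups of files under `root` whose contents are byte-identical."""
    # Files of different sizes cannot be identical, so group by size first.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size means no duplicate is possible
        for path in paths:
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large videos do not fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

The sketch only reports groups; deciding which copy to keep or delete is left to a human.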

If you find a good way to index/search large quantities of PDF files on a desktop machine, I would love to hear about it. I have hundreds of gigabytes of PDF, nearly a million files. Everything I have ever tried has choked. It looks like I might be able to do it if I install Apache Solr with the right additions, but that also looks like a lot of work. Right now I have to use the original publisher’s site to search a journal and then look up the file locally; it’s less than optimal when I am doing a topical search that crosses journals/publishers.

5

david 07.24.15 at 5:50 am

A million? What on earth is your line of work, out of morbid curiosity?

6

John Quiggin 07.24.15 at 6:41 am

I also use Time Machine + Dropbox

For managing files, I use Papers 3 from Mekentosj. It can be a bit flaky, but when it’s working well, it’s superb.

A million files seems like an awful lot. I keep multiple versions of everything I create (haven’t moved on to Mac OS versioning yet), and never delete anything except junk/mass emails, and I still have only 90k documents in my Documents folder, and maybe 200k emails. Over 30 years since I got my first Mac, that works out to 10 docs created per day, and 20 odd emails stored. Then there’s my blog with 170k comments.

7

Z 07.24.15 at 7:00 am

So, couple thoughts. You want two backups, duh.

Take a look at […] Tinderbox, for organizing your files.

Now someone must make a joke.

8

dax 07.24.15 at 7:37 am

“Is that dumb for any reason?” Privacy concerns would be my reason.

I use the system mentioned in the post: every time I upgrade, I simply copy all the files of the old machine to the new machine in one big folder, called “old machine”. I’ve done this 4 or 5 times, so there’s a folder in “old machine” called “old machine 2”, which contains a folder called “old machine 3”. Then I *copy* (rather than move) any files which are “live” in any way to a new file in the new machine (with the same name as the old file in the old machine), and get on with my life.

The exceptions to this rule are music and photo files, which are kept on external portable drives.

I use Time Machine and one other hard drive to back everything up (so I have two back-ups on site). Live files are backed up on another computer (so that makes 3).

9

maidhc 07.24.15 at 7:54 am

I avoid Dropbox since Condoleezza Rice joined their Board of Directors.

I also avoid cloud storage since the Megaupload affair demonstrated how risky it can be. At the whim of some police force in some country on the other side of the world, suddenly your files are gone.

I equip my computers with a second hard drive that I store all my data on. When I get a new computer, I just move the drive over. When the drive gets more than half full, I get a new one (guaranteed to be larger, of course) and copy the contents of the old one to it. Right now I’m getting along with two 1 TB drives per system, but it’s possible to go bigger.

I do backups to external hard drives. I think if you get enough of them, they reproduce on their own.

I admit it doesn’t help that much with the indexing problem, it just postpones it. I have disk images that are from 4 or 5 computers ago.

10

Rj 07.24.15 at 8:40 am

You could just let the AI sort everything when you get your next Mac… on your proposed timetable, it is probably a more realistic option than you care to consider.

11

Mike Beggs 07.24.15 at 8:56 am

You should check out Sente for pdf management. I don’t know how it would handle _a million_ docs but it has never given me any trouble with thousands. It’s good for storing, searching, tagging and annotating. It keeps everything in the cloud so you can access your pdfs on anything. If your pdfs are OCRd it is awesome being able to search the whole library.

It does annotations really well, but the one thing it’s not great for is outline-style notes. I use OmniOutliner or just text files for that, and keep everything in Dropbox. I tried DevonThink for a long time, and it is great if you’re using just one computer. But I got so frustrated with its syncing. That never worked properly for me. Dropbox syncs effortlessly and normal OS X search seems to work just fine.

12

ZM 07.24.15 at 8:59 am

“And you scholars out there: how do you organize stacks and stacks of PDF’s, for good surveyability and access.”

I have been slowly trying to organise my computer properly this year as I need a new one as this one is too slow and I can’t download some programs I need to practice using.

Last year a more organised young woman in one of my group projects recommended using Mendeley for referencing and organising PDFs. I think maybe some other programs are also good for referencing, but Mendeley seems to be the best for automatically indexing downloaded PDFs.

When you put your PDFs into files, put them into Watch Files and Mendeley will automatically extract the details to index them when you open the program. It would maybe still take a while to organise your old PDFs into the right folders though, unless you could just put them all in one PDF Watch Folder and then access them through Mendeley, but this could be a problem if the program can’t extract all the right data from the PDFs.

“Mendeley Desktop includes a feature called Watched Folders which will automatically import all PDF files saved in your chosen folders from your hard drive to the program when you open Mendeley Desktop.

When you run Mendeley Desktop, any new PDFs added to these folders will be automatically imported to your All Documents folder in your personal library in Mendeley Desktop. You can then add them to groups, edit them, or add them to folders in your personal library manually. Enabling Watched Folders should only import each PDF file from the folder once. ”

http://support.mendeley.com/customer/portal/articles/989571-how-to-use-the-watched-folders-feature

13

Anders Widebrant 07.24.15 at 12:21 pm

For documents, I’ve found it best to just try to squeeze as much relevant information into the file name as possible, without worrying too much about keeping to a format. In particular, it seems to have proved useful to include a few words that I associate with the actual content of the file, even to the point of making the name kind of absurdly long.

Then I just chuck everything into one directory and grep/file search for things when I need them. Works okay, most of the time. The weak link is obviously actually taking the time out to actively name documents as you store them.

14

Peter T 07.24.15 at 12:31 pm

Hire a librarian

15

Barry 07.24.15 at 12:34 pm

Now this is a thread I’ll keep up on!

I have the same problem – I’ve got a lot of PDF’s, and use three machines to look for things (iPad, work computer, home computer).

A friend persuaded me to use Zotero, which I’ve started with.

Does anybody know the relationship between Zotero and Evernote? On my iPad when I try to get the first, I’m referred to the second.

I’ve also been using Carbonite and an external hard drive for dual back-up.

16

afeman 07.24.15 at 1:30 pm

I once looked at my working directories saved from a previous machine for the first time in five or so years. Among them was an ASCII file called todo.txt. The amount of overlap with the current version was distressing.

17

Belle Waring 07.24.15 at 2:27 pm

My solution is to make John figure it out.

18

bluefoot 07.24.15 at 2:44 pm

I’ve heard good things about Mendeley from a couple of people. However, for my PDFs (and I have many, being in science and all), I’ve long had the habit of naming them first author (or lab)_journal_year_topic, then put them in folders depending on what project or area they’re attached to, except the ones that go into the “read now” folder and the “cool stuff” folder (random fun stuff like the population ecology of vampires, or analysis of meritocracy in large organizations).

I do backups to an external hard drive. And about once a year or so, I back up to a second one that lives in a safe deposit box at the bank.

19

MDH 07.24.15 at 2:54 pm

For PDF and book management, but mostly the former, I use “Papers” for mac. Every downloaded PDF gets put into a folder that I then import into the program. You can import multiple ways and it’s getting pretty good at extracting metadata and automatically populating the bibliographic information. Manual options to fill in the rest. It also automatically assigns a filename (I use lastname_year) and sorts them into new folders by author. Has companion iOS apps that I don’t use. It also has a widget w/ CWYW capability that I also don’t use. I just periodically export a bibtex library and, while writing in Latex, have Papers open and click and drag the filename from the Papers DB to the point of citation in TexShop and it copies and pastes the latex-formatted citation style. Easy peasy.

20

MPAVictoria 07.24.15 at 2:59 pm

My solution has been to give up. Seems to be working.

21

Doug K 07.24.15 at 3:51 pm

DocFetcher will index most everything, including PDFs. It uses less than 1G to index 200G of assorted emails, PDFs, saved web pages, etc etc. It’s written in Java and uses the open-source Apache Lucene, so I can in a pinch maintain and extend it myself.
I used to have Copernic Desktop Search but the freeware version was crippled by release 5 so upgraded to DocFetcher. It has its quirks but is generally reliable. This way I don’t need to organize anything, just pile it in a heap and search when needed.

Backups go to a local mirrored NAS, swap out the physical drives from time to time and keep one at each workplace.
There is no cloud, there is only other people’s computers. As Maciej Cegłowski says,
“What the cloud is, is a big collection of buildings and computers that we actually know very little about, run by a large American company notorious for being pretty terrible to its workers. Who knows what angry sysadmin lurks inside the cloud?”
Not good for backups.

22

Teachable Mo' 07.24.15 at 5:02 pm

If it’s important, get it on paper. If it’s not important enough for paper, it’s not important.

We don’t live long enough to revisit whims.

23

Matt 07.24.15 at 5:19 pm

Ah, DocFetcher looks like an interesting possibility. Lucene should be scalable enough for the back-end. I tried using Recoll before but the indexing process seemed to slow down dramatically over time so it was never going to cover all the files. It would have worked for 10,000, maybe 50,000, no way 995,000.

I have all these files for reference and research in chemistry, its history, and allied fields. It’s a hobby-continuation of graduate school. My actual job does not involve nearly so many documents.

24

elaine 07.24.15 at 5:21 pm

I will throw in another vote for Papers for PDF management: http://www.papersapp.com

Automatically pulls metadata, automatically labels and files your PDFs, can sync with iPad/tablet, you can annotate/comment/highlight PDFs, export to Zotero, etc. I know that Zotero can do both citation/bibliography management and PDF storage, but I’ve been using Papers for years and wouldn’t give it up due to iPad syncing and other features.

25

James Schmidt 07.24.15 at 5:57 pm

For years I’ve been using Zotero for managing PDFs. Its chief virtues (aside from the fact that it works) are that it has a rather large and active community supporting it (along with institutional support) and it is likely to stick around. There are also some handy supporting apps (e.g., Zotfile, which sends pdfs to my iPad for reading and annotating). As a way of supporting the project I subscribe to their backup service (unlimited storage for $120 a year, 6 GB for $60 a year, 2 GB for $20 a year), which gives me an offsite backup of my pdfs.

I back up my iMac with TimeMachine, chiefly as a way of retrieving stuff that I shouldn’t have deleted, and make two clones of it using SuperDuper! (a really solid backup program — I can’t say enough good things about it), which I rotate between my home and my office. I also picked up a 64 GB Kingston DataTraveler flash drive that I keep on my keychain: it has all my writing, note files, and other things I want to have if meteors hit my house and my office at the same time. I also have two 64 gig flash drives rotating back and forth to my office — one of them is always plugged into my iMac and is constantly updating using a backup program called Synk (Decimus Software). My university gives me unlimited storage on their Google Drive, which is where they’d rather have us keep student-related files since they maintain security on it. My various iOS apps use DropBox for syncing and backup.

I’ve moved all of my notes over to nvAlt, which I sync across all my devices (it keeps files as plain text which allows me to write using various MultiMarkdown editors).

I just finished doing complete installations of all my files on a new Retina iMac, an older iMac that got a new 3TB drive as a result of the Apple drive recall, and a new MacBook Air. The installation from SuperDuper! went a lot faster than the one from TimeMachine.

26

Barry 07.24.15 at 6:19 pm

James, when Skynet fires off the nukes, we’ll know whose data is safe :)

Can I ask if you learned this redundancy the hard way, or just from watching others?

27

Barry 07.24.15 at 7:35 pm

Also, James, John Q once had an article on integrating statistical software into writing; this should be paired with that.

Perhaps a CT symposium on software which *actually* makes one’s life more organized *and* easier?

28

Jon 07.25.15 at 2:15 am

Just another vote for Zotero – which is free, has a great and supportive community and is run and written by academics, for academics. I’ve been using it since its very earliest versions, so over ten years now. It’ll happily handle tens of thousands of pdf files – and will in most cases find the appropriate bibliographic metadata. You don’t have to use their storage for backup. They offer free metadata syncing but you do have to pay for the pdf storage. But it’s easy enough to use a free online storage solution that’ll run to 10Gb or so – (normally) plenty for pdfs.
It does mean that your pdfs won’t be stored in a human readable structure but you can (one click) use Zotero to rename the pdfs using the bibliographic metadata that it fetches for you.
It then integrates with Word & LibreOffice to make referencing painless – it’s a complete replacement for Endnote.

29

Zora 07.25.15 at 3:22 am

I organize my PDFs by subject, in labeled folders (I think there might be twenty or so), inside a folder labeled research. I also change the file names to topic – author. That’s enough to find something.

I also periodically prune my files. All my files. If you do it every few years, it isn’t so bad.

30

novakant 07.25.15 at 8:18 am

Another vote for SuperDuper, I use it to back up my system drive because it’s bootable.

That said, ask yourself: how many times has a drive failed you and what was on it? Keeping a core number of really important files safe in various places is much more important than backing up everything all the time.

31

Metatone 07.25.15 at 9:14 am

I (still) use Endnote for academic stuff, largely because my main library source for journal articles etc. (still) works better with it than with Papers or Zotero.

Whichever you choose, they make a big difference.
Auto-generation of cite/bibliography is a good thing too… although it helps if your collaborators use the same system…

32

John Quiggin 07.25.15 at 11:40 am

@27 While I might have lost the file in my memory, it’s more likely that you are thinking of one of Kieran’s posts.

33

engels 07.25.15 at 12:06 pm

‘Is that dumb for any reason?’

Yes

34

Barry 07.25.15 at 1:32 pm

John Quiggin 07.25.15 at 11:40 am

” @27 While I might have lost the file in my memory, it’s more likely that you are thinking of one of Kieran’s posts.”

Sorry. But it’s still a great idea.

35

James Schmidt 07.25.15 at 2:22 pm

@26 I suspect my redundant backup strategies have a lot to do with the fact that (1) it doesn’t take that much effort, (2) I seem to have bought a lot of thumb drives and portable hard drives, and (3) it gives the illusion of having accomplished something.

36

Tiny Hermaphrodite, Esq. 07.25.15 at 10:11 pm

OT: Franz Ferdinand Sparks is great. Belle is her wrongest since the Iraq Invasion.
Greatly Recommended.

37

Lisa Schweitzer 07.26.15 at 2:39 am

Another vote here for Sente for pdf management…

38

John Holbo 07.26.15 at 9:26 am

I figured out the secret. All you have to do is spend three straight days doing nothing but sorting through everything, until you go mad and blind. I should know!

39

Barry 07.26.15 at 7:30 pm

John, when your alternative is to actually work on your dissertation, or do your taxes…………:)

40

Doug 07.27.15 at 6:40 pm

Whoops, meant Qrecall instead of Chronosync:

http://www.qrecall.com/features.jsp

41

Sumana Harihareswara 07.27.15 at 7:30 pm

The Antenna blog has been hosting some essays on how researchers deal with stuff like this: http://blog.commarts.wisc.edu/category/columns/digital-tools/

42

Adam Hammond 07.28.15 at 1:28 am

I use a version of the one mentioned by dax @8. I don’t think it counts as an organization system in any defensible way. However, I actually do it, unlike all of the real organization schemes that I have tried.

I seem to actively work on something between 3 and 8 projects at the same time, although it feels like more. Several times a year, all of the various places that I stash files get too complicated for me to visually scan for what I need, so I create a dated archive folder and throw everything that is not a current project into that. I then open the metadata on the folder and jot down some key words that have to do with some of the projects that I noticed in transit.

Mistakes are made, so I keep the archive handy for a while. Badly named pdfs are given better names if and when I have to pull them out of an archive folder.

When I return to a long quiescent project I have to do some searches to gather up the bits from several old folders. This retrieval process is the only time I create organized folders, which means that only my long-term, low priority projects have organized resources. That seems exactly right.

The download folder has to be a special case. I leave pdfs in it if I read them the one time. If I decide to annotate a paper, then it gets named and moved to … somewhere I currently like. Every two months or so, the contents of the downloads folder is purged: dmg and pdf files deleted, everything left over ~20MB gets individual attention, everything else deposited in an ever growing heap of compost that doesn’t really take up that much room.
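The purge rule in that paragraph is simple enough to write down as a decision function. A hedged sketch: `triage` and the action labels are made-up names, "~20MB" is taken as 20 × 1024 × 1024 bytes, and the function only says what the rule would do with a file; it deletes and moves nothing.

```python
from pathlib import Path

BIG = 20 * 1024 * 1024  # assumed reading of "~20MB"


def triage(name: str, size_bytes: int) -> str:
    """Classify one Downloads entry as 'delete', 'review', or 'compost'."""
    suffix = Path(name).suffix.lower()
    if suffix in {".dmg", ".pdf"}:
        return "delete"   # installers and read-once PDFs get purged
    if size_bytes > BIG:
        return "review"   # large leftovers get individual attention
    return "compost"      # everything else goes on the heap
```

For example, `triage("installer.dmg", 500)` comes back as `"delete"`, while a 50 MB video would be marked `"review"`.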

I have carried this evolving system over my last 6 computers, including all of the carefully formatted folders and proprietary data files that are the detritus of the failed organization schemes. I could pull together a book of hopeful journal entries, each written as I launched one of those getting-my-life-in-order projects. I now aspire to serving as a cautionary tale.

Comments on this entry are closed.