Uncategorized

How to split a csv file

If you’re on a PC, the instructions we posted here: http://www.themacroscope.org/?page_id=418 work. It’s a macro, in visual basic, for excel. But after a long back and forth yesterday with @thomasgpadilla we worked out that it would break in various Mac versions of Excel. Why? I do not know. But there’s a pretty simple command line sequence on a Mac that’ll do the trick. We split it into big chunks, then split the big chunks into smaller chunks. So, grab a moderately large CSV (like John Adams’ diary, 1025 entries, 1 per row) and fire up your terminal thusly:

$ split -l 200 johnadams-diary.csv 200out
$ split -l 1 200outaa aa
$ split -l 1 200outab ab
$ split -l 1 200outac ac
$ split -l 1 200outad ad
$ split -l 1 200outae ae
$ split -l 1 200outaf af

and so we get a series of files,

aaaa

aaab

aaac….

aeaa

aeab

aeac…

until all 1025 rows are their own file. But they don’t have extensions. So:

find . -type f -exec mv '{}' '{}’.csv \;

will recursively go through the current folder, finding files, and appending .csv to them.

Ta da!

[edit] sometimes you should read the manual. The error message I was getting when I split the original file one line at a time was ‘file too big’. But of course, that’s because of the default names a-z by a-z, so only 676 combinations, which means that yes, 1025 lines *is* too big… but, if you tell split to use a three-letter prefix, you can split up to 17576 lines. The -a flag lets you make this change, like so:

split -a 3 -l 1 johnadams-diary.csv

which is what I should’ve done in the first place. D’oh! Ah well: you don’t have to know everything in digital history. Work it out in public, and somebody is sure to let you know how you could’ve done it better 😉 Happily, I caught this fairly quickly after I made my first post – but how much more elegant if I’d gone for the best solution right away? Well, sometimes, the best solution, is the one that works when you need it to.

Advertisements
Standard