I just upgraded to MS-Office 2008 for mac. God help me. I did it without reading the reviews. Now I discover custom macros are history. I don’t really care so much, except I worry that EndNote won’t play nice now. (Won’t that be lovely?) And I used to have a custom macro for converting ascii/plaintext – i.e. stripping out all the hard returns. So I could cut&paste email or a Gutenberg book, select one menu item, and get the lines to wrap instead of being all sullen and jagged out there on the right. It’s such a common problem. Now how am I going to solve it?
What’s a simple fix for converting ascii/plaintext to MS-Word?
UPDATE: OK, on reflection it’s pretty clear how to make a 4-step fix using find&replace. Having bothered to figure this out, I’ll just put the simple solution under the fold.
OK, you’ve cut and pasted into Word some nasty thing full of jagged hard returns. Thousands of words out of Project Gutenberg, for example. Select all.
Find: ^p^p [That’s the code for a paragraph break, doubled: karat-p-karat-p. You are replacing all occurrences of two breaks together.]
Replace: HARDPARAGRAPH
Why? You are going to be stripping hard returns from line endings, but you still need to know where the actual paragraph breaks are. So you temporarily replace all occurrences of double breaks with some dummy string of characters you know isn’t going to occur naturally in the text. Like: HARDPARAGRAPH.
Find: ^p
Replace: [hit spacebar once to create one empty space]
This is the step that actually gets rid of all those ugly breaks at the line endings.
Find: HARDPARAGRAPH
Replace: ^p [if your paragraphs are going to be unindented blocks, you might want to double it to get the empty line between paragraphs back.]
That seems to do it for me. Was that clear? (Sorry for the difficulties rendering it in HTML.)
But I’m still pretty pissed off about MS-Word 2008.
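For anyone who would rather script it than click through Find & Replace, here is a minimal Python sketch of the same placeholder trick. The function name `unwrap` and the line-ending normalization are my own assumptions, not anything Word does; the HARDPARAGRAPH marker is the dummy string from the steps above.

```python
PLACEHOLDER = "HARDPARAGRAPH"  # dummy marker, assumed not to occur in the text


def unwrap(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text; pick another")
    # Normalize Windows and old Mac line endings to "\n" (an added assumption).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Protect the real paragraph breaks (two breaks in a row).
    text = text.replace("\n\n", placeholder)
    # Turn every remaining line break into a single space.
    text = text.replace("\n", " ")
    # Restore the paragraph breaks, with an empty line between paragraphs.
    return text.replace(placeholder, "\n\n")


sample = (
    "Here is a jagged\nparagraph with\na bunch\nof\nline breaks in it.\n\n"
    "Here is another.\nParagraph,\nthat is.\n"
)
print(unwrap(sample))
```

Note that the final line break also becomes a space, just as it would in Word, so you may want to strip trailing whitespace afterwards.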
{ 21 comments }
brennen 04.10.08 at 7:55 am
Access to Perl?
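#!/usr/bin/perl
use strict;
use warnings;

my $text = <<TXT;
Here is a jagged
paragraph with
a bunch
of
line breaks in it.

Here is another.
Paragraph,
that is.
TXT
# Replace each lone newline (one with no newline on either side) with a space;
# the blank line between paragraphs is left alone.
$text =~ s/(?<!\n)\n(?=[^\n])/ /g;
print $text;
Gives:
Here is a jagged paragraph with a bunch of line breaks in it.

Here is another. Paragraph, that is.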
I rather imagine that A) there’s a much better solution, and B) someone less tired and more sober than I will probably provide it shortly.
brennen 04.10.08 at 7:57 am
(Also, the backslashes before those three ns just got eaten.)
Steven Poole 04.10.08 at 9:54 am
You want the free WordService from Devon Technologies.
islamoyankee 04.10.08 at 10:20 am
You can also try the free programs TextWrangler and Smultron, both of which offer an “unwrap” feature that I live on when copying out passages from JSTOR downloads.
curious citizen 04.10.08 at 10:49 am
The annoying thing about all the freeware or free online linebreak removers that I’ve found is that they also strip other formatting (bold, italics, etc.).
John’s solution, at least, leaves the rest well alone.
In any case, I haven’t found a compelling reason to upgrade.
Timon 04.10.08 at 11:26 am
There are a lot of tricks for Gutenberg type texts, an appalling python one-liner along the lines of brennen is
for line in [item.replace('\r\n', ' ') for item in open('jaggedgutenberg.txt').read().split('\r\n\r\n')]: f.write(line+'\n')
That actually works, where ‘jagged…’ is the Gutenberg text and f is the new file. (Tested on Sun Tzu ;)
Timon 04.10.08 at 11:31 am
for \r\n read \\r\\n, they were interpreted by the comment parser as escapes
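A slightly less appalling version of the same idea, as a sketch for Python 3. The function names and the UTF-8 encoding are assumptions for illustration; the file is opened with newline="" because Python 3 would otherwise translate each "\r\n" to "\n" on read and the split would never match.

```python
def unwrap_crlf(raw: str) -> str:
    """Join CRLF-wrapped lines; paragraphs are separated by a blank CRLF line."""
    paragraphs = raw.split("\r\n\r\n")
    # One output line per paragraph, as in the one-liner above.
    return "".join(p.replace("\r\n", " ") + "\n" for p in paragraphs)


def unwrap_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Read src, unwrap it, and write the result to dst (hypothetical helper)."""
    # newline="" keeps the "\r\n" pairs intact instead of translating them.
    with open(src, newline="", encoding=encoding) as infile:
        raw = infile.read()
    with open(dst, "w", newline="", encoding=encoding) as outfile:
        outfile.write(unwrap_crlf(raw))


print(unwrap_crlf("a jagged\r\nparagraph\r\n\r\nand another\r\none"))
```

Usage would be something like unwrap_file('jaggedgutenberg.txt', 'smooth.txt'), where the second filename is made up. Files with plain "\n" line endings pass through unchanged, since nothing matches.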
John Rynne 04.10.08 at 11:46 am
Yes, MS-Office 2008 for mac is a piece of crap. Fortunately I kept the previous version on hand. Ms Word I have renamed “Kenny”, because it dies in every episode (cf. South Park).
Zippy the Comment Frog 04.10.08 at 2:14 pm
I normally search and replace for a space followed by a paragraph break, which avoids removing the second one in a double line break. That does not necessarily save a step, however, because then I run a search and replace for two spaces in a row — sometimes there are none, but often there are at least a few.
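In script form, that approach might look like the Python sketch below. It assumes the wrapped lines really do end in a space before the break, that the pair is replaced with a single space, and that true paragraph breaks have no trailing space; the function name is made up.

```python
import re


def unwrap_trailing_space(text: str) -> str:
    """Unwrap lines that end in a space, then tidy up doubled spaces."""
    # A space followed by a line break marks a wrapped line: keep only the space.
    text = text.replace(" \n", " ")
    # Second pass: collapse any run of two or more spaces into one.
    return re.sub(r" {2,}", " ", text)


sample = "Here is a \njagged line.\n\nNext  paragraph \nhere.\n"
print(unwrap_trailing_space(sample))
```

Breaks with no space in front of them, including the double break between paragraphs, are never touched.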
Kieran Healy 04.10.08 at 2:14 pm
As this very blog wisely counseled, don’t upgrade.
SamChevre 04.10.08 at 2:15 pm
For the fairly simple fix I use, copy into TextPad (trial version is free), and save–then open with Word. TextPad lets you choose encoding type.
Dan 04.10.08 at 2:27 pm
Still using Endnote? Zotero, dammit, Zotero! (Though the Word plugin probably won’t work for you now…)
http://www.zotero.org/
andrew 04.10.08 at 4:18 pm
The internet archive’s solution to the line break problem is pretty much the same as yours.
terence 04.10.08 at 7:59 pm
The one good thing about Word 2007 for a PC is that, under referencing, it has its own equivalent of EndNote. It’s not quite as good but seems, from first inspection, to work okish.
Everything else is evil though.
ben wolfson 04.11.08 at 5:40 am
Still using Endnote? Zotero, dammit, Zotero!
ebib?
John Holbo 04.11.08 at 1:52 pm
Actually, what IS the best bibliography tool? I feel fairly locked into EndNote since I’ve been creating a library over a period of years. But maybe there is something better out there. I have started using Zotero to take notes while surfing. But I haven’t tried to generate MS-Word exportable stuff by means of it. (Is that actually possible?)
Clark 04.11.08 at 4:55 pm
EndNote will do worse than not play well. It won’t play at all. The head guy from MBU was at Ars answering questions and whoa there was a lot of hate.
On the other hand you can do macros in Python easily now which is arguably much better than VBA anyway. But if you’re trying to maintain macros it’s a pain.
I keep hoping the new version of iWork comes out soon. Numbers has a lot of promise but isn’t there yet. (It doesn’t even support scripting yet.) Pages 2.0 is arguably better than Word for most things now, though. I rarely open up Word, although I upgraded to 2008 just to use Excel and AppleScript. Inexplicably, AppleScript in Excel 2004 allowed either Unix or old System 9 style path separators, whereas Excel 2008 won’t accept Unix paths at all. You have to use colons (:) for all path separators!!!!
John Quiggin 04.12.08 at 1:02 am
I like Bookends as a biblio package. But as a general matter I tend to like obscure alternatives to market-dominant programs. For WP, I like NisusWriter.
Michael H Schneider 04.13.08 at 12:40 am
Shouldn’t that be caret, rather than karat?
Dr Paisley 04.13.08 at 1:20 am
I just upgraded to MS-Office 2008 for mac. God help me. It’s such a common problem. Now how am I going to solve it?
Fixed.
tmlbk 04.13.08 at 11:34 pm
Let me put in a good word for a LaTeX solution consisting of MiKTeX, LyX and BibTeX.