
Ruby to convert US to UK punctuation/spelling?

 
 
Michael Lommel
 
      06-16-2008
I have about a thousand multipage documents which I need to convert from
US English and punctuation to UK English and punctuation. Before I start
on a Ruby script (I'm just learning Ruby), I wanted to see if anyone
knows of existing tools to do this. I've also looked for a Perl US->UK
conversion tool, but one doesn't seem to exist.

I'm starting with UTF-8 RTF documents which have printer's quotes (i.e.,
distinct left and right curly quotes) which were retained from an
original conversion from MS Word docs. For my documents, converting from
US to UK punctuation means double quotes become single quotes and some
single quotes become double (apostrophes are retained and single quotes
not inside double quotes would need to be retained); but in the
conversion I would like to retain distinct left and right quotation
marks.
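
The rules described above can be sketched in Ruby as a small state machine. This is only a minimal sketch, not an existing tool: the method name `us_to_uk_quotes` and the constants are made up for illustration, and it assumes that a right single quote sitting directly between two letters is an apostrophe. Plural possessives (dogs’) and leading apostrophes (’tis) inside a nested quote would be misread and need manual review.

```ruby
# Curly quote characters (names are illustrative, not from any library).
LDQ = "\u201C" # left double quote
RDQ = "\u201D" # right double quote
LSQ = "\u2018" # left single quote
RSQ = "\u2019" # right single quote, also used as the apostrophe

# Swap US-style curly quotes for UK-style ones, keeping left/right distinct.
def us_to_uk_quotes(text)
  out = +""
  in_double = false # inside a US double-quoted passage
  in_nested = false # inside a single-quoted passage nested in a double one
  chars = text.chars

  chars.each_with_index do |ch, i|
    case ch
    when LDQ
      in_double = true
      out << LSQ
    when RDQ
      in_double = false
      in_nested = false
      out << RSQ
    when LSQ
      if in_double
        in_nested = true
        out << LDQ # nested single quote becomes double
      else
        out << ch  # single quote outside double quotes is retained
      end
    when RSQ
      prev_letter = i > 0 && chars[i - 1].match?(/\p{L}/)
      next_letter = chars[i + 1].to_s.match?(/\p{L}/)
      if in_nested && !(prev_letter && next_letter)
        in_nested = false
        out << RDQ # closes the nested quote
      else
        out << ch  # apostrophe, or a retained standalone single quote
      end
    else
      out << ch
    end
  end
  out
end
```

For example, `us_to_uk_quotes("“She said ‘don’t go’ today,” he wrote.")` returns `‘She said “don’t go” today,’ he wrote.`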

I'm thinking that in the end documents all print typography
(em-dashes, en-dashes, printer's quotes) should be converted to
character entities.

If there is no existing script to do this (seems like a problem others
must have faced before), any thoughts on the right approach/tools/code
snippets?

Many thanks, Rubyists...
--
Posted via http://www.ruby-forum.com/.

 
 
 
 
 
Axel Etzold
 
      06-16-2008

-------- Original Message --------
> Date: Mon, 16 Jun 2008 09:45:45 +0900
> From: Michael Lommel <>
> To: ruby-
> Subject: Ruby to convert US to UK punctuation/spelling?


Dear Michael,

> I have about a thousand multipage documents which I need to convert from
> US English and punctuation to UK English and punctuation. Before I start
> on a Ruby script (I'm just learning Ruby), I wanted to see if anyone
> knows of existing tools to do this. I've also looked for a Perl US->UK
> conversion tool, but one doesn't seem to exist.


For general spell-checking, there is aspell, which you can use with different language
options, and there are Ruby bindings for it:

http://blog.evanweaver.com/files/doc...es/README.html

So you might use Aspell.new("en_GB") rather than Aspell.new("en_US") to flag American spellings, which count as misspellings in the British English sense.
With that much text, it will also find some other errors that both varieties of English consider wrong.
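
A spell checker only flags words; the replacement itself still has to be done. Once a list of flagged words has been reviewed, the substitution step could look roughly like the sketch below. The word list and the method name `us_to_uk_spelling` are made up for illustration and are not part of aspell or its bindings; a real mapping would be far longer. Only whole words are matched, so "colorful" would be left alone and would need its own entry.

```ruby
# Tiny hand-made sample mapping (an assumption, for illustration only).
US_TO_UK = {
  "color"   => "colour",
  "center"  => "centre",
  "theater" => "theatre"
}.freeze

# Replace whole-word US spellings with UK ones, preserving an initial capital.
def us_to_uk_spelling(text)
  pattern = /\b(#{Regexp.union(US_TO_UK.keys).source})\b/i
  text.gsub(pattern) do |word|
    uk = US_TO_UK[word.downcase]
    word[0] == word[0].upcase ? uk.capitalize : uk
  end
end
```

For example, `us_to_uk_spelling("Color the center of the theater.")` returns `"Colour the centre of the theatre."`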

> I've also looked for a Perl US->UK conversion tool, but one doesn't seem to exist.


There certainly are Perl bindings to aspell; I'd bet a hundred quid/two hundred bucks.


> For my documents, converting from
> US to UK punctuation means double quotes become single quotes and some
> single quotes become double (apostrophes are retained and single quotes
> not inside double quotes would need to be retained); but in the
> conversion I would like to retain distinct left and right quotation
> marks.


That suggests some combination of String#scan, String#gsub and regular expressions.
Since apostrophes and single quotation marks share the same character, I'd suggest making a
list of words with apostrophes and writing it to a file, where you can correct it manually.
Then use String#gsub to replace first the apostrophes with something like <apostrophe>
and then the quotes with <lquote> and <rquote>, or the other way round.
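
A rough sketch of that approach follows. The method names, the `APOSTROPHE` constant and the assumption that an apostrophe is a right curly quote between two letters are all illustrative; the `<apostrophe>` placeholder is the one suggested above.

```ruby
APOSTROPHE = "\u2019" # right single curly quote, doubling as the apostrophe

# Collect candidate words: letters, an apostrophe, then more letters.
def apostrophe_words(text)
  text.scan(/\p{L}+#{APOSTROPHE}\p{L}+/).uniq.sort
end

# Replace the apostrophe in each listed word with a placeholder so that
# later quote substitutions cannot touch it.
def protect_apostrophes(text, words)
  words.reduce(text) do |acc, word|
    protected_word = word.gsub(APOSTROPHE, "<apostrophe>")
    acc.gsub(/(?<!\p{L})#{Regexp.escape(word)}(?!\p{L})/, protected_word)
  end
end
```

The list returned by `apostrophe_words` could be saved for manual review with something like `File.write("apostrophes.txt", words.join("\n"))` (the file name is hypothetical), then read back and passed to `protect_apostrophes` before the quotes themselves are replaced.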

There's a Regular expressions tutorial here:

http://www.regular-expressions.info/tutorial.html


> I'm thinking that in the end documents all print typography
> (em-dashes, en-dashes, printer's quotes) should be converted to
> character entities.
>


You can do that with String#gsub, e.g. str.gsub("--",'<em-dash>'), after having copied the
actual em-dash character into the double quotes, and so on for the other characters.
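
As a minimal sketch of that idea, String#gsub also accepts a hash, so all the characters can be replaced in one pass. The choice of HTML named entities as the target is an assumption (the original post does not say which kind of character entities is wanted), and the names `TYPOGRAPHY_ENTITIES` and `typography_to_entities` are made up for illustration.

```ruby
# Map typographic characters to HTML named entities (target format assumed).
TYPOGRAPHY_ENTITIES = {
  "\u2014" => "&mdash;", # em-dash
  "\u2013" => "&ndash;", # en-dash
  "\u2018" => "&lsquo;", # left single quote
  "\u2019" => "&rsquo;", # right single quote / apostrophe
  "\u201C" => "&ldquo;", # left double quote
  "\u201D" => "&rdquo;"  # right double quote
}.freeze

# Replace every listed character with its entity in a single pass.
def typography_to_entities(text)
  text.gsub(Regexp.union(TYPOGRAPHY_ENTITIES.keys), TYPOGRAPHY_ENTITIES)
end
```

This step would run last, after the quote swap, since the swap relies on the literal curly-quote characters still being present.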


Best regards,

Axel

 