[QUIZ] Quoted Printable (#23)

 
 
Ruby Quiz
      03-11-2005
The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The quoted printable encoding is used primarily in email, though it has
recently seen some use in XML areas as well. The encoding is simple to
translate to and from.

This week's quiz is to build a filter that handles quoted printable translation.

Your script should be a standard Unix filter, reading from files listed on the
command-line or STDIN and writing to STDOUT. In normal operation, the script
should encode all text read in the quoted printable format. However, your
script should also support a -d command-line option and when present, text
should be decoded from quoted printable instead. Finally, your script should
understand a -x command-line option and when given, it should encode <, > and &
for use with XML.
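As a rough sketch of the command-line handling described above (not part of the quiz text), the two switches could be parsed with the standard optparse library. The method name parse_options and the option keys are hypothetical, chosen only for illustration.

```ruby
require 'optparse'

# Parse the quiz's -d and -x switches out of an argument list.
# Options are removed from argv in place; file names are left behind.
# (parse_options and the hash keys are illustrative names, not from the quiz.)
def parse_options(argv)
  options = { :decode => false, :xml => false }
  OptionParser.new do |opts|
    opts.on("-d", "Decode quoted printable input") { options[:decode] = true }
    opts.on("-x", "Also encode <, > and & for XML") { options[:xml] = true }
  end.parse!(argv)
  options
end
```

In a real filter, calling parse_options(ARGV) first and then reading from ARGF would give the usual Unix behaviour: ARGF reads the files still listed in ARGV, or STDIN when none remain.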

Here are the rules we will use, from the quoted printable format:

1. Bytes with ASCII values from 33 (exclamation point) through 60 (less
than) and values from 62 (greater than) through 126 (tilde) should be
passed through the encoding process unchanged. Note that the -x switch
modifies this rule slightly, as stated above.

2. Other bytes are to be encoded as an equals sign (=) followed by two
hexadecimal digits. For example, when -x is active less than (<) will
become =3C. Use only capital letters for hex digits.

3. The exceptions are spaces and tabs. They should remain unencoded as
long as any non-whitespace character follows them on the line. Spaces
and tabs at the end of a line must be encoded per rule 2 above.

4. Native line endings should be translated to carriage return-line feed
pairs.

5. Quoted printable lines are limited to 76 characters in length (not
counting the line ending pair). Longer lines must be divided up. Any
line endings added by the encoding process should be preceded by an
equals sign, so the decoder will know to remove them. The equals sign
must be the last character on the line, followed immediately by the line
end pair. Such an equals sign does count as a non-whitespace character
for rule 3, allowing preceding spaces and tabs to remain unencoded.
The equals sign must fit inside the 76 character limit.

To decode, just reverse the process.
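The rules above can be sketched in a few lines of Ruby. This is only one possible reading of rules 1 through 3 and rule 5, not a reference solution; the method names qp_escape, qp_encode_line and qp_soft_wrap are made up for this example, and the input is assumed to be a single line with its native line ending already removed.

```ruby
# Rule 2: an equals sign followed by two capital hex digits.
def qp_escape(byte)
  "=%02X" % byte
end

# Rules 1-3 for one line (line ending already stripped).
# Pass xml = true to also escape <, > and &, as the -x switch asks.
def qp_encode_line(line, xml = false)
  xml_bytes = xml ? "<>&".bytes : []
  encoded = line.bytes.map do |b|
    # Tab (9), space (32), 33..60 and 62..126 pass through unchanged.
    literal = (b == 9 || b == 32 || (33..60).include?(b) || (62..126).include?(b))
    literal &&= !xml_bytes.include?(b)
    literal ? b.chr : qp_escape(b)
  end.join
  # Rule 3: spaces and tabs at the very end of the line must be escaped.
  encoded.sub(/[\t ]+\z/) { |ws| ws.bytes.map { |b| qp_escape(b) }.join }
end

# Rule 5: break an already-encoded line into pieces of at most 76
# characters, each broken piece ending in "=" before the CRLF pair.
def qp_soft_wrap(encoded, limit = 76)
  lines = []
  while encoded.length > limit
    cut = limit - 1 # leave room for the trailing "="
    # Never cut through an =XX escape: back up to just before its "=".
    if encoded[cut - 1] == "="
      cut -= 1
    elsif encoded[cut - 2] == "="
      cut -= 2
    end
    lines << encoded[0, cut] + "="
    encoded = encoded[cut..-1]
  end
  lines << encoded
  lines.join("\r\n")
end
```

For instance, qp_encode_line("a=b") gives "a=3Db", and a trailing space and tab in "end \t" come out as "end=20=09". The wrapping step relies on the fact that, after encoding, an equals sign can only appear at the start of an escape.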


 
Glenn Parker
      03-13-2005
Note: I assumed it would be cheating to use the builtin quoted printable
facilities.

I found it somewhat frustrating that String#each_byte does not return
any useful value (see encode_str).
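One possible way around this, offered only as a sketch and not as what the solution below does, is to unpack the bytes into an array first so that map and join can build the result directly:

```ruby
# Alternative encode_str: collect the escapes with map instead of
# appending inside each_byte.
def encode_str(str)
  str.unpack("C*").map { |byte| "=%02X" % byte }.join
end
```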

I found it a bit more frustrating that String#chomp! is greedier than
you might expect, discarding all sorts of potential line endings,
instead of limiting itself to $/.
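A small illustration of that behaviour, assuming $/ is still the default "\n": chomp also strips a carriage return, while an anchored sub removes only the newline itself.

```ruby
line = "text\r\n"

# With the default separator, chomp removes "\n", "\r" or "\r\n".
greedy = line.chomp

# An anchored substitution removes only a trailing "\n".
strict = line.sub(/\n\z/, "")

p greedy
p strict
```

This matters for the encoder because a carriage return that is part of the data, rather than part of the line ending, should be escaped as =0D instead of silently dropped.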

I would also suggest adding support for GetoptLong#[] to query
options directly, instead of requiring a full iteration.



#!/usr/bin/env ruby -w

require 'getoptlong'

MaxLength = 76

def main
  opts = GetoptLong.new(
    [ "-d", GetoptLong::NO_ARGUMENT ],
    [ "-x", GetoptLong::NO_ARGUMENT ]
  )
  $opt_decode = false
  $opt_xml = false
  opts.each do |opt, arg|
    case opt
    when "-d" then $opt_decode = true
    when "-x" then $opt_xml = true
    end
  end

  if $opt_decode
    decode_input
  else
    encode_input
  end
end

def encode_input
  STDOUT.binmode # We need to control the line-endings.
  while (line = gets) do
    # Note: String#chomp! swallows more than just $/.
    line.sub!(/#{$/}$/o, "")
    # Encode the entire line.
    line.gsub!(/[^\t -<>-~]+/) { |str| encode_str(str) }
    line.gsub!(/[&<>]+/) { |str| encode_str(str) } if $opt_xml
    line.sub!(/\s*$/) { |str| encode_str(str) }
    # Split the line up as needed, without cutting through an =XX escape.
    while line.length > MaxLength
      split = line.index("=", MaxLength - 4)
      # index returns nil when no escape sits near the limit.
      split = split.nil? ? (MaxLength - 2) : [split - 1, MaxLength - 2].min
      print line[0..split], "=\r\n"
      line = line[(split + 1)..-1]
    end
    print line, "\r\n"
  end
end

def encode_str(str)
  encoded = ""
  str.each_byte { |c| encoded << "=%02X" % c }
  encoded
end

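def decode_input
  while (line = gets) do
    line.chomp!
    # Check for a soft line break before decoding, so that a decoded
    # =3D at the end of a line is not mistaken for one.
    soft_break = (line[-1] == ?=)
    line = line[0..-2] if soft_break
    line.gsub!(/=([\dA-F]{2})/) { $1.hex.chr }
    if soft_break
      print line
    else
      print line, $/
    end
  end
end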

main


--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>



 
James Edward Gray II
      03-13-2005
On Mar 13, 2005, at 12:57 PM, Glenn Parker wrote:

> Note: I assumed it would be cheating to use the builtin quoted
> printable facilities.


I must sheepishly admit that I was unaware of Ruby's converter when
I made the quiz. It was pointed out to me in a private email after I
posted it. The converter isn't a complete solution to the quiz, but it
gets you very close.

Is it cheating to use Ruby features? Never. Feel free, then poke a
little fun at the quiz editor because you're smarter than he is. All
part of the fun.

Sorry for the oversight.

James Edward Gray II



 
Dave Burt
      03-14-2005
Hi,

Testing. I found building a test suite before doing the code really helpful on
this one, to get my head around the intricacies of the encoding. Actually
thinking through the edge cases and working out expected results was necessary
for me to develop this solution.

Now, of course, this would have been a lot easier if I'd just been able to find
the "builtin quoted printable facilities." What builtin quoted printable
facilities?

Anyway, here is my result:
http://www.dave.burt.id.au/ruby/quoted-printable.rb

And the tester:
http://www.dave.burt.id.au/ruby/test...d-printable.rb

The testing program generates test methods and test data dynamically.

The public interface to my solution looks like this:

module QuotedPrintable

WHITESPACE = [?\t, ?\ ]
WHITESPACE_REGEXP = /[\t ]/
WHITESPACE_ESCAPED_REGEXP = /=09|=20/

# bytes that do not need to be escaped
PRINTABLES = ((?!..?~).to_a + WHITESPACE) - [?=]

MAX_LINE_WIDTH = 76

NEWLINE = "\r\n"

# additional bytes to escape for safety in an EBCDIC document
EBCDIC_EXCEPTIONS = %w' ! " # $ @ [ \ ] ^ ` { | } ~ '
EBCDIC_PRINTABLES = PRINTABLES - EBCDIC_EXCEPTIONS
# additional bytes to escape for safety in an XML document
XML_EXCEPTIONS = %w' < > & '
XML_PRINTABLES = PRINTABLES - XML_EXCEPTIONS

# Encode self to the quoted-printable transfer encoding
def to_quoted_printable(printables = QuotedPrintable::PRINTABLES)

# Decode self from the quoted-printable transfer encoding
def from_quoted_printable


# Functions that do quoted-printable encoding and decoding
class << self

# Return the quoted-printable escaped representation of the given byte
# (byte must be a Fixnum between 0 and 255)
def encode_byte(byte)

# Return the byte corresponding to the given quoted-printable escape
# sequence as a String. If it's not valid, return nil.
def decode_sequence(escape_sequence)

# Return the given string encoded as quoted-printable, including the
# canonical \r\n line terminators.
def encode_string(string, printables = PRINTABLES)

# Consider the given string quoted-printable encoded, and decode it,
# including translating line terminators to the native default.
def decode_string(string)

# Add quoted-printable conversions to String
class String
include QuotedPrintable # to_quoted_printable, from_quoted_printable
end

Cheers,
Dave



 
James Edward Gray II
      03-14-2005
On Mar 14, 2005, at 9:41 AM, Dave Burt wrote:

> Now, of course, this would have been a lot easier if I'd just been
> able to find the "builtin quoted printable facilities." What builtin
> quoted printable facilities?


Look up the "M" format for Array.pack.
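As a brief illustration of that directive (a sketch, with sample strings chosen only for this example): pack("M") encodes a string as quoted printable, and unpack("M") reverses it.

```ruby
# Encode: "=" is escaped, and the newline in the data ends the encoded line.
encoded = ["a=b\n"].pack("M")   # => "a=3Db\n"

# Decode: unpack returns an array, so take the first element.
decoded = encoded.unpack("M").first

# Input without a final newline gets a soft line break ("=\n") appended.
no_newline = ["abc"].pack("M")  # => "abc=\n"

p encoded, decoded, no_newline
```

Note that pack("M") writes bare "\n" line endings rather than the carriage return-line feed pairs the quiz asks for.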

James Edward Gray II



 
Dave Burt
      03-15-2005
>> What builtin quoted printable facilities?
>
> Look up the "M" format for Array.pack.


So here's the cheat solution:

class String
  def to_quoted_printable(*args)
    [self].pack("M").gsub(/\n/, "\r\n")
  end
  def from_quoted_printable
    self.gsub(/\r\n/, "\n").unpack("M").first
  end
end

(Just add my original if __FILE__ block to make it almost quiz-compatible)

And here's how it fares against my test suite:

Loaded suite TC_QuotedPrintable
Started
.............FF.FFFFFFF..
Finished in 0.39 seconds.

So it's 10 times the speed of my original one (against random binary data), but
it chops lines too early and ends up with 73- instead of 76-character lines. Of
course, this one won't do XML.
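A quick way to check the line lengths on a given Ruby, sketched here with an arbitrary 200-character sample string: split the packed output on newlines and look at the longest piece. The exact figure may differ between Ruby versions, so none is asserted.

```ruby
text = "x" * 200
encoded = [text].pack("M")

# Each encoded line, including any trailing "=" soft break marker.
lengths = encoded.split("\n").map { |line| line.length }

puts "longest encoded line: #{lengths.max} characters"
```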

Interestingly, if I use a gsub! instead of a loop with sub!s in my soft_break!
method, I get a 5x speedup... and fail the same tests.

Cheers,
Dave



 
James Edward Gray II
      03-15-2005
(from Dave's solution)

if __FILE__ == $0
require 'optparse'

# Look, James, I'm opt-parsing!
...

I'm so proud!

James Edward Gray II



 