Velocity Reviews - Computer Hardware Reviews


Ruby method to strip out XML codes?

 
 
Michael W. Ryder
 
      12-06-2007
I am trying to process an XML file that includes various codes. The
problem I am running into is that some of these codes are inserted into
the middle of an encrypted string. If I display the file in a
browser these codes do not show up, and copying and pasting the string
works fine. The problem occurs when I try to extract the string in a
program and these "extraneous" XML codes are included. This of course
makes the decryption routine crash.
What I am looking for is a simple way to read through the file and
remove all the XML codes, leaving just plain text. I could probably
write a series of regular expressions to remove each code that I can
find in my text, but I am afraid I might miss some and it will come back
to haunt me later.
 
 
Phrogz
 
      12-06-2007

str.gsub(/<\/?[^>]+>/, '')

This will only be a problem if your XML file is legal and has a CDATA
section that contains a literal < character (not &lt;), like:

for ( var i=0, len=a.length; i<len; ++i )

In that case you likely want to use a proper XML parser (like REXML).
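To illustrate the difference, here is a minimal sketch that runs the regex and REXML over the same input. The sample document and the element name `code` are made up for the example, and it assumes REXML is available in your Ruby install (recent Ruby versions ship it as a bundled gem).

```ruby
require 'rexml/document'

# Made-up sample: a CDATA section containing a literal "<".
xml = '<root><code><![CDATA[for (i=0; i<len; ++i)]]></code></root>'

# The regex treats everything from "<![CDATA[" up to the closing "]]>"
# as one tag, so the CDATA content is removed along with the markup.
stripped = xml.gsub(/<\/?[^>]+>/, '')

# A parser understands CDATA and hands back the content as text.
doc = REXML::Document.new(xml)
parsed = doc.root.elements['code'].text

puts "regex:  #{stripped.inspect}"
puts "parser: #{parsed.inspect}"
```

With this input the regex result is an empty string, while the parser keeps the loop text intact.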

Do you really want to remove the XML, or would it suffice to just:

str.gsub! '&', '&amp;'
str.gsub! '<', '&lt;'
str.gsub! '>', '&gt;'
(and maybe even)
str.gsub! '"', '&quot;'
str.gsub! "'", '&apos;'

to make your string valid and escaped for use in an HTML context?
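As a rough sketch, the same five substitutions can be done in one pass by giving gsub a lookup table. The helper name `escape_xml` and the constant are hypothetical, not something from this thread. A single pass also means the order of the replacements no longer matters, whereas the line-by-line version has to replace & first.

```ruby
# Hypothetical helper; the names escape_xml and XML_ESCAPES are assumptions.
XML_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&apos;'
}.freeze

# Replace each special character with its entity in a single pass.
def escape_xml(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

puts escape_xml(%q{if (a < b && c == "d") { say 'hi' }})
```

Unlike the gsub! calls, this returns a new string and leaves the original untouched.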
 
 
Michael W. Ryder
 
      12-06-2007

My problem is that the XML file includes &#xD;&#xA; in the middle of a
couple of fields, especially in the encrypted fields. If I just extract
the encrypted field and try to decrypt it, the program crashes because
the key is invalid. I have to remove the "bad" character strings before
sending it to my decryption program. I would prefer to do this removal
before sending the file to my programs so that I don't have to deal with
these codes.
I assume that the string I am seeing is XML's way of saying CR/LF, as 0D
and 0A in hex are CR and LF, and the output in a browser shows the field
being broken at that point. The problem is that those are only the ones
that I have noticed, and there may be others hiding in the data. The XML
file is being parsed for conversion to our accounts.
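One possible way to handle this, sketched under a few assumptions: the unwanted codes are numeric character references (hex like &#xD; or decimal like &#13;) that stand for whitespace, and the encrypted text itself never legitimately contains whitespace. The helper name, the list of code points, and the sample field value are all made up for illustration.

```ruby
# Hypothetical helper; the names and the list of code points are assumptions.
# Tab, line feed, carriage return, and space.
WHITESPACE_CODE_POINTS = [0x09, 0x0A, 0x0D, 0x20].freeze

# Remove numeric character references that decode to whitespace.
# Any other reference is left in place so nothing is silently lost.
def strip_whitespace_refs(text)
  text.gsub(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/) do
    match = Regexp.last_match
    code = match[1] ? match[1].hex : match[2].to_i
    WHITESPACE_CODE_POINTS.include?(code) ? '' : match[0]
  end
end

# Made-up field value with a CR/LF pair in the middle.
field = 'QUJDREVG&#xD;&#xA;R0hJSktM'
puts strip_whitespace_refs(field)
```

Because it matches any numeric reference and checks the decoded value, this catches decimal spellings such as &#13;&#10; as well as the hex ones already noticed. Reading the field through an XML parser instead would decode every reference, after which the whitespace could be removed from the resulting string.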
 