Velocity Reviews - Computer Hardware Reviews



trying to validate my rss feed

 
 
lkrubner@geocities.com
      02-21-2005

I've a client who, I think, writes his essays in Microsoft Word on a
Macintosh, then copies and pastes them into a form to upload them to his
weblog. The weblog then creates an RSS feed. The weblog and RSS feed
are created using a simple PHP script I wrote.

His RSS feed is not validating, apparently because of the Word "smart
quotes". This is a guess on my part. I need to find out for sure what
character is causing the RSS failure. How do I do that? Here is the
feed:

http://www.feedvalidator.org/check.c...ge2494.xml#l25


What I'd like to do is run a simple search-n-replace for that character
right before the RSS is created. But I need to find a way to get that
character. Hex value? Byte code? How do I find such a thing?
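One way to find the offending character, sketched in Python and assuming you can save the generated feed as raw bytes: scan for every byte above 0x7F and print its line, column and hex value. A correctly encoded UTF-8 curly quote shows up as a run of three bytes (E2 80 9C for the left double quote), whereas a lone byte such as 0x92 or 0x93 suggests Windows-1252 text that is being labelled as UTF-8. The function name is hypothetical.

```python
from typing import List, Tuple


def find_non_ascii_bytes(data: bytes) -> List[Tuple[int, int, str]]:
    """Return (line, column, hex value) for every byte above 0x7F.

    Columns count bytes, not characters, so a multi-byte UTF-8
    character is reported once per byte.
    """
    hits = []
    line, col = 1, 0
    for byte in data:
        if byte == 0x0A:  # "\n" starts a new line
            line, col = line + 1, 0
            continue
        col += 1
        if byte > 0x7F:  # outside 7-bit ASCII
            hits.append((line, col, f"0x{byte:02X}"))
    return hits


if __name__ == "__main__":
    # Hypothetical sample containing Windows-1252 curly quotes (0x92, 0x93, 0x94).
    sample = b"It\x92s ok\n\x93hi\x94"
    for line, col, value in find_non_ascii_bytes(sample):
        print(f"line {line}, column {col}: {value}")
```

To check a saved copy of the feed, pass the result of `open(path, "rb").read()` instead of the sample.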

I could teach this client not to make this mistake, but I assume (or
rather, I dream) at some point thousands of people will be using this
PHP script, and I can't teach all of them.

 
Martin Honnen
      02-21-2005


(E-Mail Removed) wrote:

> I've a client who, I think, writes his essays in Microsoft Word on a
> Macintosh, then copies and pastes it to a form to upload it to his
> weblog. The weblog then creates an RSS feed. The weblog and RSS feed
> are created using a simple PHP script I wrote.
>
> His RSS feed is not validating, apparently because of the Word "smart
> quotes". This is a guess on my part. I need to find out for sure what
> character is causing the rss failure. How do I do that? Here is the
> feed:
>
> http://www.feedvalidator.org/check.c...ge2494.xml#l25


How about correcting the issues that validator raises?
You shouldn't label something as UTF-8 encoded XML if it isn't, so I
think you need to make sure your PHP script actually creates UTF-8
encoded XML if that is the format and encoding you want.
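A quick way to test Martin's point, sketched in Python: decode the generated feed strictly as UTF-8 and report where it first fails. Python's `UnicodeDecodeError` carries the start and end offsets of the bad byte sequence. The helper name is illustrative.

```python
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, str]]:
    """Return None if data is valid UTF-8, else (byte offset, offending bytes in hex)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end].hex()
    return None


if __name__ == "__main__":
    print(first_invalid_utf8(b"It\xe2\x80\x99s"))  # properly encoded apostrophe: None
    print(first_invalid_utf8(b"It\x92s"))          # Windows-1252 apostrophe: (2, '92')
```

If this returns anything other than `None`, the feed is not the UTF-8 it claims to be.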



--

Martin Honnen
http://JavaScript.FAQTs.com/
 
Andy Dingley
      02-21-2005
On 21 Feb 2005 10:35:06 -0800, (E-Mail Removed) wrote:

>His RSS feed is not validating, apparently because of the Word "smart
>quotes".


Look in the output for numeric entities for characters such as the em
dash and the curly apostrophe. They are probably still in there as UTF-16 characters.

Your PHP is broken (which is common behaviour for PHP & XML).
Although these characters are well-formed in XML (not _everything_
that M$oft do is actually invalid), they need to be represented in the
appropriate way for your encoding. As a guess, you're including UTF-16
characters in a document that's then getting served as UTF-8.
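If the text has already been decoded correctly, one way to sidestep the encoding question for the XML itself is to emit plain ASCII and write every other character as a numeric character reference such as `&#8217;`, which is well-formed under any ASCII-compatible declared encoding, UTF-8 included. A Python sketch using the standard library's `xml.sax.saxutils.escape` and the `xmlcharrefreplace` error handler; note that it cannot repair text that was decoded with the wrong character set in the first place.

```python
from xml.sax.saxutils import escape


def to_ascii_xml_text(text: str) -> bytes:
    """Escape &, < and >, then write every non-ASCII character as &#NNNN;."""
    return escape(text).encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(to_ascii_xml_text("It\u2019s \u201cdone\u201d & dusted"))
    # b'It&#8217;s &#8220;done&#8221; &amp; dusted'
```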
 
lkrubner@geocities.com
      02-22-2005
Sorry. On most sites I put an .htaccess file that tells the browser that
the text the server is sending out is UTF-8. However, what is really
being sent out can easily become a crazy hodgepodge of character sets
when users start copying text from Word, WordPerfect, PDF files, Macs,
etc., and then pasting it into the form and posting that as their weblog
entry.
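Catching every possible encoding is hard, but a common heuristic is cheap: accept a submission if it decodes as strict UTF-8, and otherwise assume it is Windows-1252, the usual source of Word smart quotes. Windows-1252 text containing such characters is rarely valid UTF-8 by accident, so the first test seldom misfires. This is a Python sketch; the Windows-1252 fallback is an assumption (Mac Roman input, for example, would be misread). In PHP, `mb_check_encoding()` and `mb_convert_encoding()` from the mbstring extension can play the same two roles if that extension is installed.

```python
def decode_submission(raw: bytes) -> str:
    """Decode submitted bytes: strict UTF-8 first, Windows-1252 as a fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 came from Windows-1252.
        # The few byte values cp1252 leaves undefined become U+FFFD.
        return raw.decode("cp1252", errors="replace")


if __name__ == "__main__":
    for raw in (b"It\xe2\x80\x99s", b"It\x92s"):
        text = decode_submission(raw)
        print(text.encode("utf-8"))  # store and publish UTF-8 from here on
```

Both inputs end up as the same UTF-8 bytes, so the feed can honestly declare UTF-8.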

I've had other conversations elsewhere on Usenet that suggested it's
hopeless trying to catch every encoding that people might try to input.
For now, that's beyond my resources.

But I would like to capture and change the 3 most common errors that
come up, and those are the smart quotes, double and single, from Word.
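Once the input has been decoded into Unicode text, the search-and-replace described here fits in a small translation table. A Python sketch covering the four Word quote characters; the table and function names are illustrative, and other characters (the dashes, say) could be added the same way if you want them flattened too.

```python
# Word's curly quotes mapped to their plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also the apostrophe)
    "\u201C": '"',  # left double quotation mark
    "\u201D": '"',  # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ASCII quotes."""
    return text.translate(SMART_QUOTES)


if __name__ == "__main__":
    print(straighten_quotes("\u201cIt\u2019s fine,\u201d he said."))
```

This only works on correctly decoded text; if the raw bytes are still Windows-1252, decode them first.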

 
lkrubner@geocities.com
      02-22-2005
I'm not sure how to fix the PHP. I can't serve the RSS as plain text,
all the RSS validators complain about that. So I have to give an
encoding. So I decided to give it a UTF-8 encoding. (I usually do this
with an .htaccess file.) But if people write stuff in Word, then copy
and paste it into a form, hit Enter and post that as their weblog
entry, how can I purify their input so that the characters really
are UTF-8?
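Beyond getting the encoding right, a few characters can never appear in XML 1.0 at all, not even as `&#...;` references: C0 control characters other than tab, newline and carriage return, lone surrogates, and U+FFFE and U+FFFF. A Python sketch of a last cleanup step before writing the feed, assuming the input has already been decoded to text: drop those characters, escape the markup characters, and encode as UTF-8 so the bytes match the declared charset. The pattern follows the Char production of the XML 1.0 specification; the names are illustrative.

```python
import re
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production: C0 controls other than
# tab/LF/CR, lone surrogates, and U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def xml_safe_utf8(text: str) -> bytes:
    """Remove characters XML 1.0 forbids, escape &, < and >, and encode as UTF-8."""
    cleaned = _INVALID_XML_CHARS.sub("", text)
    return escape(cleaned).encode("utf-8")


if __name__ == "__main__":
    print(xml_safe_utf8("Tab\there\x0b <b>bold</b> \u2019"))
```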

I've asked this before on other newsgroups and have yet to hear an
answer that was within my resources to tackle.

It would help, of course, if I had a better understanding of character
encodings. I've been trying to educate myself, but it's slow because I
don't have much time. Are there any resources on the subject you might
point me to?

 