Velocity Reviews - Computer Hardware Reviews

UTF-8 encoding problem

 
 
shreshth.luthra@gmail.com
 
      10-18-2006
Hi All,

I have a GUI that accepts a Unicode string and searches a given
set of XML files for that string.

I have two XML files, both saved as UTF-8, containing characters
from different languages.

Both files start with a UTF-8 BOM, but only the first file also
declares UTF-8 in the XML declaration at the top of the file.

When I search that directory for a non-English character using a
third-party desktop search GUI, it reports that the character exists
in the first file (the one that also has the XML declaration), but not
in the second file (the one with only the BOM).

Initially I thought the problem was that UTF-8 supports both
multibyte and Unicode, but I could not find much about it.

Please help.

Regards,
Shreshth

 
 
 
 
 
Ron Natalie
 
      10-18-2006
(E-Mail Removed) wrote:

> Initially I thought the problem was that UTF-8 supports both
> multibyte and Unicode, but I could not find much about it.

What does this have to do with C++ at all?
UTF-8 is a multibyte encoding of Unicode (which is effectively
a 32-bit character space), but I doubt that's your problem.
Your problem is that your document doesn't conform to the document
rules that the search program is using.
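
As a rough illustration of the "multibyte encoding" point, here is a minimal sketch of how a single Unicode code point becomes one to four UTF-8 bytes. The helper name `encode_utf8` is made up for this example and does not come from any library.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for values that cannot be encoded
// (above U+10FFFF, or in the surrogate range U+D800..U+DFFF).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return out;
    }
    if (cp < 0x80) {            // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {    // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                    // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, `encode_utf8(0x20AC)` (the euro sign) yields the three bytes E2 82 AC, while `encode_utf8(0x41)` yields the single byte 41.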
 
 
 
 
 
shreshth.luthra@gmail.com
 
      10-18-2006
I know this has nothing to do with C++ in particular, but where better
to ask such a question?

Anyway,
> Your problem is your document isn't conforming with the document
> rules that the search program is using.


I don't understand what you mean by this.
Of course I cannot do anything about the search program (which is
surely using Unicode).

But the question is: if both files are in UTF-8, why does the search
program work only for the one that also declares UTF-8 in its XML
declaration?
Does the declaration really make any difference here?

Thanks for your reply.

Shreshth



 
 
loufoque
 
      10-18-2006
(E-Mail Removed) wrote:

> Both files start with a UTF-8 BOM, but only the first file also
> declares UTF-8 in the XML declaration at the top of the file.


BOMs are quite useless for UTF-8. They are purely optional.
And according to the XML spec (AFAIK), the default encoding when no
encoding is declared is UTF-8.
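
To make the BOM versus declaration distinction concrete, here is a minimal sketch, assuming the whole file has already been read into a `std::string`. The helper names `has_utf8_bom` and `declares_encoding` are hypothetical, and the second is a crude string check rather than a real XML parser.

```cpp
#include <cstddef>
#include <string>

// True if the buffer starts with the three-byte UTF-8 BOM (EF BB BF).
bool has_utf8_bom(const std::string& data) {
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}

// True if the text after an optional BOM opens with an XML declaration
// that mentions "encoding" before its closing "?>".
bool declares_encoding(const std::string& data) {
    const std::size_t start = has_utf8_bom(data) ? 3 : 0;
    if (data.compare(start, 5, "<?xml") != 0) {
        return false;  // no XML declaration at the top
    }
    const std::size_t end = data.find("?>", start);
    if (end == std::string::npos) {
        return false;  // declaration is never closed
    }
    const std::size_t pos = data.find("encoding", start);
    return pos != std::string::npos && pos < end;
}
```

For the two files described in the original post, both would report a BOM, but only the first would report a declared encoding. Whether the search tool actually keys off that difference is a guess; the thread does not say.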


> When I search that directory for a non-English character using a
> third-party desktop search GUI, it reports that the character exists
> in the first file (the one that also has the XML declaration), but not
> in the second file (the one with only the BOM).


OK, so you have a problem with your broken third-party application.
How is that related to C++?


> Initially I thought the problem was that UTF-8 supports both
> multibyte and Unicode, but I could not find much about it.


Like most of your message, what you say just doesn't make much sense.


> Please help.


Getting a basic understanding of what Unicode and its encoding formats
are would surely help.

 
 
loufoque
 
      10-18-2006
Ron Natalie wrote:

> Unicode (which is effectively
> a 32-bit character space)


Unicode only reserves 2^20 + 2^16 code points (U+0000 through U+10FFFF).
21 bits are more than enough to store that.
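
The arithmetic can be checked at compile time. A small sketch; the constant names are invented for the example:

```cpp
#include <cstdint>

// Hypothetical constants describing the Unicode code space.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;             // last code point
constexpr std::uint32_t kCodePointCount = kMaxCodePoint + 1;  // U+0000 .. U+10FFFF

static_assert(kCodePointCount == (1u << 20) + (1u << 16),
              "the code space holds 2^20 + 2^16 code points");
static_assert(kCodePointCount == 17u * 65536u,
              "that is 17 planes of 65,536 code points each");
static_assert(kMaxCodePoint >= (1u << 20),
              "20 bits are not enough");
static_assert(kMaxCodePoint < (1u << 21),
              "21 bits are enough");
```

So a 32-bit integer type is a convenient container for a code point, but only the low 21 bits are ever used.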
 
 
Peter Jansson
 
      10-18-2006
(E-Mail Removed) wrote:
> I know this has nothing to do with C++ in particular but where better
> to ask such a question.


The statement above is the best I have seen in a long time here.

If you know your question has "nothing to do with C++ in particular",
then why do you ask in a newsgroup dedicated to the C++ language? That
is like asking for help with your car in a bicycle shop.

You will probably get a much better response if you ask in a forum
dedicated to your problem.

Sincerely,

Peter Jansson
http://www.p-jansson.com/
http://www.jansson.net/
 
 
Bhushan
 
      10-19-2006


Check your third-party search tool's documentation to see how it
searches XML files.




 