ifstream >> string with UTF-8?

 
 
Wolfnoliir
 
      09-09-2009
Hi,
Here is a question that must come up all the time but I can't find a
solution.

I would like to get a word or a line from a UTF-8 encoded file into a
string but I get '�'s ('?') instead.
The strange thing is, this works fine from standard input:
cin >> someString; //works fine
cout << someString;
but
someIfStream >> someString;
cout << someString;
prints out question marks instead of accented characters!
(I'm using Linux and g++ 4.3.3)

Does anyone have an idea why that is or a solution to the problem?
 
 
 
 
 
Victor Bazarov
 
      09-09-2009
Wolfnoliir wrote:
> I would like to get a word or a line from a UTF-8 encoded file into a
> string but I get '�'s ('?') instead.
> The strange thing is, this works fine from standard input:
> cin >> someString; //works fine
> cout << someString;
> but
> someIfStream >> someString;
> cout << someString;
> prints out question marks instead of accented characters!
> (I'm using Linux and g++ 4.3.3)
>
> Does anyone have an idea why that is or a solution to the problem?


Use your "working" 'cin' solution, but redirect the input to be from
your file:

your_test_app < file_with_utf8

and see if there is any difference. As to the cause, my guess would be
that your file stream gets desynchronised from the encoding POV.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
 
 
 
 
 
Wolfnoliir
 
      09-09-2009
Victor Bazarov wrote:
>
> Use your "working" 'cin' solution, but redirect the input to be from
> your file:
>
> your_test_app < file_with_utf8
>
> and see if there is any difference. As to the cause, my guess would be
> that your file stream gets desynchronised from the encoding POV.
>
> V


Indeed I get the same result when I do:
your_test_app < file_with_utf8

I'm not actually sure my file is utf-8. It probably isn't considering
that when I do this:
echo éoi*uè > txt
your_test_app < txt
it prints out correctly.

But how can I know what encoding my file is in?
Once I know that I think I can just convert it with iconv.
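
On the question of how to tell whether a file really is UTF-8: one rough check is to test whether its bytes form well-formed UTF-8 at all. The following is only a sketch; the function name is_valid_utf8 is made up for illustration and is not part of the standard library. A file that passes is not guaranteed to be UTF-8 (plain ASCII passes too), but accented text in a single-byte encoding such as ISO-8859-1 will normally fail.

```cpp
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects truncated sequences, stray continuation bytes, overlong forms,
// UTF-16 surrogates and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::string::size_type n = s.size();
    std::string::size_type i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        int extra;             // continuation bytes expected after the lead byte
        unsigned long cp;      // code point being assembled
        unsigned long min_cp;  // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;     // stray continuation byte or invalid lead byte

        // Not enough bytes left for the sequence announced by the lead byte.
        if (n - i - 1 < static_cast<std::string::size_type>(extra)) return false;

        for (int k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

Reading the whole file into a std::string and passing it to this function gives a quick yes/no answer before reaching for a conversion tool.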
 
 
Victor Bazarov
 
      09-09-2009
Wolfnoliir wrote:
> Victor Bazarov wrote:
>>
>> Use your "working" 'cin' solution, but redirect the input to be from
>> your file:
>>
>> your_test_app < file_with_utf8
>>
>> and see if there is any difference. As to the cause, my guess would
>> be that your file stream gets desynchronised from the encoding POV.
>>
>> V

>
> Indeed I get the same result when I do:
> your_test_app < file_with_utf8
>
> I'm not actually sure my file is utf-8.


Uh... Then why are you trying to treat it as such?

> It probably isn't considering
> that when I do this:
> echo éoi*uè > txt
> your_test_app < txt
> it prints out correctly.


But you said that 'cin' worked OK, while your ifstream attempt didn't.
You need to find out what is different with your ifstream code compared
to the 'cin'.

>
> But how can I know what encoding my file is in?


Not sure it's a C++ question, to be honest. A file is a file, it
contains bytes. The encoding is something you think up, apply, and it's
not part of the file itself, AFAIUI. You get different results based on
different encodings you apply. The "correctness" of those results is
also in your head only.
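
To make the "a file contains bytes" point concrete, here is a minimal sketch that shows the raw bytes a std::string holds. The helper name to_hex is invented for this example. It relies on the fact that reading into a std::string from a narrow stream stores the bytes without decoding them.

```cpp
#include <string>

// Hypothetical helper: returns the bytes of `s` as space-separated
// lowercase hex pairs, so the stored encoding can be inspected directly.
std::string to_hex(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}
```

Printing to_hex(someString) after someIfStream >> someString shows what is actually in the file: an "é" stored as UTF-8 appears as the two bytes c3 a9, while in ISO-8859-1 it is the single byte e9.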

> Once I know that I think I can just convert it with iconv.


What's 'iconv'?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
 
 
Wolfnoliir
 
      09-09-2009
Victor Bazarov wrote:
> Wolfnoliir wrote:
>> Victor Bazarov wrote:
>>>
>>> Use your "working" 'cin' solution, but redirect the input to be from
>>> your file:
>>>
>>> your_test_app < file_with_utf8
>>>
>>> and see if there is any difference. As to the cause, my guess would
>>> be that your file stream gets desynchronised from the encoding POV.
>>>
>>> V

>>
>> Indeed I get the same result when I do:
>> your_test_app < file_with_utf8
>>
>> I'm not actually sure my file is utf-8.

>
> Uh... Then why are you trying to treat it as such?
>
> > It probably isn't considering
>> that when I do this:
>> echo éoi*uè > txt
>> your_test_app < txt
>> it prints out correctly.

>
> But you said that 'cin' worked OK, while your ifstream attempt didn't.
> You need to find out what is different with your ifstream code compared
> to the 'cin'.


There's nothing different. As I said in my last message, I was wrong.
It's just that my file has a different encoding than the standard
input my terminal sends (probably utf-8).

>
>>
>> But how can I know what encoding my file is in?

>
> Not sure it's a C++ question, to be honest. A file is a file, it
> contains bytes. The encoding is something you think up, apply, and it's
> not part of the file itself, AFAIUI. You get different results based on
> different encodings you apply. The "correctness" of those results is
> also in your head only.
>
>> Once I know that I think I can just convert it with iconv.

>
> What's 'iconv'?


Iconv is a Unix utility that converts a text file from one encoding
(e.g. utf-16) to another (e.g. utf-8).

>
> V


If nobody knows of a utility to find out what encoding my file is
using, I'll just go and look somewhere else then.

Thanks for your interest in my problem.
 
 
Richard Herring
 
      09-09-2009
Wolfnoliir writes:
>Victor Bazarov wrote:
>> Wolfnoliir wrote:
>>> Victor Bazarov wrote:
>>>>
>>>> Use your "working" 'cin' solution, but redirect the input to be
>>>> from your file:
>>>>
>>>> your_test_app < file_with_utf8
>>>>
>>>> and see if there is any difference. As to the cause, my guess
>>>> would be that your file stream gets desynchronised from the
>>>> encoding POV.
>>>>
>>>> V
>>>
>>> Indeed I get the same result when I do:
>>> your_test_app < file_with_utf8
>>>
>>> I'm not actually sure my file is utf-8.

>> Uh... Then why are you trying to treat it as such?
>>
>> > It probably isn't considering
>>> that when I do this:
>>> echo éoi*uè > txt
>>> your_test_app < txt
>>> it prints out correctly.

>> But you said that 'cin' worked OK, while your ifstream attempt
>> didn't. You need to find out what is different with your ifstream code
>> compared to the 'cin'.

>
>There's nothing different. As I said in my last message, I was wrong.
>It's just that my file has a different encoding than the standard
>input my terminal sends (probably utf-8).
>
>>
>>>
>>> But how can I know what encoding my file is in?

>> Not sure it's a C++ question, to be honest. A file is a file, it
>> contains bytes. The encoding is something you think up, apply, and
>> it's not part of the file itself, AFAIUI. You get different results
>> based on different encodings you apply. The "correctness" of those
>> results is also in your head only.
>>
>>> Once I know that I think I can just convert it with iconv.

>> What's 'iconv'?

>
>Iconv is a Unix utility that converts a text file from one encoding
>(e.g. utf-16) to another (e.g. utf-8).
>
>> V

>
>If nobody knows of a utility to find out what encoding my file is
>using, I'll just go and look somewhere else then.


Is the 'file' command any help?

>
>Thanks for your interest in my problem.


--
Richard Herring
 
 
Wolfnoliir
Guest
Posts: n/a
 
      09-09-2009
Richard Herring wrote:
> Is the 'file' command any help?


$ file dic-fr.txt
dic-fr.txt: ISO-8859 text

Indeed it is. Thank you very much!
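
For completeness, the conversion can also be done inside the program rather than with iconv. This is a sketch under an assumption: 'file' only reports "ISO-8859 text", so the exact variant is not known, and the code below assumes ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. If the file is really ISO-8859-15, a handful of characters (such as the euro sign and œ) would come out wrong. The function name latin1_to_utf8 is made up for this example.

```cpp
#include <string>

// Hypothetical helper, ASSUMING the input is ISO-8859-1:
// bytes below 0x80 are copied as-is, bytes 0x80..0xFF become the
// two-byte UTF-8 sequence for the code point of the same value.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte doubles
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte: 110xxxxx
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation: 10xxxxxx
        }
    }
    return out;
}
```

Applying this to each string read from the ifstream before sending it to cout should make the accents display on a UTF-8 terminal. On the command line, the same assumption corresponds to something like iconv -f ISO-8859-1 -t UTF-8 dic-fr.txt > dic-fr-utf8.txt, where the output file name is just an example.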
 