Re: trigraphs, yecch

 
 
Keith Thompson
      01-31-2012
(E-Mail Removed) (Richard Harter) writes:
> The other day I wanted to put three successive question marks in a
> string. Frex, "(???).%s.%s". My trusty antique gcc compiler
> converted the last two '?'s into a ']' along with a warning that it
> was doing a trigraph conversion.
>
> Now, me, I know from nothing about trigraphs - never used them, hope
> never to use them - so I was caught by surprise. My elderly copy of
> K&R described them but I didn't see anything about getting around
> them.
>
> So. How is one supposed to get three successive question marks into a
> string?


For reference, there are exactly 9 trigraphs. C99 5.2.1.1:

All occurrences in a source file of the following sequences of three
characters (called trigraph sequences) are replaced with the
corresponding single character.

    ??=  #      ??)  ]      ??!  |
    ??(  [      ??'  ^      ??>  }
    ??/  \      ??<  {      ??-  ~

No other trigraph sequences exist. Each ? that does not begin one of
the trigraphs listed above is not changed.

In your case, "???" is not a trigraph, but ??) is, so "(???)" becomes
"(?]": the first ? is left alone and the remaining ??) is replaced by ].
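
A minimal sketch of two common ways around this, assuming a compiler that may or may not have trigraph replacement switched on. The function names are made up for illustration. The first uses the standard \? escape sequence; the second splits the literal so that no two question marks sit next to each other in the source, relying on adjacent string literals being concatenated in a later translation phase than trigraph replacement.

```c
/* Uses the \? escape so that the source text never contains two
   adjacent question marks. */
const char *marks_escaped(void)
{
    return "(?\?\?).%s.%s";
}

/* Splits the literal into pieces; the pieces are joined only after
   trigraph replacement has already been done. */
const char *marks_split(void)
{
    return "(?" "?" "?).%s.%s";
}
```

Both functions return the same text, with three literal question marks inside the parentheses, whether or not trigraphs are enabled.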

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Patrick Scheible
      02-01-2012
Keith Thompson <(E-Mail Removed)> writes:

> (E-Mail Removed) (Richard Harter) writes:
>> The other day I wanted to put three successive question marks in a
>> string. Frex, "(???).%s.%s". My trusty antique gcc compiler
>> converted the last two '?'s into a ']' along with a warning that it
>> was doing a trigraph conversion.
>>
>> Now, me, I know from nothing about trigraphs - never used them, hope
>> never to use them - so I was caught by surprise. My elderly copy of
>> K&R described them but I didn't see anything about getting around
>> them.
>>
>> So. How is one supposed to get three successive question marks into a
>> string?

>
> For reference, there are exactly 9 trigraphs. C99 5.2.1.1:
>
> All occurrences in a source file of the following sequences of three
> characters (called trigraph sequences) are replaced with the
> corresponding single character.
>
>     ??=  #      ??)  ]      ??!  |
>     ??(  [      ??'  ^      ??>  }
>     ??/  \      ??<  {      ??-  ~
>
> No other trigraph sequences exist. Each ? that does not begin one of
> the trigraphs listed above is not changed.
>
> In your case, "???" is not a trigraph, but ??) is.


Has the C standards committee been flamed enough for trigraphs? I don't
think so. I bet there's 1000 programmers bitten by them for every 1
who's been helped. By 1989 anyone who didn't have an editor capable of
typing any 7-bit ASCII character was a hopeless luddite who would be
better served making their own pre-preprocessing phase for C and leaving
the language unchanged instead of another phase in the preprocessor for
every C program forever and ever.

-- Patrick

 
 
Keith Thompson
      02-01-2012
Patrick Scheible <(E-Mail Removed)> writes:
[...]
> Has the C standards committee been flamed enough for trigraphs? I don't
> think so. I bet there's 1000 programmers bitten by them for every 1
> who's been helped. By 1989 anyone who didn't have an editor capable of
> typing any 7-bit ASCII character was a hopeless luddite who would be
> better served making their own pre-preprocessing phase for C and leaving
> the language unchanged instead of another phase in the preprocessor for
> every C program forever and ever.


By 1989, there were plenty of C programmers using EBCDIC-based systems.
There probably still are, though not as many. I understand there were
also some C programmers using keyboards and character sets that replaced
some ASCII punctuation characters with accented letters; that's probably
not as much of a concern these days.

I would have preferred a solution in which trigraphs are disabled by
default, and can be enabled explicitly by a directive at the top of each
source file.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
James Kuyper
      02-01-2012
On 01/31/2012 08:01 PM, Patrick Scheible wrote:
> Keith Thompson <(E-Mail Removed)> writes:

....
>> For reference, there are exactly 9 trigraphs. C99 5.2.1.1:
>>
>> All occurrences in a source file of the following sequences of three
>> characters (called trigraph sequences) are replaced with the
>> corresponding single character.
>>
>>     ??=  #      ??)  ]      ??!  |
>>     ??(  [      ??'  ^      ??>  }
>>     ??/  \      ??<  {      ??-  ~
>>
>> No other trigraph sequences exist. Each ? that does not begin one of
>> the trigraphs listed above is not changed.
>>
>> In your case, "???" is not a trigraph, but ??) is.

>
> Has the C standards committee been flamed enough for trigraphs? I don't
> think so. I bet there's 1000 programmers bitten by them for every 1
> who's been helped. By 1989 anyone who didn't have an editor capable of
> typing any 7-bit ASCII character ...


Every single one of the characters for which there's a corresponding
trigraph sequence is on that list precisely because it's not in the
"invariant set" supported by all of the national variants of the
ISO/IEC 646 7-bit encoding. If you lived in one of the nations for which
those national variants were created, an editor which supported only
strict 7-bit ASCII would not have been very useful - that's precisely
why the national variants were created.

There are work-arounds, such as standard conventions for transliterating
a German letter such as ö into oe, but those conventions implement
essentially the same idea as trigraphs: using multiple representable
characters to indirectly represent a single character that's not
directly representable.
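
The "several representable characters stand for one" idea can be sketched as a small function that mimics the replacement the standard describes. This is only an illustration working on an in-memory string, not how any particular compiler implements it; trigraph_target and replace_trigraphs are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Returns the character that a trigraph ending in `third` stands for,
   or 0 if `third` does not complete one of the nine trigraphs. */
static char trigraph_target(char third)
{
    static const char from[] = "=()!'></-";
    static const char to[]   = "#[]|^}{\\~";
    const char *p = third ? strchr(from, third) : NULL;
    return p ? to[p - from] : 0;
}

/* Copies src to dst, replacing each trigraph with its single
   character. Any ? that does not begin a trigraph is copied as is.
   dst must have room for strlen(src) + 1 bytes. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src) {
        char t = 0;
        if (src[0] == '?' && src[1] == '?')
            t = trigraph_target(src[2]);
        if (t) {
            *dst++ = t;
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Fed the five characters ( ? ? ? ), the function produces (?], which matches the behaviour reported at the top of the thread.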

> ... was a hopeless luddite who would be
> better served making their own pre-preprocessing phase for C and leaving
> the language unchanged instead of another phase in the preprocessor for
> every C program forever and ever.


I'm not arguing that trigraphs were a good solution to the problem. But
restricting editors to 7-bit ASCII was not an acceptable solution,
either. That's why one of the Scandinavian countries (I forget which
one) refused to approve the C standard until some accommodation was made
to the needs of people for whom 7-bit ASCII was unacceptable. The result
was a political compromise, and like most such, it was equally
unattractive to all parties engaged in the negotiations.
--
James Kuyper
 
 
Charles Richmond
      02-01-2012
"Patrick Scheible" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Keith Thompson <(E-Mail Removed)> writes:
>
>> (E-Mail Removed) (Richard Harter) writes:
>>> The other day I wanted to put three successive question marks in a
>>> string. Frex, "(???).%s.%s". My trusty antique gcc compiler
>>> converted the last two '?'s into a ']' along with a warning that it
>>> was doing a trigraph conversion.
>>>
>>> Now, me, I know from nothing about trigraphs - never used them, hope
>>> never to use them - so I was caught by surprise. My elderly copy of
>>> K&R described them but I didn't see anything about getting around
>>> them.
>>>
>>> So. How is one supposed to get three successive question marks into a
>>> string?

>>
>> For reference, there are exactly 9 trigraphs. C99 5.2.1.1:
>>
>> All occurrences in a source file of the following sequences of three
>> characters (called trigraph sequences) are replaced with the
>> corresponding single character.
>>
>>     ??=  #      ??)  ]      ??!  |
>>     ??(  [      ??'  ^      ??>  }
>>     ??/  \      ??<  {      ??-  ~
>>
>> No other trigraph sequences exist. Each ? that does not begin one of
>> the trigraphs listed above is not changed.
>>
>> In your case, "???" is not a trigraph, but ??) is.

>
> Has the C standards committee been flamed enough for trigraphs? I don't
> think so. I bet there's 1000 programmers bitten by them for every 1
> who's been helped. By 1989 anyone who didn't have an editor capable of
> typing any 7-bit ASCII character was a hopeless luddite who would be
> better served making their own pre-preprocessing phase for C and leaving
> the language unchanged instead of another phase in the preprocessor for
> every C program forever and ever.
>


So just *remove* trigraphs from the *next* C standard. Or require the
preprocessor symbol "__TRIGRAPH_IMPLEMENTED__" to be defined, or relegate
this to a PRAGMA... before trigraphs would be recognized by the compiler.
Make it *extra* hard for the ordinary C programmer to step on this land
mine.


--
+<><><><><><><><><><><><><><><><><><><>+
| Charles Richmond (E-Mail Removed) |
+<><><><><><><><><><><><><><><><><><><>+

 
 
Joachim Schmitz
      02-01-2012
James Kuyper wrote:
<snip>
> There are work-arounds; such as standard conventions for
> transliterating a German letter such as ö into oe, but those
> conventions implement essentially the same idea as trigraphs: using
> multiple representable characters to indirectly represent a single
> character that's not directly representable.


Not quite the same thing. While, e.g., the German ö can be transcribed as oe,
not every oe can be changed back to ö, so the process is not reversible.

Bye, Jojo

 
 
Kaz Kylheku
      02-01-2012
On 2012-02-01, James Kuyper <(E-Mail Removed)> wrote:
> I'm not arguing that trigraphs were a good solution to the problem. But
> restricting editors to 7-bit ASCII was not an acceptable solution,
> either. That's why one of the Scandinavian countries (I forget which
> one) refused to approve the C standard until some accommodation was made
> to the needs of people for whom 7-bit ASCII was unacceptable.


The correct response would have been not to bother with them. There doesn't
need to be an ISO standard for C; ANSI is good enough. Many fine languages have
only ANSI standards.

Actual programmers in that Scandinavian country do not require these trigraphs
any more than they require words like "if", "for" and "unsigned" to be
translated into their language, and were just as ill-served by this as everyone
else.
 
 
ralph
      02-01-2012
On Tue, 31 Jan 2012 23:50:30 -0600, "Charles Richmond"
<(E-Mail Removed)> wrote:

>"Patrick Scheible" <(E-Mail Removed)> wrote in message
>news:(E-Mail Removed)...
>> Keith Thompson <(E-Mail Removed)> writes:
>>
>>> (E-Mail Removed) (Richard Harter) writes:
>>>> The other day I wanted to put three successive question marks in a
>>>> string. Frex, "(???).%s.%s". My trusty antique gcc compiler
>>>> converted the last two '?'s into a ']' along with a warning that it
>>>> was doing a trigraph conversion.
>>>>
>>>> Now, me, I know from nothing about trigraphs - never used them, hope
>>>> never to use them - so I was caught by surprise. My elderly copy of
>>>> K&R described them but I didn't see anything about getting around
>>>> them.
>>>>
>>>> So. How is one supposed to get three successive question marks into a
>>>> string?
>>>
>>> For reference, there are exactly 9 trigraphs. C99 5.2.1.1:
>>>
>>> All occurrences in a source file of the following sequences of three
>>> characters (called trigraph sequences) are replaced with the
>>> corresponding single character.
>>>
>>>     ??=  #      ??)  ]      ??!  |
>>>     ??(  [      ??'  ^      ??>  }
>>>     ??/  \      ??<  {      ??-  ~
>>>
>>> No other trigraph sequences exist. Each ? that does not begin one of
>>> the trigraphs listed above is not changed.
>>>
>>> In your case, "???" is not a trigraph, but ??) is.

>>
>> Has the C standards committee been flamed enough for trigraphs? I don't
>> think so. I bet there's 1000 programmers bitten by them for every 1
>> who's been helped. By 1989 anyone who didn't have an editor capable of
>> typing any 7-bit ASCII character was a hopeless luddite who would be
>> better served making their own pre-preprocessing phase for C and leaving
>> the language unchanged instead of another phase in the preprocessor for
>> every C program forever and ever.
>>

>
>So just *remove* trigraphs from the *next* C standard. Or require the
>preprocessor symbol "__TRIGRAPH_IMPLEMENTED__" to be defined, or relegate
>this to a PRAGMA... before trigraphs would be recognized by the compiler.
>Make it *extra* hard for the ordinary C programmer to step on this land
>mine.


Just to add to the pile: don't all current mainstream* compilers
already offer options to toggle trigraph (and digraph) parsing?

And they have for quite a while now. For example, no trigraph parsing has
been the default for MSC since MSC 5.1 (1985?) or at least MSC 6
(v12).

[* by mainstream I meant major popular "3rd party" compilers and
developer packages, and not the compilers often included with O/Ss,
particularly in the UNIX world.]
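
Whether a given build actually has trigraph replacement turned on can be probed from within C itself. A rough sketch, with a made-up function name: the literal below deliberately contains a trigraph, so its size depends on whether the compiler rewrote it. GCC, for example, has a -trigraphs option that switches replacement on; with it off, GCC typically warns that the trigraph was ignored.

```c
/* The literal deliberately contains the trigraph for '#'. With
   replacement active it shrinks to one character plus the
   terminator (2 bytes); otherwise it keeps all three (4 bytes). */
int trigraphs_enabled(void)
{
    return sizeof("??=") == 2;
}
```

The function returns 1 when the compiler performed the replacement and 0 when it left the three characters alone.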

-ralph
 
 
Patrick Scheible
      02-01-2012
Keith Thompson <(E-Mail Removed)> writes:

> Patrick Scheible <(E-Mail Removed)> writes:
> [...]
>> Has the C standards committee been flamed enough for trigraphs? I don't
>> think so. I bet there's 1000 programmers bitten by them for every 1
>> who's been helped. By 1989 anyone who didn't have an editor capable of
>> typing any 7-bit ASCII character was a hopeless luddite who would be
>> better served making their own pre-preprocessing phase for C and leaving
>> the language unchanged instead of another phase in the preprocessor for
>> every C program forever and ever.

>
> By 1989, there were plenty of C programmers using EBCDIC-based systems.


I would argue with "plenty". C was not a popular language in IBM shops.
IBM shops typically didn't do systems programming of the sort C is great
for, and if they did they'd probably use PL/I. By 1989, EBCDIC was
clearly on its way out. The PC used ASCII. People in shops that used
EBCDIC mostly already had tools to convert to ASCII when needed.

> There probably still are, though not as many. I understand there were
> also some C programmers using keyboards and character sets that replaced
> some ASCII punctuation characters with accented letters; that's probably
> not as much of a concern these days.


I think that was the standards committee's motivation. However, a
crippled subset of ASCII was not a satisfactory approach for anyone.
People in countries using those accented characters still needed
brackets and braces sometimes. They pretty often needed to represent
multiple languages with accented characters in the same document and a
national character set doesn't allow that at all.

> I would have preferred a solution in which trigraphs are disabled by
> default, and can be enabled explicitly by a directive at the top of each
> source file.


They should have stuck to using a standalone independent preprocessor
just for character set issues for another few years until Unicode came
along. In 1991, the first volume of the Unicode standard was
published.

-- Patrick
 
 
Patrick Scheible
      02-01-2012
James Kuyper <(E-Mail Removed)> writes:

> On 01/31/2012 08:01 PM, Patrick Scheible wrote:
>> Keith Thompson <(E-Mail Removed)> writes:

> ...
>>> For reference, there are exactly 9 trigraphs. C99 5.2.1.1:
>>>
>>> All occurrences in a source file of the following sequences of three
>>> characters (called trigraph sequences) are replaced with the
>>> corresponding single character.
>>>
>>>     ??=  #      ??)  ]      ??!  |
>>>     ??(  [      ??'  ^      ??>  }
>>>     ??/  \      ??<  {      ??-  ~
>>>
>>> No other trigraph sequences exist. Each ? that does not begin one of
>>> the trigraphs listed above is not changed.
>>>
>>> In your case, "???" is not a trigraph, but ??) is.

>>
>> Has the C standards committee been flamed enough for trigraphs? I don't
>> think so. I bet there's 1000 programmers bitten by them for every 1
>> who's been helped. By 1989 anyone who didn't have an editor capable of
>> typing any 7-bit ASCII character ...

>
> Every single one of the characters for which there's a corresponding
> trigraph sequence is on that list precisely because it's not in the
> "invariant set" supported by all of the national variants of the
> ISO/IEC 646 7-bit encoding. If you lived in one of the nations for which
> those national variants were created, an editor which supported only
> strict 7-bit ASCII would not have been very useful - that's precisely
> why the national variants were created.


The invariant subset of ASCII was an unsatisfactory solution for
everyone, including people working in the languages for which it was
created. The characters displaced by accented characters are used for
many purposes, not just C. And many people need to be able to represent
accented characters from several different languages in the same
document. The Unicode committee was already meeting in 1989. The C
committee should have stuck to ad-hoc preprocessors for people using C
in a non-full-ASCII language until Unicode was ready, only another few
years.

> There are work-arounds; such as standard conventions for transliterating
> a German letter such as ö into oe, but those conventions implement
> essentially the same idea as trigraphs: using multiple representable
> characters to indirectly represent a single character that's not
> directly representable.
>
>> ... was a hopeless luddite who would be
>> better served making their own pre-preprocessing phase for C and leaving
>> the language unchanged instead of another phase in the preprocessor for
>> every C program forever and ever.

>
> I'm not arguing that trigraphs were a good solution to the problem. But
> restricting editors to 7-bit ASCII was not an acceptable solution,
> either. That's why one of the Scandinavian countries (I forget which
> one) refused to approve the C standard until some accommodation was made
> to the needs of people for whom 7-bit ASCII was unacceptable. The result
> was a political compromise, and like most such, it was equally
> unattractive to all parties engaged in the negotiations.


Thankfully, Ritchie didn't have to get an international committee's
blessing in order to create C.

-- Patrick
 