barcaroller wrote:
>
> I have a large block of memory. I need to (1) check if it contains only
> ASCII characters (including newlines and/or carriage-returns) and, if so,
> (2) extract the lines into individual C++ strings.
>
> Currently I loop the entire block (byte for byte), run isascii(byte) on
> each byte, and then call getline() (either string.getline or
> iostream.getline does the job). This is proving too slow. I'm sure this
> problem has been solved using more efficient methods. Any suggestions?
You could use the first pass to store information about where the lines
start and end. Something like:
#include <deque>
#include <string>
#include <utility>

/*
  Appends the lines in [from,to) to the_text, provided that all
  characters in the range are ASCII. If any character is not,
  no lines are appended and false is returned.
*/
template < typename ConstCharIter, typename StringSequence >
bool append_lines ( ConstCharIter from,
                    ConstCharIter to,
                    StringSequence & the_text ) {
  typedef std::pair< ConstCharIter, ConstCharIter > line;
  // first pass: validate and record [begin,end) of every line
  std::deque< line > the_lines;
  ConstCharIter line_beg = from;
  ConstCharIter line_end = line_beg;
  while ( true ) {
    if ( line_end == to ) {
      the_lines.push_back( line( line_beg, line_end ) );
      break;
    }
    if ( *line_end == '\n' ) {
      the_lines.push_back( line( line_beg, line_end ) );
      ++line_end;
      line_beg = line_end;
      continue;
    }
    // portable stand-in for the non-standard isascii():
    if ( static_cast< unsigned char >( *line_end ) > 0x7F ) {
      return ( false );
    }
    ++ line_end;
  }
  // second pass: only reached if every character was ASCII
  for ( typename std::deque< line >::const_iterator line_iter = the_lines.begin();
        line_iter != the_lines.end(); ++ line_iter ) {
    // prematurely optimizing away a copy-constructor that might
    // be elided by the implementation anyway:
    //   the_text.push_back
    //     ( std::string( line_iter->first, line_iter->second ) );
    the_text.push_back( std::string() );
    // swap() needs a non-const lvalue argument, so call it on the temporary:
    std::string( line_iter->first, line_iter->second )
      .swap( the_text.back() );
  }
  return ( true );
}
Note: code not touched by a compiler.
Also: if non-ASCII characters are expected to occur only with negligible
probability, you might save time by inserting the lines right away and
rolling back the transaction if you encounter a non-ASCII character.
Best
Kai-Uwe Bux