On 28 Jul, 03:12, xhos...@gmail.com wrote:
> Kimia <chemies...@gmail.com> wrote:
> > hi, girls and dudes,
>
> > ....I suspect that a hash might leak when it holds a large
> > number of pairs.
> > Recently I was asked to do some statistics work on large
> > files. All I wanted to do was find the duplicated lines of a file,
> > and I wrote the snippet below:
> > code:
> > mysort.pl
> > ------------------------
> > #!/usr/bin/perl
>
> > use strict;
> > use warnings;
> > my %in;
> > my $cnt = 0;
> > while(<>){
> > chomp;
> > $_ or ++$cnt, next;
> > ++$in{$_};
> > }
> > foreach(sort keys %in){
> > $cnt += $in{$_};
> > print "$_*$in{$_}\n";
> > }
>
> What is the stuff with $cnt?
>
> > However, when I used it for a large file, which contains 10M lines, it
> > failed.
>
> It doesn't fail. It gives you output you didn't expect.
>
> > $ ./mysort <TenLinesInput.dat >out
> > $ echo $?
> > 0
> > $ tail out -n 5
> > ------------------------
> > ??????????????*2
> > ????????????????*1
> > ??????????????????*1
> > ?????????????????????????????*2834
> > ?????????????????????????????????????????????????? ???????????????????????
> > ?????????????????????????????????????????????????? ???????????????????????
> > ?????????????????????????????????????????????????? ???????????????????????
> > ????????????????????? *1
> > ------------------------
> > Where '?' is \0xff, when viewed as a binary file.
> > I'm sure that the input contains no char such as \0xff.
>
> I am not sure of that. Try this and see what it gives, and if
> it consistently gives the same thing:
>
> perl -lne 'print $. unless -1==index $_, chr(0xff)' TenLinesInput.dat
>
> > Most lines
> > are tens of chars long, a few exceed 100 and none exceeds 1000.
> > All the other output lines, except the last 10, are as expected.
>
> > Then I tried it on an input file comprising one million lines
> > and it failed with the same error;
>
> It didn't fail with an error. The value of $? shows that. (And I don't
> see anything suggestive of a "leak", either.) It seems like what it comes
> down to is that you and Perl disagree over what is in your file.
>
> Xho
Thanks, Xho. I've found the bug, which, of course, was my own.
The output file is perfectly correct. The input file does contain
lines of \xff bytes (shown as ???? above).
Before debugging, I had tried:
$ perl -lne 'print if /^\0xff/'
and it printed nothing, which convinced me that my assumption was right.
However, the regex should have been /^\xff/. In a Perl regex, \0xff
means a NUL byte followed by the literal text "xff", not the byte 0xFF.
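
For anyone who trips over the same thing, here is a minimal sketch of the difference between the two patterns. The sample strings and the sub names are made up for illustration; they are not taken from my log file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# /^\0xff/ : octal escape \0 (a NUL byte), then the literal characters x, f, f.
sub matches_nul_then_xff { my ($line) = @_; return $line =~ /^\0xff/ ? 1 : 0 }

# /^\xff/ : hex escape \xff, i.e. the single byte 0xFF.
sub matches_byte_ff { my ($line) = @_; return $line =~ /^\xff/ ? 1 : 0 }

# Hypothetical sample lines, one for each case.
my @samples = (
    "\xff" . "data",      # starts with the byte 0xFF
    "\0"   . "xffdata",   # starts with NUL, then the text "xff"
);

for my $line (@samples) {
    printf "first byte 0x%02x: /^\\0xff/ %s, /^\\xff/ %s\n",
        ord($line),
        matches_nul_then_xff($line) ? 'matches' : 'does not match',
        matches_byte_ff($line)      ? 'matches' : 'does not match';
}
```

So my first test could only have found lines beginning with a NUL byte and the letters "xff", which is why it printed nothing.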
This was part of some voluminous log-file processing that I was asked
to do.
The byte \xff should not occur in the normal encoding, so it must be
generated in some unusual situation.
I wrote the code I posted as a debugging aid after I found anomalies in
other processing. It did not get me anywhere, though, and that was
stupid of me~
Before debugging can drive out errors, it lets in stupidity.
Thanks for all your help.
ps:
> perl -lne 'print $. unless -1==index $_, chr(0xff)' TenLinesInput.dat
I tried this line and it did help me.
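
If the same check is needed again, the one-liner can be expanded into a small script. This is only a rough sketch of the same index() test: the sub name find_byte and the output format are my own invention, and it also reports the offset of the first hit on each line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [line number, offset] pairs for every line that contains the
# given byte. Same index() test as the one-liner above; find_byte is a
# hypothetical helper name.
sub find_byte {
    my ($fh, $byte) = @_;
    my @hits;
    while (my $line = <$fh>) {
        my $pos = index $line, chr($byte);
        push @hits, [$., $pos] if $pos != -1;   # $. is the current line number
    }
    return @hits;
}

# Scan the file named on the command line, if one was given.
if (@ARGV) {
    my $file = shift @ARGV;
    # :raw so that bytes are read as-is, without any encoding layer
    open my $fh, '<:raw', $file or die "cannot open $file: $!";
    printf "line %d, offset %d\n", @$_ for find_byte($fh, 0xff);
    close $fh;
}
```

Run it as, for example, perl findbyte.pl TenLinesInput.dat (the script name is arbitrary).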
--
Mad: that's a word you'd think was invented for us.