Velocity Reviews


mike blamires 04-22-2004 11:23 PM

Correct use of Unicode in RegExp
 
On Thu, 22 Apr 2004 22:36:44 +0100, mike blamires scribbled furiously:

> I am having great difficulty using Unicode characters in a regular
> expression. I am trying to match extended Unicode characters.
>
> I want to split a large dump file (containing only JPEGs). I have used
> a hex editor to extract one file manually, just to show it can be done,
> so I know the input is intact.
>
> Each JPEG starts with the Unicode characters \u00FF \u00D8 \u00FF \u00E1,
> and there are plenty of these to be found within the file.
>
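> open(DUMPFILE, "/pathtodumpfile") or die "Cannot open dump file: $!";
> my $line = '';
> while (<DUMPFILE>) {
>     $line .= $_;    # accumulate the whole file into one string
> }
> close(DUMPFILE);
> my @files = split(/\x{00FF}\x{00D8}\x{00FF}\x{00E1}/, $line);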
>
> (As you may see from the style above, I am relatively new to the
> Perl side of programming ;)
>
> I have tried writing the Unicode characters in various ways (\xFF, \x{FF},
> etc.), but the pattern is never found. I am at a bit of a loss as
> to whether the problem is my regexp, my use of Unicode characters,
> or my use of extended Unicode characters.
>
> many thanks for your help.
>
> cheers
> Mike


Apologies, incorrect newsgroup first time round. Please see above.
cheers
Mike
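
For anyone reading this thread later, here is a minimal sketch of one way the split could be done. It assumes the dump is raw binary and that every JPEG in it begins with the four bytes FF D8 FF E1 described above. In a JPEG file these are plain byte values (the start-of-image marker followed by an APP1 marker), not Unicode characters, so the sketch reads the file with binmode and matches bytes. The names split_jpegs and slurp_bytes are hypothetical helpers made up for this illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string of raw bytes into chunks that each
# begin with the marker bytes FF D8 FF E1 quoted in the post.
sub split_jpegs {
    my ($data) = @_;
    # The lookahead keeps the marker at the front of each chunk;
    # a plain split on the marker itself would discard it.
    my @chunks = split /(?=\xFF\xD8\xFF\xE1)/, $data;
    # Drop anything that precedes the first marker (it is not a JPEG).
    return grep { /\A\xFF\xD8\xFF\xE1/ } @chunks;
}

# Hypothetical helper: read a whole file as raw bytes.
sub slurp_bytes {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # no newline translation, no encoding layer
    local $/;        # slurp mode: read the whole file in one go
    my $data = <$fh>;
    close($fh);
    return $data;
}
```

Something like `my @files = split_jpegs(slurp_bytes("/pathtodumpfile"));` would then give one string per image, and each string could be written to its own file, again with binmode on the output handle. Two caveats on the assumptions: the byte sequence could in principle also occur inside image data, and JPEGs that start with a different second marker (for example FF D8 FF E0) would not be found by this pattern.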





All times are GMT.