extracting text data in the presence of a "look-up" file: Is it possible?

 
 
Vumani Dlamini
      01-07-2004
This problem follows up on a couple of problems I sent to the list 2
months back. The data is structured as follows;

##### data #########
Area=3706
Company=101
PROPdes=1 # description/type of property
PROPpri=2 # public/private
PROPemp=54 # number of employees
PROPdes=6
PROPpri=2
PROPemp=23
Company=106
PROPdes=4
PROPpri=2
PROPemp=56
Area=3709
Company=116
PROPdes=9
PROPpri=1
PROPemp=200
###################

And the data set created is;
3706|101|1|2|054
3706|101|6|2|023
3706|106|4|2|056
3709|116|9|1|200

using the following Perl script;
##### Perl script ######
use strict;
use warnings;
open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
my ($Area , $Comp, $Pdes, $Ppri, $Pemp);
open PRIVATE, ">c:/.../private.txt";
while (<DATA>){
if (/Area=(\d+)/) {
$Area = $1;
}
elsif (/Company=(\d+)/) {
$Comp = $1;
}
elsif (/PROPdes=(\d+)/) {
$Pdes = $1;
}
elsif (/PROPpri=(\d+)/) {
$Ppri = $1;
}
elsif (/PROPemp=(\d+)/) {
print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";
}
}
}
##### Perl script ######

I now have a "area text file" with specific companies that have to be
extracted, with each row in the "area text file" having a code for an
area. I would like to extract companies only in areas listed in the
"area text file".

If within the areas in the "area text file" I am only interested in
areas with more than 10 companies, is it possible to write a script
which utilizes all this information?

Thanks a lot, again.

Vumani Dlamini

PS: My previous posts related to this problem can be found here:
http://groups.google.nl/groups?hl=nl...ing.google.com
http://groups.google.nl/groups?q=vum...ing.google.com
http://groups.google.nl/groups?q=vum...ing.google.com

Tore Aursand
      01-07-2004
On Wed, 07 Jan 2004 11:43:09 -0800, Vumani Dlamini wrote:
> [...]
> And the data set created is;
> 3706|101|1|2|054
> 3706|101|6|2|023
> 3706|106|4|2|056
> 3709|116|9|1|200
>
> [...]
>
> I now have a "area text file" with specific companies that have to be
> extracted, with each row in the "area text file" having a code for an
> area. I would like to extract companies only in areas listed in the
> "area text file".
>
> If within the areas in the "area text file" I am only interested in
> areas with more than 10 companies, is it possible to write a script
> which utilizes all this information?


If I understand your problem correctly, you could use a hash to do that;

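my %areas = ();
while ( <DATA> ) {
    chomp;
    # Fields are area|company|...; the remaining fields are not needed here.
    my ($area, $company, @tmp) = split( /\Q|\E/ );
    push( @{$areas{$area}}, $company );
}

foreach ( keys %areas ) {
    # Note: this counts one entry per input row, not per distinct company.
    if ( @{$areas{$_}} > 10 ) {
        print "Area $_ has more than 10 companies\n";
    }
}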
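
Since the generated data set has one row per property, a company with several properties appears on several rows. A minimal sketch that counts distinct companies instead and also restricts the count to areas from the area list; the sub name areas_over_threshold, the in-memory sample rows, and the threshold of 1 are assumptions chosen for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: given pipe-delimited rows (area|company|...), a list of
# wanted area codes and a minimum, return the wanted areas that have more than
# $min distinct companies.
sub areas_over_threshold {
    my ($rows, $wanted_areas, $min) = @_;
    my %wanted = map { $_ => 1 } @$wanted_areas;
    my %companies;    # area => { company => 1 }
    for my $row (@$rows) {
        my ($area, $company) = split /\|/, $row;
        next unless $wanted{$area};
        $companies{$area}{$company} = 1;    # duplicates collapse here
    }
    my @hits = grep { scalar( keys %{ $companies{$_} } ) > $min } keys %companies;
    return sort @hits;
}

# Sample rows taken from the data set shown in the question.
my @rows = (
    '3706|101|1|2|054',
    '3706|101|6|2|023',
    '3706|106|4|2|056',
    '3709|116|9|1|200',
);

my @big = areas_over_threshold( \@rows, [ 3706, 3709 ], 1 );
print "Area $_ has more than 1 company\n" for @big;
```

With the real data the threshold would be 10 and the wanted areas would come from the area text file.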


--
Tore Aursand <>
"Writing is a lot like sex. At first you do it because you like it.
Then you find yourself doing it for a few close friends and people you
like. But if you're any good at all, you end up doing it for money."
-- Unknown
 
Tad McClellan
      01-07-2004
Vumani Dlamini <> wrote:

> This problem follows up on a couple of problems I sent to the list 2
                                                            ^^^^^^^^

This is not a mailing list.

This is a Usenet newsgroup.


> using the following Perl script;



I kinda doubt that.

The following is not a Perl script at all! It has a syntax error.

Please be careful to post your _real_ code.


> ##### Perl script ######
> use strict;
> use warnings;
> open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
                                                                ^^^
I think you meant "$!\n" instead of "$\n" there.

(if so, then why are you putting the newline there?)


> open PRIVATE, ">c:/.../private.txt";



You should always, yes *always*, check the return value from open():

open PRIVATE, '>c:/.../private.txt' or
    die "could not open '>c:/.../private.txt' $!";

You did it earlier, why did you stop?
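
The same check can be written with a lexical filehandle and the three-argument form of open. A minimal sketch, assuming a temporary file as a stand-in because the real paths in the thread are elided; File::Temp is a core module, and the sample line is taken from the data set in the question:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in path for illustration; the real paths in the thread are elided.
my (undef, $path) = tempfile();

# Check every open (and the close of a file being written).
open my $out, '>', $path
    or die "Could not open '$path' for writing: $!";
print {$out} "3706|101|1|2|054\n";
close $out
    or die "Could not close '$path': $!";

# Read the line back to show that the write succeeded.
open my $in, '<', $path
    or die "Could not open '$path' for reading: $!";
my $line = <$in>;
close $in;
unlink $path;

print $line;
```

Keeping the mode in its own argument also avoids surprises when a file name happens to start with a character such as '>' or '|'.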


> while (<DATA>){



DATA is a special filehandle, you should choose some other name.


> if (/Area=(\d+)/) {
> $Area = $1;
> }
> elsif (/Company=(\d+)/) {
> $Comp = $1;
> }
> elsif (/PROPdes=(\d+)/) {
> $Pdes = $1;
> }
> elsif (/PROPpri=(\d+)/) {
> $Ppri = $1;
> }
> elsif (/PROPemp=(\d+)/) {
> print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";
> }
> }
> }
  ^
What does that curly match up with?


> I now have a "area text file"



Maybe you do and maybe you don't.

If the open() failed, then there _is no_ file...


> If within the areas in the "area text file" I am only interested in
> areas with more than 10 companies, is it possible to write a script
> which utilizes all this information?



Yes.


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 
Jay Tilton
      01-08-2004
Tore Aursand <> wrote:

: my ($area, $company, @tmp) = split( /\Q|\E/ );
                                       ^^^^^
An unusual style choice. Am I overlooking an advantage that has over
saying split( /\|/ ) or split( /[|]/ ) ?
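
A quick way to compare the three spellings is to run them on one row of the sample data. This is only a sketch; the variable names are made up for illustration. It also shows what happens when the pipe is left unescaped:

```perl
use strict;
use warnings;

my $row = '3706|101|1|2|054';

# Three ways of matching a literal pipe character.
my @quoted  = split /\Q|\E/, $row;
my @escaped = split /\|/,    $row;
my @class   = split /[|]/,   $row;

# An unescaped pipe is an alternation of two empty patterns, which matches
# the empty string, so the row is split between every pair of characters.
my @bare = split /|/, $row;

print scalar(@quoted), " fields: @quoted\n";
print scalar(@bare),   " fields from the unescaped pattern\n";
```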

 
Tore Aursand
      01-08-2004
On Thu, 08 Jan 2004 08:29:09 +0000, Jay Tilton wrote:
>> my ($area, $company, @tmp) = split( /\Q|\E/ );
>                                       ^^^^^
> An unusual style choice. Am I overlooking an advantage that has over
> saying split( /\|/ ) or split( /[|]/ ) ?


No advantages that I know of. I've made my editor (FTE) highlight \Q\E in
a special way, so...


--
Tore Aursand <>
"To cease smoking is the easiest thing I ever did. I ought to know,
I've done it a thousand times." -- Mark Twain
 
Michele Dondi
      01-09-2004
On 7 Jan 2004 11:43:09 -0800, (Vumani Dlamini)
wrote:

>This problem follows up on a couple of problems I sent to the list 2
>months back. The data is structured as follows;

[snip]
>And the data set created is;

[snip]
>using the following Perl script;
>##### Perl script ######
>use strict;
>use warnings;
>open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";


Calling it "DATA" is probably not a very good idea: no harm done here,
but you may end up needing Perl's own DATA filehandle sooner or later...

>open PRIVATE, ">c:/.../private.txt";


Not checking the return value here, eh?!?


[snip]

> elsif (/PROPemp=(\d+)/) {
> print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";


This doesn't seem consistent with the "data set created" that I snipped
above: the print has no "|" separators and does not zero-pad the
employee count.
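
For what it's worth, a print along the following lines would reproduce the listed data set. A minimal sketch: the sub name format_row is hypothetical, and the three-digit zero padding is inferred from the sample output (054, 023) rather than stated anywhere in the post:

```perl
use strict;
use warnings;

# Hypothetical helper: join the five fields with '|' and zero-pad the
# employee count to three digits, as in the sample output.
sub format_row {
    my ($area, $comp, $pdes, $ppri, $pemp) = @_;
    return join '|', $area, $comp, $pdes, $ppri, sprintf( '%03d', $pemp );
}

print format_row( 3706, 101, 1, 2, 54 ), "\n";    # prints 3706|101|1|2|054
```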

Here's how I'd do it anyway:

#!/usr/bin/perl -l

use strict;
use warnings;

die "Usage: $0 <infile> <outfile>\n" unless @ARGV == 2;

my ($data,$priv);
open $data, '<', $_ or die "Unable to open `$_': $!\n" for shift;
open $priv, '>', $_ or die "Unable to open `$_': $!\n" for shift;
select $priv;

my %props;
while (<$data>) {
    chomp;
    warn("Input data mismatch"), next unless /^(\w+)=(\d+)\s*$/;
    $props{$1}=$2;
    if ($1 eq 'PROPemp') {
        no warnings 'uninitialized';
        local $,='|';
        print map $props{$_},
            qw/Area Company PROPdes PROPpri PROPemp/;
    }
}

__END__

This is basically the same as your own script, only, IMHO, slightly more
Perlish and more maintainable.

>I now have a "area text file" with specific companies that have to be
>extracted, with each row in the "area text file" having a code for an
>area. I would like to extract companies only in areas listed in the
>"area text file".


Oh, but then just add as the first statement of the 'if' block the
following line:

next unless in $props{'Area'}, @Areas;

Of course it is up to you to write a suitable 'in' sub (see a recent
thread on the subject too!) or substitute suitable code, and populate
@Areas. But that shouldn't be a problem...
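
One possible shape for the 'in' sub and for populating @Areas, offered as a sketch only: the area file is assumed to hold one area code per line, an in-memory string stands in for the real file, and the code 3711 is invented for the example:

```perl
use strict;
use warnings;

# Hypothetical 'in' sub: true if the first argument equals any of the rest.
# Because it is defined before its first use, the paren-free form
# "in $x, @list" from the line above would also parse.
sub in {
    my $needle = shift;
    return scalar( grep { $_ eq $needle } @_ );
}

# Assumed area file contents, one area code per line; an in-memory string
# stands in for the real file here.
my $area_file = "3706\n3711\n";
open my $fh, '<', \$area_file or die "Cannot open area list: $!";
chomp( my @Areas = <$fh> );
close $fh;

for my $area ( 3706, 3709 ) {
    my %props   = ( Area => $area );
    my $verdict = in( $props{'Area'}, @Areas ) ? 'keep' : 'skip';
    print "Area $area: $verdict\n";
}
```

For a long area list, a hash built once (my %wanted = map { $_ => 1 } @Areas;) gives the same test without scanning the list for every record.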


Michele
--
# This prints: Just another Perl hacker,
seek DATA,15,0 and print q... <DATA>;
__END__
 