NNTP Subject Parsing

 
 
$_@_.%_
 
      02-05-2004
Does anyone know where I could find some information
about parsing NNTP subject fields?

Pseudocode and/or regex advice would be ideal.

I'm looking to parse out multipart messages,
e.g. Test Subject (1/1) - file.bin [01/10]
Another test.bin (1/2)

Then store them until all the parts have been gathered.

Thanks, any advice is appreciated.
 
 
 
 
 
Walter Roberson
 
      02-05-2004
In article <IBwUb.13166$(E-Mail Removed)>, <$_@_.%_> wrote:
:Does anyone know where I could find some information
:about parsing NNTP subject fields?

:Pseudocode and/or regex advice would be ideal.

:I'm looking to parse out multipart messages,
:e.g. Test Subject (1/1) - file.bin [01/10]
: Another test.bin (1/2)

:Then store them until all the parts have been gathered.

There is no standard formatting for multipart messages.

When I did this a couple of years ago, I had to just look to see what
was coming down and tweak it from time to time. As I recall, there were
some complications involving pasting the binaries back together again
automatically, due to the different ways that posters had of storing
the binaries. And there are complications around detecting duplicates
because people tend to use similar subjects for different binaries.

I probably still have the code around. I haven't looked at it in
years. It's probably not my best code, but it worked.
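
The duplicate problem described above can be reduced by grouping on more than the subject text. Below is a minimal sketch, assuming that poster plus subject (minus the part counter) plus total part count is a good enough identity. The helper name group_key and that choice of fields are illustrative assumptions, not the code mentioned in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: build a grouping key from the From header and the
# subject, so that two posters using the same subject do not collide.
sub group_key {
    my ($from, $subject) = @_;

    # Find the last "(x/y)" or "[x/y]" counter in the subject.
    return unless $subject =~ /^(.*)[(\[](\d+)\/(\d+)[)\]]/;
    my ($base, $part, $total) = ($1, $2, $3);

    $base =~ s/\s+$//;    # trim whitespace left in front of the counter

    # "\0" cannot appear in a header, so it is a safe field separator.
    my $key = join "\0", lc($from), $base, $total + 0;
    return ($key, $part + 0, $total + 0);
}

my ($key, $part, $total) = group_key('someone@example.invalid', 'Another test.bin (1/2)');
print "part $part of $total\n";
```

Two different binaries posted by the same person under the same subject and part count would still collide, so this only narrows the problem.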
--
Ceci, ce n'est pas une idée.
 
 
 
 
 
Chris Mattern
 
      02-05-2004
$_@_.%_ wrote:
> Does anyone know where I could find some information
> about parsing NNTP subject fields?


How do you parse something that's freeform text?

Chris Mattern

 
 
$_@_.%_
 
      02-05-2004
Walter Roberson wrote:
> In article <IBwUb.13166$(E-Mail Removed)>, <$_@_.%_> wrote:
> :Does anyone know where I could find some information
> :about parsing NNTP subject fields?
>
> :Pseudocode and/or regex advice would be ideal.
>
> :I'm looking to parse out multipart messages,
> :e.g. Test Subject (1/1) - file.bin [01/10]
> : Another test.bin (1/2)
>
> :Then store them until all the parts have been gathered.
>
> There is no standard formatting for multipart messages.


Nod, the standard gives a lot of freedom to the poster.
>
> When I did this a couple of years ago, I had to just look to see what
> was coming down and tweak it from time to time. As I recall, there were
> some complications involving pasting the binaries back together again
> automatically, due to the different ways that posters had of storing
> the binaries. And there are complications around detecting duplicates
> because people tend to use similar subjects for different binaries.
>
> I probably still have the code around. I haven't looked at it in
> years. It's probably not my best code, but it worked.


I am very happy to hear from someone who has experience with
this sort of function. Your help is really useful, thank you.


Here is the regex I'm thinking about using:
m/(.+)([(\[\{]+?\d+[\/-]+?(\d+)[)\]\}]+?)/

Does this regex look OK?

There are three memory groups
1) the main subject text
2) the proof that this is part of a multi-part message
3) the number of parts for this message
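
To see what the three groups actually capture, the pattern can be run against the two sample subjects from the first post. This is only a small sketch: the sub name split_subject is made up for illustration, and the slash inside the character class is escaped so that it does not end the m// pattern early.

```perl
use strict;
use warnings;

# Illustrative helper (not from the thread): return the three captures,
# or an empty list if the subject has no part counter.
sub split_subject {
    my ($subject) = @_;
    return unless $subject =~ m/(.+)([(\[\{]+?\d+[\/-]+?(\d+)[)\]\}]+?)/;
    return ($1, $2, $3);
}

for my $subject ('Test Subject (1/1) - file.bin [01/10]', 'Another test.bin (1/2)') {
    my ($text, $counter, $total) = split_subject($subject);
    print "text='$text' counter='$counter' total='$total'\n";
}
```

Because (.+) is greedy, the last counter in the subject wins, and group 1 keeps the space in front of it. The part number itself is not captured separately here, so it would need its own group.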

I'm planning on creating a hash which has the message-ids for keys
and an array ref as a value. The array would contain the total number
of parts expected and which part this message-id is.

If this regex is OK, I will still need a way to know when all parts have
been gathered, and then pass the message-ids in the correct order to the hash
which populates the Tk::HList that displays the messages.
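
One possible way to tell when all parts have arrived is to key on the subject text instead of the message-id and record each part number as it is seen. This is a rough sketch under that assumption; %pending, add_part, is_complete and ordered_ids are hypothetical names, and the message-ids in the usage lines are made up.

```perl
use strict;
use warnings;

# subject text => { total => N, parts => { part number => message-id } }
my %pending;

# Record one part and return the entry it belongs to.
sub add_part {
    my ($subject, $part, $total, $msgid) = @_;
    my $entry = $pending{$subject} ||= { total => $total + 0, parts => {} };
    $entry->{parts}{ $part + 0 } = $msgid;    # "+ 0" turns "01" into 1
    return $entry;
}

# True once every part number from 1 to total has been seen.
sub is_complete {
    my ($entry) = @_;
    my $missing = grep { !exists $entry->{parts}{$_} } 1 .. $entry->{total};
    return $missing == 0;
}

# Message-ids sorted by numeric part number.
sub ordered_ids {
    my ($entry) = @_;
    return map { $entry->{parts}{$_} }
           sort { $a <=> $b } keys %{ $entry->{parts} };
}

add_part('Another test.bin', '2', '2', '<second@example.invalid>');
my $entry = add_part('Another test.bin', '1', '2', '<first@example.invalid>');
print join(' ', ordered_ids($entry)), "\n" if is_complete($entry);
```

Storing the parts in a hash keyed by part number means a reposted part simply overwrites the earlier one, and the numeric sort avoids "10" sorting before "2".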

Then, if the message is selected for download, I will pass the message-ids to
Convert-BulkDecoder.

I'm still trying to get my head around this. More to follow (hopefully).

Help would be greatly appreciated.
Thanks in advance for any tips, suggestions, pseudocode or regex advice.
 
 
Gerard Lanois
 
      02-06-2004
$_@_.%_ writes:

> Does anyone know where I could find some information
> about parsing NNTP subject fields?
>
> Pseudocode and/or regex advice would be ideal.
>
> I'm looking to parse out multipart messages,
> e.g. Test Subject (1/1) - file.bin [01/10]
> Another test.bin (1/2)
>
> Then store them until all the parts have been gathered.
>
> Thanks, any advice is appreciated.


My program doesn't store all the parts, but it will assemble
all the parts if they happen to all be present on the server.

See http://ubh.sourceforge.net/

Here is some code which shows how ubh does this.

# untested code follows...

my $subject = 'Test Subject (1/1) - file.bin [01/10]';

# Does it look like it contains a filename with an extension?
if ($subject =~ /\b(.+\.(\w+))\b/) {

    # Is it multipart? [x/y] or (x/y)
    # Requires at least 2 chars in extension, this avoids
    # problems with people posting with size like "10.4 Meg"
    # after the filename, and matching after the .4
    if ($subject =~ /^(.+\.(\w\w+))\b.*[\(\[](\d+)\/(\d+)[\)\]]/) {
        my ($subject_part, $part, $total) = ($1, $3, $4);

        # ... etc.
    }
}
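
The "10.4 Meg" case from the comments can be tried directly. The sketch below wraps the multipart pattern in a sub; the name parse_multipart_subject and the returned hash keys are illustrative assumptions and not part of ubh, and the second subject is an invented example.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the multipart pattern shown above.
sub parse_multipart_subject {
    my ($subject) = @_;

    # Filename with an extension of at least two word characters,
    # followed somewhere later by an (x/y) or [x/y] counter.
    return unless $subject =~ /^(.+\.(\w\w+))\b.*[\(\[](\d+)\/(\d+)[\)\]]/;
    return +{ name => $1, ext => $2, part => $3 + 0, total => $4 + 0 };
}

for my $s ('Test Subject (1/1) - file.bin [01/10]', 'pic.jpg 10.4 Meg (1/3)') {
    my $info = parse_multipart_subject($s) or next;
    print "$info->{name}: part $info->{part} of $info->{total}\n";
}
```

For the second subject, the ".4" cannot serve as the extension because only one word character follows the dot, so the match falls back to "pic.jpg".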


-Gerard

 
 
Peter Scott
 
      02-06-2004
In article <IBwUb.13166$(E-Mail Removed)>,
$_@_.%_ writes:
>Does anyone know where I could find some information
>about parsing NNTP subject fields?
>
>Pseudocode and/or regex advice would be ideal.
>
>I'm looking to parse out multipart messages,
>e.g. Test Subject (1/1) - file.bin [01/10]
> Another test.bin (1/2)
>
>Then store them until all the parts have been gathered.


Are you trying to duplicate the functionality of this:

http://linux.maruhn.com/sec/aub.html
http://yukidoke.org/~mako/projects/aub/

Written in Perl to boot.

--
Peter Scott
http://www.perldebugged.com/
*** NEW *** http://www.perlmedic.com/
 
 
$_@_.%_
 
      02-07-2004
Well, I've had a look at both of those pieces of code,
and I must say that the programming is very impressive indeed!
I've learned quite a bit looking at the examples. Thank you all
very much for the helpful input.

I've made some progress with this, but I've run into a tricky bit:
how do I print this HoHoA so that I can test the result?

# ToDo: combine multi-part articles
# $xover{$_}[0]  subject       $xover{$_}[4]  references
# $xover{$_}[1]  from          $xover{$_}[5]  bytes
# $xover{$_}[2]  date          $xover{$_}[6]  lines
# $xover{$_}[3]  message-id    $xover{$_}[7]  xref:full
# m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/
# $1 is: subject, $2 is: part, $3 is: total parts
# (HoHoA) subject -> total parts -> [ current part, msg id, ... ]

my %HoHoA;
for my $k (sort keys %xover) {
    if ($xover{$k}[0] =~
        m/(.+)[(\[\{]+?(\d+)[\/\-]+?(\d+)[)\]\}]+?/) {
        push @{ $HoHoA{$1}{$3} }, $2, $xover{$k}[3];
    }
}
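
For a quick look at a nested structure like this, the core Data::Dumper module is one option. The sketch below uses made-up sample data in the same shape as %HoHoA (subject, then total parts, then an array alternating part number and message-id); the message-ids are placeholders.

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up sample data in the same shape as %HoHoA:
# subject => { total parts => [ part, message-id, part, message-id, ... ] }
my %HoHoA = (
    'Another test.bin ' => {
        2 => [ 1, '<part1@example.invalid>', 2, '<part2@example.invalid>' ],
    },
);

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order
$Data::Dumper::Indent   = 1;    # fixed, compact indentation

my $dump = Dumper(\%HoHoA);     # pass a reference to dump the whole hash
print $dump;
```

Passing a reference keeps the whole hash together in one $VAR1 structure instead of dumping each key and value as a separate variable.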
 
 
$_@_.%_
 
      02-07-2004
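Never mind, I got it:

open(my $fh, '>', 'test') or die "Cannot open 'test' for writing: $!";
for my $k1 (keys %HoHoA) {
    for my $k2 (keys %{ $HoHoA{$k1} }) {
        print $fh "subject: $k1\n";
        print $fh "has $k2 parts total\n";
        print $fh "this is the information for this subject\n";
        # Entries alternate: part number, then message-id
        for my $item (@{ $HoHoA{$k1}{$k2} }) {
            print $fh "$item\n";
        }
        print $fh "\n";
    }
}
close $fh or die "Cannot close 'test': $!";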


subject: Att:CHARLI 320bps[04/14] - "The Smoky Mountain Players - Smoky Moumtain Old Time Favorites - 03 - The Great Speckled Bird.mp3" yEnc
has 8 parts total
this is the information for this subject
1
<nMBUb.182379$Rc4.1349880@attbi_s54>
2
<BMBUb.184720$sv6.955576@attbi_s52>
3
<OMBUb.182381$Rc4.1350709@attbi_s54>
4
<0NBUb.182384$Rc4.1350590@attbi_s54>
5
<eNBUb.184723$sv6.954877@attbi_s52>
6
<rNBUb.182385$Rc4.1350702@attbi_s54>
7
<ENBUb.182386$Rc4.1350712@attbi_s54>
8
<QNBUb.185030$5V2.895547@attbi_s53>
 