Parsing problems: A journey from a text file to a directory tree

 
 
Martin M.
Guest
Posts: n/a
 
      09-16-2007
Hi everybody,

Some of my colleagues want me to write a script for easy folder and
subfolder creation on the Mac.

The script is supposed to scan a text file containing directory trees
in the following format:

[New client]
|-Invoices
|-Offers
|--Denied
|--Accepted
|-Delivery notes

As you can see, the folder hierarchy is expressed by the number of
minus signs, and each section header is framed by brackets (as in
Windows config files).

After the scan process, the script is supposed to show a dialog, where
the user can choose from the different sections (e.g. 'Alphabet',
'Months', 'New client' etc.). Then the script will create the
corresponding folder hierarchy in the currently selected folder (done
via AppleScript).

But currently I simply don't know how to parse these folder lists and
how to save them in an array accordingly.

First I thought of an array like this:

dirtreedb = {'New client': {'Invoices': {}, 'Offers': {'Denied': {},
'Accepted': {}}, 'Delivery notes': {}}}

But this doesn't do the trick, as I also have to save the hierarchy
level of the current folder as well...

Argh, I really can't get my head around this problem and I need your
help. I have the feeling that the answer is not that complicated, but
I just don't get it right now...

Yours desperately,

Martin

 
Neil Cerutti
Guest
Posts: n/a
 
      09-16-2007
On 2007-09-16, Martin M. wrote:
> Hi everybody,
>
> Some of my colleagues want me to write a script for easy folder and
> subfolder creation on the Mac.
>
> The script is supposed to scan a text file containing directory trees
> in the following format:
>
> [New client]
> |-Invoices
> |-Offers
> |--Denied
> |--Accepted
> |-Delivery notes


Would it make sense to store it like this?

[('New client',
  [('Invoices', []),
   ('Offers', [('Denied', []), ('Accepted', [])]),
   ('Delivery notes', [])])]

> First I thought of an array like this:
>
> dirtreedb = {'New client': {'Invoices': {}, 'Offers': {'Denied': {},
> 'Accepted': {}}, 'Delivery notes': {}}}


A dictionary approach is fine if it's OK for the directories to
be unordered, which doesn't appear to be the case.

> But this doesn't do the trick, as I also have to save the
> hierarchy level of the current folder as well...


The above does store the hierarchy, as the number of nesting
levels.

dirtreedb['New client']['Offers']['Denied']
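
As a minimal sketch of that point (not code from the thread; it assumes Python 3, and `iter_paths` is a hypothetical helper name), the level of each folder can be recovered while walking the nested dictionary, so it does not need to be stored separately:

```python
def iter_paths(tree, prefix=()):
    """Yield (depth, path_tuple) for every folder in a nested dict."""
    for name, children in tree.items():
        path = prefix + (name,)
        # The depth is simply the number of names on the path so far.
        yield len(path), path
        yield from iter_paths(children, path)


dirtreedb = {'New client': {'Invoices': {},
                            'Offers': {'Denied': {}, 'Accepted': {}},
                            'Delivery notes': {}}}

# Map "a/b/c" style paths to their nesting level.
levels = {"/".join(path): depth for depth, path in iter_paths(dirtreedb)}
```

Here `levels['New client/Offers/Denied']` is 3, which is the hierarchy level the original post wanted to keep track of.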

--
Neil Cerutti
 
Larry Bates
Guest
Posts: n/a
 
      09-17-2007
Since you are going to need a dialog anyway, I would use a wxPython
tree control (wx.TreeCtrl). It already knows how to display the kind of
hierarchy you describe. Then you can just walk all the branches and
create the folders.
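
The "walk all the branches and create the folders" step does not depend on the GUI toolkit. The following is a rough sketch that walks a nested dictionary of the kind discussed earlier in the thread rather than a tree control; it assumes Python 3, `create_tree` is a hypothetical name, and a temporary directory stands in for the folder the user would select:

```python
import os
import tempfile


def create_tree(tree, base):
    """Create one directory per key under base, recursing into sub-dicts."""
    for name, children in tree.items():
        path = os.path.join(base, name)
        # exist_ok=True makes the call harmless if the folder already exists.
        os.makedirs(path, exist_ok=True)
        create_tree(children, path)


tree = {'New client': {'Invoices': {},
                       'Offers': {'Denied': {}, 'Accepted': {}},
                       'Delivery notes': {}}}

base = tempfile.mkdtemp()  # stand-in for the currently selected folder
create_tree(tree, base)
```

Invalid folder names are left for the operating system to reject; `os.makedirs` raises `OSError` in that case.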

-Larry

 
Michael J. Fromberger
Guest
Posts: n/a
 
      09-18-2007
Hello, Martin,

A good way to approach this problem is to recognize that each section of
your proposed configuration represents a kind of depth-first traversal
of the tree structure you propose to create. Thus, you can reconstruct
the tree by keeping track at all times of the path from the "root" of
the tree to the "current location" in the tree.

Below is one possible implementation of this idea in Python. In short,
the function keeps track of a stack of dictionaries, each of which
represents the contents of some directory in your hierarchy. As you
encounter "|--" lines, entries are pushed to or popped from the stack
according to whether the nesting level has increased or decreased.

This code is not heavily tested, but hopefully it should be clear:

import re

def parse_folders(input):
    """Read input from a file-like object that describes directory
    structures to be created. The input format is:

    [Top-level name]
    |-Subdirectory1
    |--SubSubDirectory1
    |--SubSubDirectory2
    |---SubSubSubDirectory1
    |-Subdirectory2
    |-Subdirectory3

    The input may consist of any number of such groups. The result is
    a dictionary structure in which each key names a directory, and
    the corresponding value is a dictionary structure showing the
    contents of that directory, possibly empty.
    """

    # This expression matches "header" lines, defining a new section.
    new_re = re.compile(r'\[([\w ]+)\]\s*$')

    # This expression matches "nesting" lines, defining subdirectories.
    more_re = re.compile(r'(\|-+)([\w ]+)$')

    out = {}       # Root: Maps section names to subtrees.
    state = [out]  # Stack of dictionaries, current path.

    for line in input:
        m = new_re.match(line)
        if m:  # New section begins here...
            key = m.group(1).strip()
            out[key] = {}
            state = [out, out[key]]
            continue

        m = more_re.match(line)
        if m:  # Add a directory to an existing section
            assert state

            new_level = len(m.group(1))
            key = m.group(2).strip()

            while new_level < len(state):
                state.pop()

            state[-1][key] = {}
            state.append(state[-1][key])

    return out

To call this, pass a file-like object to parse_folders(), e.g.:

test1 = '''
[New client].
|-Invoices
|-Offers
|--Denied
|--Accepted
|---Reasons
|---Rhymes
|-Delivery notes
'''

# Python 2 import; on Python 3 use "from io import StringIO" instead.
from StringIO import StringIO
result = parse_folders(StringIO(test1))

As the documentation suggests, the result is a nested dictionary
structure, representing the folder structure you encoded. I hope this
helps.

Cheers,
-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
 
John Machin
Guest
Posts: n/a
 
      09-18-2007
On Sep 19, 4:51 am, "Michael J. Fromberger"
<Michael.J.Fromber...@Clothing.Dartmouth.EDU> wrote:
> # This expression matches "header" lines, defining a new section.
> new_re = re.compile(r'\[([\w ]+)\]\s*$')


Directory names can contain more different characters than those which
match [\w ] ... and which ones depends on the OS; might as well just
allow anything, and leave it to the OS to complain. Also consider
using line.rstrip() (usually a handy precaution on ANY input text
file) instead of having \s*$ at the end of your regex.
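
A rough sketch of that suggestion, assuming Python 3: strip trailing whitespace first, then accept any characters in a name. The names `header_re`, `entry_re` and `classify` are hypothetical, and unlike the posted code the dash group here excludes the leading "|", so the level is the plain dash count:

```python
import re

# Anything between the brackets is accepted as a section name.
header_re = re.compile(r'\[(.+)\]$')
# One or more dashes after "|", then anything as the folder name.
entry_re = re.compile(r'\|(-+)(.+)$')


def classify(line):
    """Return a tuple describing one input line."""
    line = line.rstrip()  # drops the newline and trailing blanks
    m = header_re.match(line)
    if m:
        return ('header', m.group(1).strip())
    m = entry_re.match(line)
    if m:
        return ('entry', len(m.group(1)), m.group(2).strip())
    return ('other', line)
```

With these patterns a name such as `R&D (2007)` is accepted, and whether it is a legal folder name is left to the OS.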

> while new_level < len(state):
>     state.pop()


Hmmm ... consider rewriting that as the slightly less obfuscatory

    while len(state) > new_level:
        state.pop()

If you really want to make the reader slow down and think, try this:

del state[new_level:]
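
For readers who want to check that the two forms agree, here is a small sketch (the function names are hypothetical) that applies both to copies of the same list:

```python
def trim_with_loop(state, new_level):
    """Pop entries until at most new_level remain."""
    while len(state) > new_level:
        state.pop()
    return state


def trim_with_slice(state, new_level):
    """Delete everything from index new_level onward in one step."""
    del state[new_level:]
    return state


stack = ['root', 'section', 'Offers', 'Accepted']
results = [(trim_with_loop(list(stack), n), trim_with_slice(list(stack), n))
           for n in range(6)]
```

Both leave the list untouched when `new_level` is at least `len(state)`, and both modify the list in place.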

A warning message if there are too many "-" characters might be a good
idea:

[foo]
|-bar
|-zot
|---plugh
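
One way such a check could look, as a sketch only (Python 3; `find_depth_jumps` is a hypothetical helper, and "too many" is taken to mean more than one level deeper than the previous entry):

```python
import re

entry_re = re.compile(r'\|(-+)')


def find_depth_jumps(lines):
    """Return (line_number, text) for entries nested more than one level
    deeper than the entry before them."""
    problems = []
    depth = 0  # depth of the previous entry; 0 right after a header
    for number, raw in enumerate(lines, 1):
        line = raw.rstrip()
        if line.startswith('['):
            depth = 0  # a new section resets the nesting
            continue
        m = entry_re.match(line)
        if not m:
            continue  # other lines are not this check's concern
        new_depth = len(m.group(1))
        if new_depth > depth + 1:
            problems.append((number, line))
        depth = new_depth
    return problems


suspicious = find_depth_jumps(['[foo]', '|-bar', '|-zot', '|---plugh'])
```

For the example above, `suspicious` holds the single entry `(4, '|---plugh')`; the caller can then print a warning or refuse the file.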

> state[-1][key] = {}
> state.append(state[-1][key])


And if the input line matches neither regex?

> return out
>
> To call this, pass a file-like object to parse_folders(), e.g.:
>
> test1 = '''
> [New client].


Won't work with the dot on the end.

> Michael J. Fromberger | Lecturer, Dept. of Computer Science



 
Michael J. Fromberger
Guest
Posts: n/a
 
      09-19-2007
Hi, John,

Your comments below are all reasonable. However, I would like to point
out that the purpose of my example was to provide a demonstration of an
algorithm, not an industrial-grade solution to every aspect of the
original poster's problem. I am confident the original poster can deal
with these aspects of his problem space on his own.

John Machin wrote:

> [...]
> > while new_level < len(state):
> >     state.pop()

>
> Hmmm ... consider rewriting that as the slightly less obfuscatory
>
> while len(state) > new_level:
>     state.pop()


This seems to me to be an aesthetic consideration only; I'm not sure I
understand your rationale for reversing the sense of the comparison.
Since it does not change the functionality, it's hardly worthy of
complaint, but I don't see any improvement, either.

> A warning message if there are too many "-" characters might be a good
> idea:
>
> [foo]
> |-bar
> |-zot
> |---plugh


Perhaps so. Again, the original poster will have to decide what should
be the correct response to input of this sort; at present, the
implementation is tolerant of such variations, without loss of
generality.

> And if the input line matches neither regex?


I believe it should be clear that such lines are ignored. Again, this
is an opportunity for the original poster to determine an alternative
response -- perhaps an exception could be raised, if that is his desire.
The problem specification did not constrain this case.
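
As one possible strict variant, sketched under assumptions rather than taken from the posted code (Python 3; `parse_folders_strict` and both pattern names are hypothetical, and the looser name patterns are a choice made here), unrecognized lines and over-deep entries raise `ValueError`:

```python
import re

HEADER_RE = re.compile(r'\[(.+)\]$')
ENTRY_RE = re.compile(r'\|(-+)(.+)$')


def parse_folders_strict(lines):
    """Parse an iterable of lines into nested dicts, rejecting bad input."""
    out = {}
    state = [out]  # stack: root, section, then one dict per nesting level
    for number, raw in enumerate(lines, 1):
        line = raw.rstrip()
        if not line:
            continue  # blank lines are allowed
        m = HEADER_RE.match(line)
        if m:
            key = m.group(1).strip()
            out[key] = {}
            state = [out, out[key]]
            continue
        m = ENTRY_RE.match(line)
        if not m:
            raise ValueError('line %d not recognized: %r' % (number, line))
        dashes = len(m.group(1))
        # An entry with n dashes belongs to the dict at stack index n.
        if dashes >= len(state):
            raise ValueError('line %d has no parent: %r' % (number, line))
        del state[dashes + 1:]
        key = m.group(2).strip()
        state[-1][key] = {}
        state.append(state[-1][key])
    return out


sample = """[New client]
|-Invoices
|-Offers
|--Denied
|--Accepted
|-Delivery notes
"""
result = parse_folders_strict(sample.splitlines())
```

Whether to raise, warn, or silently skip remains a decision for whoever owns the input files.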

> > To call this, pass a file-like object to parse_folders(), e.g.:
> >
> > test1 = '''
> > [New client].

>
> Won't work with the dot on the end.


My mistake. The period was a copy-and-paste artifact, which I missed.

Cheers,
-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA
 