f python?

 
 
Xah Lee
 
      04-08-2012
hi guys,

sorry am feeling a bit prolific lately.

today's show, is: 〈**** Python〉
http://xahlee.org/comp/****_python.html

------------------------------------
**** Python
By Xah Lee, 2012-04-08

**** Python.

just ****ing spend 2 hours and still going.

here's the short story.

so recently i switched to a Windows version of python. Now, Windows
version takes path using win backslash, instead of cygwin slash. This
****ing broke my find/replace scripts that takes a dir level as input.
Because i was counting slashes.

Ok no problem. My sloppiness. After all, my implementation wasn't
portable. So, let's fix it. After a while, discovered there's the
「os.sep」. Ok, replace 「"/"」 with 「os.sep」, done. Then, bang, all hell
broke loose. Because, the backslash is used as escape in string, so any
regex that manipulates a path got ****ed majorly. So, now you need to
find a quoting mechanism. Then, **** python doc incomprehensible
scattered comp-sci-r-us BNF ****. Then, **** python for “os.path” and
“os” modules then string object and string functions inconsistent
ball. And **** Guido who wants to **** change python for his idiotic
OOP concept of “elegance” so that some of these are deprecated.
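
A minimal sketch of the quoting problem described above, assuming a made-up Windows-style directory (win_dir, path and strip_prefix are illustrative names, not taken from the script). A backslash path dropped into a regex is read as escape sequences; re.escape() quotes it so that it matches literally.

```python
import re

# Hypothetical Windows-style paths, written with explicit backslashes
# so the example behaves the same on any platform.
win_dir = "c:\\Users\\h3\\web"
path = win_dir + "\\blog\\index.html"

def strip_prefix(prefix, full_path):
    """Remove a literal leading prefix from full_path using a regex."""
    # re.escape() quotes the backslashes so that "\U", "\h" and "\w"
    # are not interpreted as regex escape sequences.
    return re.sub("^" + re.escape(prefix), "", full_path)

rest = strip_prefix(win_dir, path)
level = rest.count("\\")  # number of separators below win_dir
```

Without re.escape(), the same pattern either fails to compile or matches something other than the path, depending on the Python version.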

So after several explorations of “repr()”, “format()”, “‹str›.count()”,
“os.path.normpath()”, “re.split()”, “len(re.search().group())” etc,
after a long time, let's use “re.escape()”. 2 hours have passed. Also,
discovered that “os.path.walk” is now deprecated, and one is supposed
to use the sparkling “os.walk”. In the process of refreshing my
python, the “os.path.walk” semantics is really one ****ed up ****.
Meanwhile, the “os.walk” went into incomprehensible OOP object and
iterators ****.

now, it's close to 3 hours. This fix is supposed to be done in 10 min.
I'd have done it in elisp in just 10 minutes if not for my
waywardness.

This is Before

def process_file(dummy, current_dir, file_list):
    current_dir_level = (len(re.split("/", current_dir)) -
                         len(re.split("/", input_dir)))
    cur_file_level = current_dir_level + 1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            if (re.search(r"\.html$", a_file, re.U) and
                    os.path.isfile(current_dir + "/" + a_file)):
                replace_string_in_file(current_dir + "/" + a_file)

This is After

def process_file(dummy, current_dir, file_list):
    current_dir = os.path.normpath(current_dir)
    # strip the input_dir prefix, then count separators in what is left
    cur_dir_level = re.sub("^" + re.escape(input_dir), "",
                           current_dir).count(os.sep)
    cur_file_level = cur_dir_level + 1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            # plain os.sep here: re.escape() is only for regex patterns
            if (re.search(r"\.html$", a_file, re.U) and
                    os.path.isfile(current_dir + os.sep + a_file)):
                replace_string_in_file(current_dir + os.sep + a_file)
                # print "%d %s" % (cur_file_level, (current_dir + os.sep + a_file))
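
For comparison, a rough sketch of the same level filter written against “os.walk” and “os.path.relpath”, with no regex involved. The names level_below and html_files_at_levels are made up for this example, and it assumes, as in the script, that files directly inside the top directory are level 1.

```python
import os

def level_below(top, current_dir):
    """Return how many directories current_dir lies below top (0 for top itself)."""
    rel = os.path.relpath(current_dir, top)
    if rel == os.curdir:
        return 0
    return rel.count(os.sep) + 1

def html_files_at_levels(top, min_level, max_level):
    """Yield .html files whose level under top is within [min_level, max_level]."""
    top = os.path.normpath(top)
    for current_dir, dir_names, file_names in os.walk(top):
        file_level = level_below(top, current_dir) + 1
        if min_level <= file_level <= max_level:
            for name in sorted(file_names):
                if name.endswith(".html"):
                    yield os.path.join(current_dir, name)
        if file_level >= max_level:
            # os.walk is top-down by default, so emptying dir_names
            # in place stops it from descending any further.
            del dir_names[:]
```

Each yielded path could then be handed to a function such as replace_string_in_file. Because the level comes from a relative path, the separator never has to appear in a pattern.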

Complete File

# -*- coding: utf-8 -*-
# Python

# find & replace strings in a dir

import os, sys, shutil, re

# if this is not empty, then only these files will be processed
my_files = []

input_dir = "c:/Users/h3/web/xahlee_org/lojban/hrefgram2/"
input_dir = "/cygdrive/c/Users/h3/web/zz"
input_dir = "c:/Users/h3/web/xahlee_org/"

min_level = 2 # files and dirs inside input_dir are level 1.
max_level = 2 # inclusive

print_no_change = False

find_replace_list = [

    (
        u"""<iframe style="width:100%;border:none" src="http://xahlee.org/footer.html"></iframe>""",
        u"""<iframe style="width:100%;border:none" src="../footer.html"></iframe>""",
    ),

]

def replace_string_in_file(file_path):
    "Replaces all findStr by repStr in file file_path"
    temp_fname = file_path + "~lc~"
    backup_fname = file_path + "~bk~"

    # print "reading:", file_path
    input_file = open(file_path, "rb")
    file_content = unicode(input_file.read(), "utf-8")
    input_file.close()

    num_replaced = 0
    for a_pair in find_replace_list:
        num_replaced += file_content.count(a_pair[0])
        output_text = file_content.replace(a_pair[0], a_pair[1])
        file_content = output_text

    if num_replaced > 0:
        print "◆ ", num_replaced, " ", file_path.replace("\\", "/")
        shutil.copy2(file_path, backup_fname)
        # we do this way instead of “os.rename” to preserve file creation date
        output_file = open(file_path, "r+b")
        output_file.read()
        output_file.seek(0)
        output_file.write(output_text.encode("utf-8"))
        output_file.truncate()
        output_file.close()
    else:
        if print_no_change == True:
            print "no change:", file_path

    # os.remove(file_path)
    # os.rename(temp_fname, file_path)

def process_file(dummy, current_dir, file_list):
    current_dir = os.path.normpath(current_dir)
    # strip the input_dir prefix, then count separators in what is left
    cur_dir_level = re.sub("^" + re.escape(input_dir), "",
                           current_dir).count(os.sep)
    cur_file_level = cur_dir_level + 1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            # plain os.sep here: re.escape() is only for regex patterns
            if (re.search(r"\.html$", a_file, re.U) and
                    os.path.isfile(current_dir + os.sep + a_file)):
                replace_string_in_file(current_dir + os.sep + a_file)
                # print "%d %s" % (cur_file_level, (current_dir + os.sep + a_file))

input_dir = os.path.normpath(input_dir)

if (len(my_files) != 0):
    for my_file in my_files:
        replace_string_in_file(os.path.normpath(my_file))
else:
    os.path.walk(input_dir, process_file, "dummy")

print "Done."

 
Martin P. Hellwig
 
      04-08-2012
On 08/04/2012 12:11, Xah Lee wrote:
<cut all>
Hi Xah,

You clearly didn't want help on this subject, as you really know how to
do it anyway. But having read your posts over the years, I'd like to
give you an observation on your persona, free of charge!

You are actually a talented writer, some may find your occasional
profanity offensive but at least it highlights your frustration.
You are undoubtedly, and provably, a good mathematician and, more importantly,
self-taught. You have a natural feel for design (otherwise you
would not clash with others' view of programming).
You know a mixture of programming languages.

Whether you like it or not, you are in the perfect position to create a
new programming language and design a new programming paradigm.
Unhindered by all the legacy crap that keeps people like me behind (I
actually like BNF for example).

It is likely I am wrong, but if that is your destiny there is no point
fighting it.

Cheers and good luck,

Martin
 
David Canzi
 
      04-08-2012
Xah Lee <(E-Mail Removed)> wrote:
>hi guys,
>
>sorry am feeling a bit prolifit lately.
>
>today's show, is: '**** Python'
>http://xahlee.org/comp/****_python.html
>
>------------------------------------
>**** Python
> By Xah Lee, 2012-04-08
>
>**** Python.
>
>just ****ing spend 2 hours and still going.
>
>here's the short story.
>
>so recently i switched to a Windows version of python. Now, Windows
>version takes path using win backslash, instead of cygwin slash. This
>****ing broke my find/replace scripts that takes a dir level as input.
>Because i was counting slashes.
>
>Ok no problem. My sloppiness. After all, my implementation wasn't
>portable. So, let's fix it. After a while, discovered there's the
>'os.sep'. Ok, replace "/" to 'os.sep', done. Then, bang, all hell
>went lose. Because, the backslash is used as escape in string, so any
>regex that manipulate path got ****ed majorly.


When Microsoft created MS-DOS, they decided to use '\' as
the separator in file names. This was at a time when several
previously existing interactive operating systems were using
'/' as the file name separator and at least one was using '\'
as an escape character. As a result of Microsoft's decision
to use '\' as the separator, people have had to do extra work
to adapt programs written for Windows to run in non-Windows
environments, and vice versa. People have had to do extra work
to write software that is portable between these environments.
People have done extra work while creating tools to make writing
portable software easier. And people have to do extra work when
they use these tools, because using them is still harder than
writing portable code for operating systems that all used '/'
as their separator would have been.

If you added up the cost of all the extra work that people have
done as a result of Microsoft's decision to use '\' as the file
name separator, it would probably be enough money to launch the
Burj Khalifa into geosynchronous orbit.

So, when you say **** Python, are you sure you're shooting at the
right target?

--
David Canzi | TIMTOWWTDI (tim-toe-woe-dee): There Is More Than One
| Wrong Way To Do It
 
Kaz Kylheku
 
      04-08-2012
["Followup-To:" header set to comp.lang.lisp.]
On 2012-04-08, David Canzi <(E-Mail Removed)> wrote:
> Xah Lee <(E-Mail Removed)> wrote:
>>hi guys,
>>
>>sorry am feeling a bit prolifit lately.
>>
>>today's show, is: '**** Python'
>>http://xahlee.org/comp/****_python.html
>>
>>------------------------------------
>>**** Python
>> By Xah Lee, 2012-04-08
>>
>>**** Python.
>>
>>just ****ing spend 2 hours and still going.
>>
>>here's the short story.
>>
>>so recently i switched to a Windows version of python. Now, Windows
>>version takes path using win backslash, instead of cygwin slash. This
>>****ing broke my find/replace scripts that takes a dir level as input.
>>Because i was counting slashes.
>>
>>Ok no problem. My sloppiness. After all, my implementation wasn't
>>portable. So, let's fix it. After a while, discovered there's the
>>'os.sep'. Ok, replace "/" to 'os.sep', done. Then, bang, all hell
>>went lose. Because, the backslash is used as escape in string, so any
>>regex that manipulate path got ****ed majorly.

>
> When Microsoft created MS-DOS, they decided to use '\' as
> the separator in file names.


This is false. The MS-DOS (dare I say it) "kernel" accepts both forward and
backslashes as separators.

The application-level choice was once configurable through a variable
in COMMAND.COM. Then they hard-coded it to backslash.

However, Microsoft operating systems have continued, to this day, to
recognize slash as a path separator.

Only, there are broken userland programs on Windows which don't know this.
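
How Python's own Windows path routines treat the two characters can be checked on any platform through the standard ntpath module. A small sketch, using a made-up path that mixes both separators:

```python
import ntpath  # the Windows implementation of os.path, importable anywhere

mixed = "c:/Users/h3\\web/index.html"  # hypothetical path, both separators

# split() breaks at the last separator of either kind
head, tail = ntpath.split(mixed)

# normpath() rewrites every "/" as a backslash
normalized = ntpath.normpath(mixed)
```

This only shows what Python's library does with such a path; it says nothing by itself about the kernel or about other userland programs.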

> So, when you say **** Python, are you sure you're shooting at the
> right target?


I would have to say, probably yes.
 
Peter J. Holzer
 
      04-08-2012
On 2012-04-08 17:03, David Canzi <(E-Mail Removed)> wrote:
> If you added up the cost of all the extra work that people have
> done as a result of Microsoft's decision to use '\' as the file
> name separator, it would probably be enough money to launch the
> Burj Khalifa into geosynchronous orbit.


So we have another contender for the Most Expensive One-byte Mistake?

Poul-Henning Kamp nominated the C/Unix guys:

http://queue.acm.org/detail.cfm?id=2010365

hp


--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | (E-Mail Removed) |
__/ | http://www.hjp.at/ | -- Bill Code on (E-Mail Removed)
 
Jürgen Exner
 
      04-08-2012
"David Canzi" <(E-Mail Removed)> wrote:
>Xah Lee <(E-Mail Removed)> wrote:


Please check whom you are replying to.

Do not feed the trolls, please.

jue
 
Kaz Kylheku
 
      04-08-2012
On 2012-04-08, Peter J. Holzer <(E-Mail Removed)> wrote:
> On 2012-04-08 17:03, David Canzi <(E-Mail Removed)> wrote:
>> If you added up the cost of all the extra work that people have
>> done as a result of Microsoft's decision to use '\' as the file
>> name separator, it would probably be enough money to launch the
>> Burj Khalifa into geosynchronous orbit.

>
> So we have another contender for the Most Expensive One-byte Mistake?


The one byte mistake in DOS and Windows is recognizing two characters as path
separators. All code that correctly handles paths is complicated by having to
look for a set of characters instead of just scanning for a byte.

> http://queue.acm.org/detail.cfm?id=2010365


DOS backslashes are already mentioned on that page, but alas it perpetuates the
clueless myth that DOS and Windows do not recognize any other path separator.

Worse, the one byte Unix mistake being covered is, disappointingly, just a
clueless rant against null-terminated strings.

Null-terminated strings are infinitely better than the ridiculous encapsulation of length + data.

For one thing, if s is a non-empty null-terminated string, then cdr(s) is also
a string representing the rest of that string without the first character,
where cdr(s) is conveniently defined as s + 1.

Not only can compilers compress storage by recognizing that string literals are
the suffixes of other string literals, but a lot of string manipulation code is
simplified, because you can treat a pointer to interior of any string as a
string.

Because they are recursively defined, you can do elegant tail recursion on null
terminated strings:

const char *rec_strchr(const char *in, int ch)
{
    if (*in == 0)
        return 0;
    else if (*in == ch)
        return in;
    else
        return rec_strchr(in + 1, ch);
}

length + data also raises the question: what type is the length field? One
byte? Two bytes? Four? And then you have issues of byte order. Null terminated
C strings can be written straight to a binary file or network socket and be
instantly understood on the other end.
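
The length-field question can be made concrete with Python's struct module. The sketch below encodes the same text three ways; the 2-byte length field and the function names are choices made for illustration only, not any particular wire format.

```python
import struct

def encode_nul_terminated(text):
    """C-style encoding: the bytes followed by a single NUL byte."""
    return text.encode("ascii") + b"\x00"

def encode_length_prefixed(text, byte_order):
    """Length + data, with a 2-byte unsigned length field.

    byte_order is "<" for little endian or ">" for big endian.
    """
    data = text.encode("ascii")
    return struct.pack(byte_order + "H", len(data)) + data

nul = encode_nul_terminated("hello")
little = encode_length_prefixed("hello", "<")
big = encode_length_prefixed("hello", ">")
```

The two length-prefixed results differ in their first two bytes, so writer and reader have to agree on both the width and the byte order of the length field, while the NUL-terminated bytes are the same everywhere.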

Null terminated strings have simplified all kinds of text manipulation, lexical
scanning, and data storage/communication code resulting in immeasurable
savings over the years.
 
Nobody
 
      04-08-2012
On Sun, 08 Apr 2012 04:11:20 -0700, Xah Lee wrote:

> Ok no problem. My sloppiness. After all, my implementation wasn't
> portable. So, let's fix it. After a while, discovered there's the
> os.sep. Ok, replace "/" to os.sep, done. Then, bang, all hell
> went lose. Because, the backslash is used as escape in string, so any
> regex that manipulate path got ****ed majorly. So, now you need to
> find a quoting mechanism.


if os.altsep is not None:
    sep_re = '[%s%s]' % (re.escape(os.sep), re.escape(os.altsep))
else:
    sep_re = '[%s]' % re.escape(os.sep)

But really, you should be ranting about regexps rather than Python.
They're convenient if you know exactly what you want to match, but a
nuisance if you need to generate the expression based upon data which is
only available at run-time (and re.escape() only solves one very specific
problem).
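
A sketch of how such a separator class might be used to split a path into components. make_sep_re and split_path are hypothetical helpers, and the separators are passed in as parameters (rather than read from os.sep and os.altsep) so that the Windows case can be tried on any platform.

```python
import re

def make_sep_re(sep, altsep=None):
    """Build a character class matching any of the given path separators."""
    chars = sep if altsep is None else sep + altsep
    # re.escape() matters here: a bare backslash inside the class would
    # escape the character after it instead of matching itself.
    return "[%s]" % re.escape(chars)

def split_path(path, sep, altsep=None):
    """Split path on any separator, dropping empty components."""
    return [part for part in re.split(make_sep_re(sep, altsep), path) if part]

win_parts = split_path("c:\\Users/h3\\web", "\\", "/")
posix_parts = split_path("/home/h3/web", "/")
```

Counting the components of two such lists and subtracting gives a directory level without caring which separator the path happened to use.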

 
Xah Lee
 
      04-09-2012
Xah Lee wrote:

« http://xahlee.org/comp/****_python.html »

David Canzi wrote

«When Microsoft created MS-DOS, they decided to use '\' as the
separator in file names. This was at a time when several previously
existing interactive operating systems were using '/' as the file name
separator and at least one was using '\' as an escape character. As a
result of Microsoft's decision to use '\' as the separator, people
have had to do extra work to adapt programs written for Windows to run
in non-Windows environments, and vice versa. People have had to do
extra work to write software that is portable between these
environments. People have done extra work while creating tools to
make writing portable software easier. And people have to do extra
work when they use these tools, because using them is still harder
than writing portable code for operating systems that all used '/' as
their separator would have been.»

namekuseijin wrote:

> yes, absolutely. But you got 2 inaccuracies there: 1) Microsoft didn't create DOS; 2) ****ing DOS was written in C, and guess what, it uses \ as escape character. ****ing microsoft.
>
> > So, when you say **** Python, are you sure you're shooting at the
> > right target?

>
> I agree. **** winDOS and ****ing microsoft.


No. The choice to use backslash rather than slash is actually a good one.

because, slash is one of the useful chars, far more so than backslash.
Users should be able to use that for file names.

i don't know the detailed history of path separator, but if i were to
blame, it's **** unix. The entirety of unix, unix geek, unixers, unix
****heads. **** unix.

〈On Unix Filename Characters Problem〉
http://xahlee.org/UnixResource_dir/w...ame_chars.html

〈On Unix File System's Case Sensitivity〉
http://xahlee.org/UnixResource_dir/_/fileCaseSens.html

〈UNIX Tar Problem: File Length Truncation, Unicode Name Support〉
http://xahlee.org/comp/unix_tar_problem.html

〈What Characters Are Not Allowed in File Names?〉
http://xahlee.org/mswin/allowed_char...ile_names.html

〈Unicode Support in File Names: Windows, Mac, Emacs, Unison, Rsync,
USB, Zip〉
http://xahlee.org/mswin/unicode_support_file_names.html

〈The Nature of the Unix Philosophy〉
http://xahlee.org/UnixResource_dir/writ/unix_phil.html

Xah
 
Roy Smith
 
      04-09-2012
In article <4f82d3e2$1$fuzhry+tra$(E-Mail Removed)>,
Shmuel (Seymour J.) Metz <(E-Mail Removed)> wrote:

> >Null terminated strings have simplified all kids of text
> >manipulation, lexical scanning, and data storage/communication
> >code resulting in immeasurable savings over the years.

>
> Yeah, especially code that needs to deal with lengths and nulls. It's
> great for buffer overruns too.


I once worked on a C++ project that used a string class which kept a
length count, but also allocated one extra byte and stuck a null at the
end of every string.
 