Replace and inserting strings within .txt files with the use of regex

 
 
Νίκος
      08-08-2010
Hello dear Pythoneers,

I have over 500 .php web pages in various subfolders under the 'data'
folder that I have to rename to .html. I also have to ditch the '<?' and
'?>' tags from within and insert a very first line of <!-- id -->,
where id must be a unique identification number for every page, for
counter tracking purposes. Only pure HTML code must be left.

Before I found out about Python I used PHP, and now I am switching to a
templates + Python solution, so I have to change each and every page.

I don't know how to handle such a big data replacement problem, and I
cannot play with fire because those 500 pages are my clients' pages and
the data in those files just cannot be messed up.

Can you please provide me with a script that is able to perform such a
page content replacement automatically?

Thanks a million!
 
rantingrick
 
      08-08-2010
On Aug 7, 7:20 pm, Νίκος wrote:
> Hello dear Pythoneers,


I prefer Pythonista, but anywho..

> I have over 500 .php web pages in various subfolders under 'data'
> folder that i have to rename to .html


import os
os.rename(old, new)

> and and ditch the '<?' and '?>' tages from within


path = 'some/valid/path'
f = open(path, 'r')
data = f.read()
f.close()
data.replace('<?', '')
data.replace('?>', '')

> and also insert a very first line of <!-- id -->
> where id must be an identification unique number of every page for
> counter tracking purposes.


comment = "<!-- %s -->"%(idnum)
data.insert(idx, comment)

> ONly pure html code must be left.


Well then don't F up! However, judging from the amount of typos in this
post, I would suggest you do some major testing!

> I don't know how to handle such a big data replacing problem and
> cannot play with fire because those 500 pages are my cleints pages and
> data of those files just cannot be messes up.


Better do some serious testing first, or (if you have enough disk
space) create copies instead!

> Can you provide to me a script please that is able of performing an
> automatic way of such a page content replacing?


This is very basic stuff and the fine manual is free you know. But how
much are you willing to pay?
 
MRAB
 
      08-08-2010
rantingrick wrote:
> On Aug 7, 7:20 pm, Νίκος wrote:
>> Hello dear Pythoneers,

>
> I prefer Pythonista, but anywho..
>
>> I have over 500 .php web pages in various subfolders under 'data'
>> folder that i have to rename to .html

>
> import os
> os.rename(old, new)
>
>> and and ditch the '<?' and '?>' tages from within

>
> path = 'some/valid/path'
> f = open(path, 'r')
> data = f.read()
> f.close()
> data.replace('<?', '')
> data.replace('?>', '')
>

That should be:

data = data.replace('<?', '')
data = data.replace('?>', '')

>> and also insert a very first line of <!-- id -->
>> where id must be an identification unique number of every page for
>> counter tracking purposes.

>
> comment = "<!-- %s -->"%(idnum)
> data.insert(idx, comment)
>

Strings don't have an 'insert' method!
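
Since strings are immutable, one way to get the comment onto the first line is to build a new string by concatenation. A minimal sketch, where add_id_comment is a hypothetical helper name and idnum is assumed to be an integer:

```python
def add_id_comment(data, idnum):
    """Return data with an '<!-- idnum -->' comment as its first line."""
    comment = "<!-- %d -->\n" % idnum
    # Strings are immutable, so build a new string instead of inserting.
    return comment + data


page = "<html>\n<body>Hello</body>\n</html>\n"
print(add_id_comment(page, 7))
```

The original string is left unchanged, so the result has to be assigned or written out, just as with replace().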

>> ONly pure html code must be left.

>
> Well then don't F up! However judging from the amount of typos in this
> post i would suggest you do some major testing!
>
>> I don't know how to handle such a big data replacing problem and
>> cannot play with fire because those 500 pages are my cleints pages and
>> data of those files just cannot be messes up.

>
> Better do some serous testing first, or (if you have enough disc
> space ) create copies instead!
>
>> Can you provide to me a script please that is able of performing an
>> automatic way of such a page content replacing?

>
> This is very basic stuff and the fine manual is free you know. But how
> much are you willing to pay?


 
Νίκος
 
      08-08-2010
# rename ALL php files to html in every subfolder of the folder 'data'
# how do I tell Python to rename ALL php files to html in ALL subfolders under 'data'?
os.rename('*.php', '*.html')

# current path of the file to be processed
# I feel this must somehow be in a loop that reads every file of every subfolder
path = './data'

# open an html file for reading
f = open(path, 'rw')
# read the contents of the whole file
data = f.read()

# replace all php tags with empty string
data = data.replace('<?', '')
data = data.replace('?>', '')

# write replaced data to file
data = f.write()

# insert an increasing unique integer number at the very first line
# of every html file processed
# how will the number change here, increasing by one file after file?
comment = "<!-- %s -->"%(idnum)
f = f.close()

Please help, I'm new to Python and, apart from the syntax, it's a logic problem
as well and needs experience.
 
John S
 
      08-08-2010
On Aug 7, 8:20 pm, Νίκος wrote:
> Hello dear Pythoneers,
>
> I have over 500 .php web pages in various subfolders under 'data'
> folder that i have to rename to .html and and ditch the '<?' and '?>'
> tages from within and also insert a very first line of <!-- id -->
> where id must be an identification unique number of every page for
> counter tracking purposes. ONly pure html code must be left.
>
> I before find otu Python used php and now iam switching to templates +
> python solution so i ahve to change each and every page.
>
> I don't know how to handle such a big data replacing problem and
> cannot play with fire because those 500 pages are my cleints pages and
> data of those filesjust cannot be messes up.
>
> Can you provide to me a script please that is able of performing an
> automatic way of such a page content replacing?
>
> Thanks a million!


If the 500 web pages are PHP only in the sense that there is only one
pair of <? ?> tags in each file, surrounding the entire content, then
what you ask for is doable.

import os
from os.path import join, splitext

page_id = 1  # id number of the next page
# os.walk yields (current dir, subdirectory names, file names)
for currdir, dirs, files in os.walk('data'):
    for f in files:
        if f.endswith('.php'):
            source_file_name = join(currdir, f)  # path to the PHP file
            source_file = open(source_file_name)
            source_contents = source_file.read()  # read contents of PHP file
            source_file.close()

            # replace tags
            source_contents = source_contents.replace('<?', '')
            source_contents = source_contents.replace('?>', '')

            # add ID as the very first line
            source_contents = ('<!-- %d -->\n' % page_id) + source_contents
            page_id += 1

            # create new file with .html extension
            dest_file_name = splitext(source_file_name)[0] + '.html'
            dest_file = open(dest_file_name, 'w')
            dest_file.write(source_contents)  # write contents
            dest_file.close()

Note: error checking left out for clarity.
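
For what it's worth, here is one way the loop above might look with some basic error checking added. This is only a sketch: convert_tree is a hypothetical name, the pages are assumed to be UTF-8 text, and existing .html files are skipped rather than overwritten.

```python
import os


def convert_tree(root, start_id=1):
    """Write an .html copy of every .php file under root.

    Returns the next unused id. The original .php files are left untouched.
    """
    page_id = start_id
    for currdir, dirs, files in os.walk(root):
        dirs.sort()  # visit subfolders in a predictable order
        for name in sorted(files):
            if not name.endswith('.php'):
                continue
            source = os.path.join(currdir, name)
            target = os.path.splitext(source)[0] + '.html'
            if os.path.exists(target):
                print('skipping %s: %s already exists' % (source, target))
                continue
            try:
                # Assumption: the pages are UTF-8 encoded text.
                with open(source, encoding='utf-8') as f:
                    contents = f.read()
                contents = contents.replace('<?', '').replace('?>', '')
                with open(target, 'w', encoding='utf-8') as f:
                    f.write(('<!-- %d -->\n' % page_id) + contents)
            except (OSError, UnicodeError) as err:
                print('could not convert %s: %s' % (source, err))
                continue
            page_id += 1
    return page_id
```

Calling convert_tree('data') would then process the whole folder, and the return value tells you which id to start from if you convert more pages later.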

On the other hand, if your 500 web pages contain embedded PHP
variables or logic, you have a big job ahead. Django templates and PHP
are two different languages for embedding data and logic in web pages.
Converting a project from PHP to Django involves more than renaming
the template files and deleting "<?" and friends.

For example, here is a snippet of PHP which checks which browser is
viewing the page:

<?php
if (strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== FALSE) {
    echo 'You are using Internet Explorer.<br />';
}
?>

In Django, you would typically put this logic in a Django *view*
(which, by the way, is not what is called a 'view' in MVC terms), which
is the code that prepares data for the template. The logic would not live
with the HTML. The template uses "template variables" that the view
has associated with a Python variable or function. You might create a
template variable (created via a Context object) named 'browser' that
contains a value that identifies the browser.

Thus, your Django template (HTML file) might look like this:

{% if browser == 'IE' %}You are using Internet Explorer{% endif %}
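
On the view side, the value of 'browser' could be computed along these lines. This is a rough sketch in plain Python so it runs without Django installed; detect_browser is a hypothetical helper name, and the user-agent string is a made-up sample.

```python
def detect_browser(user_agent):
    """Return a short browser id for the template, or '' if unrecognised."""
    # Same test as the PHP snippet: look for 'MSIE' in the User-Agent header.
    if 'MSIE' in user_agent:
        return 'IE'
    return ''


# Inside a Django view the header is available as
# request.META.get('HTTP_USER_AGENT', ''), and the result would be handed
# to the template in its context under the name 'browser'.
context = {'browser': detect_browser('Mozilla/4.0 (compatible; MSIE 8.0)')}
print(context)
```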

PHP tends to combine the presentation with the business logic, or in
MVC terms, combines the view with the controller. Django separates
them out, which many people find to be a better way. The person who
writes the HTML doesn't have to speak Python, but only needs to know
the names of template variables and a little bit of template logic. In
PHP, the HTML code and all the business logic live in the same files.
Even there, it would probably make sense to calculate the browser ID in
the header of the HTML file, then access it via a variable in the body.

If you have 500 static web pages that are part of the same
application, but that do not contain any logic, your application might
need to be redesigned.

Also, you are doing your changes on a COPY of the application on a
non-public server, aren't you? If not, then you really are playing with
fire.


HTH,
John
 
rantingrick
 
      08-08-2010
On Aug 7, 8:42 pm, MRAB wrote:

> That should be:
>
>     data = data.replace('<?', '')
>     data = data.replace('?>', '')


Yes, Thanks MRAB. I did forget that important detail.

> Strings don't have an 'insert' method!


*facepalm*! I really must stop Usenet-ing whilst consuming large
volumes of alcoholic beverages.
 
John S
 
      08-08-2010
Even though I just replied above, in reading over the OP's message, I
think the OP might be asking:

"How can I use RE string replacement to find PHP tags and convert them
to Django template tags?"

Instead of saying

source_contents = source_contents.replace(...)

say this instead:

import re


def replace_php_tags(m):
    ''' PHP tag replacer
    This function is called for each PHP tag. It gets a Match object as
    its parameter, so you can get the contents of the old tag, and should
    return the new (Django) tag.
    '''

    # m is the match object from the current match
    php_guts = m.group(1)  # the contents of the PHP tag

    # now put the replacement logic here

    # and return whatever should go in place of the PHP tag,
    # which could be '{{ python_template_var }}'
    # or '{% template logic ... %}'
    # or some combination
    return ''  # placeholder: drops the tag until real logic is written

# '?' is a regex metacharacter, so the literal tags are written as <\? and \?>;
# re.DOTALL lets a PHP block span several lines
source_contents = re.sub(r'<\?\s*(.*?)\s*\?>', replace_php_tags,
                         source_contents, flags=re.DOTALL)
 
Νίκος
 
      08-08-2010
On 8 Aug, 05:42, John S wrote:
> If the 500 web pages are PHP only in the sense that there is only one
> pair of <? ?> tags in each file, surrounding the entire content, then
> what you ask for is doable.


First of all, thank you very much John for your BIG effort to help
me (I'm still reading your posts)!

I have to tell you here that those PHP files contain several instances
of PHP opening and closing tags (about 3 in each PHP file). The rest is
pure HTML data. That happened because those files were originally
HTML-only files that later needed conversion to PHP because some dynamic
code had to be used to address some issues.

Please tell me that the code you provided can be adjusted to several
instances as well!
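
For reference, str.replace() already acts on every occurrence in the string, so several tag pairs per file are handled. What it does not do is remove the PHP code between the tags. A small sketch of the difference, using a made-up sample page and a regex as one possible way to drop whole blocks:

```python
import re

page = (
    "<? include 'head.php'; ?>\n"
    "<p>Hello</p>\n"
    "<?php echo $counter; ?>\n"
    "<p>Bye</p>\n"
)

# str.replace() removes every occurrence of the tags, but the PHP code
# that was between them stays in the page.
tags_only = page.replace('<?', '').replace('?>', '')

# A non-greedy regex removes each <? ... ?> block together with its code.
# re.DOTALL lets '.' match newlines, so multi-line blocks are covered too.
no_php = re.sub(r'<\?.*?\?>', '', page, flags=re.DOTALL)

print(tags_only)
print(no_php)
```

One caveat with the regex: a '?>' that appears inside a PHP string literal would end the match early, so the output is worth checking by eye on a few pages.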
 
Νίκος
 
      08-08-2010
On 8 Aug, 05:56, John S wrote:
>"How can I use RE string replacement to find PHP tags and convert them
>to Django template tags?"


No, not at all John, at least not yet!

I have only been learning Python for 1 week (changing from PHP & Perl),
so I'm very fresh at this beautiful and straightforward language.

When I have a good understanding of Python, I will proceed to
Django templates.
Until then my Python templates will be only 'simple HTML files', and
the only thing they contain apart from the HTML data will be the
special string formatting specifier '%s'.
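
A small sketch of that kind of template, with made-up page content and values. One thing to watch for: any literal '%' in the HTML (such as width="100%") has to be written as '%%', otherwise the formatting fails or substitutes in the wrong place.

```python
template = (
    "<html>\n"
    "<body>\n"
    "<p>Hello %s, you are visitor number %d.</p>\n"
    "<table width=\"100%%\"></table>\n"
    "</body>\n"
    "</html>\n"
)

# The values are substituted in order; '%%' produces a literal '%'.
page = template % ("Nikos", 42)
print(page)
```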
 
Steven D'Aprano
 
      08-08-2010
On Sat, 07 Aug 2010 17:20:24 -0700, Νίκος wrote:

> I don't know how to handle such a big data replacing problem and cannot
> play with fire because those 500 pages are my cleints pages and data of
> those filesjust cannot be messes up.


Take a backup copy of the files, and only edit the copies. Don't replace
the originals until you know they're correct.
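
One possible way to take that backup from Python itself is shutil.copytree. This is just a sketch: backup_tree is a hypothetical helper, and the demo runs on a throwaway temporary folder rather than real data.

```python
import os
import shutil
import tempfile


def backup_tree(source, backup):
    """Copy the whole source folder to backup, refusing to overwrite."""
    if os.path.exists(backup):
        raise ValueError('backup folder already exists: %s' % backup)
    shutil.copytree(source, backup)
    return backup


# Demo on a throwaway folder; with real data the call would be something
# like backup_tree('data', 'data_backup') (both names are assumptions).
work = tempfile.mkdtemp()
data = os.path.join(work, 'data')
os.makedirs(os.path.join(data, 'sub'))
with open(os.path.join(data, 'sub', 'page.php'), 'w') as f:
    f.write('<? ?><p>hi</p>')

copy = backup_tree(data, os.path.join(work, 'data_backup'))
print(os.listdir(copy))
```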

--
Steven
 