Split single file into multiple files based on patterns

 
 
satyam
Guest
Posts: n/a
 
      10-24-2012
I have a text file like this

A1980JE39300007 2732 4195 12.527000
A1980JE39300007 3465 9720 22.000000
A1980JE39300007 1853 3278 12.500000
A1980JE39300007 2732 2732 187.500000
A1980JE39300007 19 4688 3.619000
A1980JE39300007 2995 9720 6.667000
A1980JE39300007 1603 9720 30.000000
A1980JE39300007 234 4195 42.416000
A1980JE39300007 2732 9720 18.000000
A1980KK18700010 130 303 4.985000
A1980KK18700010 7 4915 0.435000
A1980KK18700010 25 1620 1.722000
A1980KK18700010 25 186 0.654000
A1980KK18700010 50 130 3.199000
A1980KK18700010 186 3366 4.780000
A1980KK18700010 30 186 1.285000
A1980KK18700010 30 185 4.395000
A1980KK18700010 185 186 9.000000
A1980KK18700010 25 30 3.493000

I want to split the file and get multiple files like A1980JE39300007.txt and A1980KK18700010.txt, where each file will contain column2, 3 and 4.
Thanks
Satyam
 
 
 
 
 
Jason Friedman
Guest
Posts: n/a
 
      10-24-2012
On Tue, Oct 23, 2012 at 9:01 PM, satyam <(E-Mail Removed)> wrote:
> I have a text file like this
>
> A1980JE39300007 2732 4195 12.527000
> A1980JE39300007 3465 9720 22.000000
> A1980JE39300007 1853 3278 12.500000
> A1980JE39300007 2732 2732 187.500000
> A1980JE39300007 19 4688 3.619000
> A1980KK18700010 30 186 1.285000
> A1980KK18700010 30 185 4.395000
> A1980KK18700010 185 186 9.000000
> A1980KK18700010 25 30 3.493000
>
> I want to split the file and get multiple files like A1980JE39300007.txt and A1980KK18700010.txt, where each file will contain column2, 3 and 4.


Unless your source file is very large this should be sufficient:

$ cat source
A1980JE39300007 2732 4195 12.527000
A1980JE39300007 3465 9720 22.000000
A1980JE39300007 1853 3278 12.500000
A1980JE39300007 2732 2732 187.500000
A1980JE39300007 19 4688 3.619000
A1980JE39300007 2995 9720 6.667000
A1980JE39300007 1603 9720 30.000000
A1980JE39300007 234 4195 42.416000
A1980JE39300007 2732 9720 18.000000
A1980KK18700010 130 303 4.985000
A1980KK18700010 7 4915 0.435000
A1980KK18700010 25 1620 1.722000
A1980KK18700010 25 186 0.654000
A1980KK18700010 50 130 3.199000
A1980KK18700010 186 3366 4.780000
A1980KK18700010 30 186 1.285000
A1980KK18700010 30 185 4.395000
A1980KK18700010 185 186 9.000000
A1980KK18700010 25 30 3.493000

$ python3
Python 3.2.3 (default, Sep 10 2012, 18:14:40)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> for line in open("source"):
...     file_name, remainder = line.strip().split(None, 1)
...     with open(file_name + ".txt", "a") as writer:
...         print(remainder, file=writer)
...
>>>


$ ls *txt
A1980JE39300007.txt A1980KK18700010.txt

$ cat A1980JE39300007.txt
2732 4195 12.527000
3465 9720 22.000000
1853 3278 12.500000
2732 2732 187.500000
19 4688 3.619000
2995 9720 6.667000
1603 9720 30.000000
234 4195 42.416000
2732 9720 18.000000
 
 
 
 
 
David Hutto
Guest
Posts: n/a
 
      10-24-2012
On Tue, Oct 23, 2012 at 11:01 PM, satyam <(E-Mail Removed)> wrote:
> I have a text file like this
>
> A1980JE39300007 2732 4195 12.527000
> A1980JE39300007 3465 9720 22.000000
> A1980JE39300007 1853 3278 12.500000
> A1980JE39300007 2732 2732 187.500000
> A1980JE39300007 19 4688 3.619000
> A1980JE39300007 2995 9720 6.667000
> A1980JE39300007 1603 9720 30.000000
> A1980JE39300007 234 4195 42.416000
> A1980JE39300007 2732 9720 18.000000
> A1980KK18700010 130 303 4.985000
> A1980KK18700010 7 4915 0.435000
> A1980KK18700010 25 1620 1.722000
> A1980KK18700010 25 186 0.654000
> A1980KK18700010 50 130 3.199000
> A1980KK18700010 186 3366 4.780000
> A1980KK18700010 30 186 1.285000
> A1980KK18700010 30 185 4.395000
> A1980KK18700010 185 186 9.000000
> A1980KK18700010 25 30 3.493000
>
> I want to split the file and get multiple files like A1980JE39300007.txt and A1980KK18700010.txt, where each file will contain column2, 3 and 4.
> Thanks
> Satyam





# parse through the lines
turn_text_to_txt = ['A1980JE39300007 2732 4195 12.527000',
                    'A1980JE39300007 3465 9720 22.000000',
                    'A1980JE39300007 1853 3278 12.500000',
                    'A1980JE39300007 2732 2732 187.500000',
                    'A1980JE39300007 19 4688 3.619000',
                    'A1980KK18700010 30 186 1.285000',
                    'A1980KK18700010 30 185 4.395000',
                    'A1980KK18700010 185 186 9.000000',
                    'A1980KK18700010 25 30 3.493000']

# Then split each line and open a file for writing, which creates the file.
# Several lines share the same first field, so files opened under that name
# alone would overwrite one another. An integer count is therefore appended
# after an underscore to keep every file, even when the base name repeats.
count = 0

for file_data in turn_text_to_txt:
    # Open the file in 'w' mode so it is created, with the count in the
    # name in case other lines share the same base name.
    f = open('/home/david/files/%s_%s.txt' % (file_data.split(' ')[0], count), 'w')

    # Write the data to the file. This is in list format; I could go
    # further, but need a little time for a few other things.
    f.write(str(file_data.split(' ')[1:]))

    # close the file
    f.close()

    # Increment the count for the next iteration, again in case files
    # share the same name and need a suffix.
    count += 1


Full code from above, without comments:

turn_text_to_txt = ['A1980JE39300007 2732 4195 12.527000',
                    'A1980JE39300007 3465 9720 22.000000',
                    'A1980JE39300007 1853 3278 12.500000',
                    'A1980JE39300007 2732 2732 187.500000',
                    'A1980JE39300007 19 4688 3.619000',
                    'A1980KK18700010 30 186 1.285000',
                    'A1980KK18700010 30 185 4.395000',
                    'A1980KK18700010 185 186 9.000000',
                    'A1980KK18700010 25 30 3.493000']
# then split and open a file for writing to create the file
count = 0

for file_data in turn_text_to_txt:
    print('/home/david/files/%s.txt' % (file_data.split(' ')[0]))
    f = open('/home/david/files/%s_%s.txt' % (file_data.split(' ')[0], count), 'w')
    f.write(str(file_data.split(' ')[1:]))
    f.close()
    count += 1
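
On the "list format" point: `str()` of a list writes the list's repr, brackets and quotes included, whereas `' '.join()` gives back the plain columns. A minimal sketch using one sample line from the original post; the variable names are only illustrative and not part of the code above.

```python
# One sample line from the original post.
line = 'A1980JE39300007 2732 4195 12.527000'

# split() with no argument splits on any run of whitespace;
# [1:] drops the first column, which becomes the file name.
columns = line.split()[1:]

as_list_repr = str(columns)        # what f.write(str(...)) would write
as_plain_text = ' '.join(columns)  # the three columns as plain text

print(as_list_repr)
print(as_plain_text)
```

Writing `as_plain_text + '\n'` instead of `as_list_repr` would produce files in the layout the original poster asked for.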




--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com
 
 
Demian Brecht
Guest
Posts: n/a
 
      10-24-2012
On 2012-10-23, at 10:24 PM, David Hutto <(E-Mail Removed)> wrote:

> count = 0

Don't use count.

> for file_data in turn_text_to_txt:

Use enumerate:

for count, file_data in enumerate(turn_text_to_txt):

> f = open('/home/david/files/%s_%s.txt' % (file_data.split(' ')[0], count), 'w')

Use with:

with open('file path', 'w') as f:
    f.write('data')

Not only is it shorter, but it automatically closes the file once you've come out of the inner block, whether successfully or erroneously.
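
Put together, the two suggestions might look like the sketch below. It is only an illustration: the two sample lines come from the thread, but the temporary output directory and the names `lines`, `out_dir`, `name` and `rest` are assumptions made so the snippet runs anywhere.

```python
import os
import tempfile

# Two sample lines from the thread.
lines = [
    'A1980JE39300007 2732 4195 12.527000',
    'A1980KK18700010 30 186 1.285000',
]

# Assumption: a throwaway temp directory stands in for the real output path.
out_dir = tempfile.mkdtemp()

# enumerate() supplies the counter, so no manual "count += 1" is needed.
for count, file_data in enumerate(lines):
    name, rest = file_data.split(None, 1)
    path = os.path.join(out_dir, '%s_%s.txt' % (name, count))
    # "with" closes the file on leaving the block, even if write() raises.
    with open(path, 'w') as f:
        f.write(rest + '\n')

print(sorted(os.listdir(out_dir)))
```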


Demian Brecht
@demianbrecht
http://demianbrecht.github.com




 
 
Dennis Lee Bieber
Guest
Posts: n/a
 
      10-24-2012
On Tue, 23 Oct 2012 21:43:21 -0600, Jason Friedman <(E-Mail Removed)>
declaimed the following in gmane.comp.python.general:

> $ python3
> Python 3.2.3 (default, Sep 10 2012, 18:14:40)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> for line in open("source"):
> ...     file_name, remainder = line.strip().split(None, 1)
> ...     with open(file_name + ".txt", "a") as writer:
> ...         print(remainder, file=writer)


That's a lot of OS file open/closing operations...

I'd be more likely to configure the code as a "standard" "report
control break".

control = None
fin = open("source")
for line in fin:
    newControl, data = line.split(None, 1)  # leave new-line for output
    if control != newControl:   # only open/close files on
                                # change of control break
        if control:
            fout.close()
        fout = open(newControl + ".txt", "a")
        # I'd prefer using "w" IF the input is already sorted
        # that way one knows a new file is created on each run
        # instead of having to delete any existing files from
        # previous runs
        control = newControl
    fout.write(data)
if control:
    fout.close()
fin.close()
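
The "w" variant mentioned in the comments could be sketched as a small function. This is a hedged illustration rather than a drop-in replacement: it assumes the input is already grouped by its first column, and the function name `split_sorted`, the blank-line skip, and the temporary demo directory are additions that do not appear in the code above.

```python
import os
import tempfile


def split_sorted(source_path, out_dir):
    """Control-break split; assumes lines with the same key are adjacent."""
    control = None
    fout = None
    try:
        with open(source_path) as fin:
            for line in fin:
                if not line.strip():
                    continue  # assumption: blank lines are skipped
                new_control, data = line.split(None, 1)
                if new_control != control:
                    # Key changed: close the previous file, start a new one.
                    if fout is not None:
                        fout.close()
                    # "w" truncates, so reruns never append to stale output.
                    fout = open(os.path.join(out_dir, new_control + ".txt"), "w")
                    control = new_control
                fout.write(data)  # data still carries its newline
    finally:
        if fout is not None:
            fout.close()


# Demo on three sample lines from the thread, in a temp directory.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "source")
with open(source, "w") as f:
    f.write("A1980JE39300007 2732 4195 12.527000\n"
            "A1980JE39300007 3465 9720 22.000000\n"
            "A1980KK18700010 130 303 4.985000\n")
split_sorted(source, work_dir)
```

If the same key reappears later in an unsorted file, "w" would truncate the earlier output for that key, which is why the sorted-input assumption matters here.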

--
Wulfraed Dennis Lee Bieber AF6VN
(E-Mail Removed) HTTP://wlfraed.home.netcom.com/

 
 
Alain Ketterlin
Guest
Posts: n/a
 
      10-24-2012
satyam <(E-Mail Removed)> writes:

> I have a text file like this
>
> A1980JE39300007 2732 4195 12.527000
> A1980JE39300007 3465 9720 22.000000
> A1980JE39300007 2732 9720 18.000000
> A1980KK18700010 130 303 4.985000
> A1980KK18700010 7 4915 0.435000

[...]
> I want to split the file and get multiple files like
> A1980JE39300007.txt and A1980KK18700010.txt, where each file will
> contain column2, 3 and 4.


Sorry for being completely off-topic here, but awk has a very convenient
feature to deal with this. Simply use:

awk '{ print $2,$3,$4 > $1".txt"; }' /path/to/your/file
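
For anyone who wants roughly the same behaviour in Python: awk keeps each output file open after the first write to it, so a close analogue keeps one handle per key. The sketch below is an approximation under assumptions, not a translation anyone in this thread posted; the function name `split_by_first_column`, the short-line skip, and the temporary demo directory are all illustrative.

```python
import os
import tempfile
from contextlib import ExitStack


def split_by_first_column(source_path, out_dir):
    """Write columns 2-4 of each line to <column 1>.txt, one handle per key."""
    writers = {}
    # ExitStack closes every file registered with it when the block exits.
    with ExitStack() as stack, open(source_path) as fin:
        for line in fin:
            fields = line.split()
            if len(fields) < 4:
                continue  # assumption: blank or short lines are skipped
            key = fields[0]
            if key not in writers:
                path = os.path.join(out_dir, key + ".txt")
                # Opened once per key with "w", like awk's ">" redirection.
                writers[key] = stack.enter_context(open(path, "w"))
            writers[key].write(" ".join(fields[1:4]) + "\n")


# Demo with interleaved keys, to show the input need not be sorted.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "source")
with open(source, "w") as f:
    f.write("A1980JE39300007 2732 4195 12.527000\n"
            "A1980KK18700010 130 303 4.985000\n"
            "A1980JE39300007 3465 9720 22.000000\n")
split_by_first_column(source, work_dir)
```

One caveat: with a very large number of distinct keys, this approach can run into the operating system's limit on open files.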

-- Alain.
 
 
Mark Lawrence
Guest
Posts: n/a
 
      10-24-2012
On 24/10/2012 06:46, Alain Ketterlin wrote:
> satyam <(E-Mail Removed)> writes:
>
>> I have a text file like this
>>
>> A1980JE39300007 2732 4195 12.527000
>> A1980JE39300007 3465 9720 22.000000
>> A1980JE39300007 2732 9720 18.000000
>> A1980KK18700010 130 303 4.985000
>> A1980KK18700010 7 4915 0.435000

> [...]
>> I want to split the file and get multiple files like
>> A1980JE39300007.txt and A1980KK18700010.txt, where each file will
>> contain column2, 3 and 4.

>
> Sorry for being completely off-topic here, but awk has a very convenient
> feature to deal with this. Simply use:
>
> awk '{ print $2,$3,$4 > $1".txt"; }' /path/to/your/file
>
> -- Alain.
>


Although practicality beats purity

--
Cheers.

Mark Lawrence.

 
 
Steven D'Aprano
Guest
Posts: n/a
 
      10-24-2012
On Tue, 23 Oct 2012 20:01:03 -0700, satyam wrote:

> I have a text file like this
>
> A1980JE39300007 2732 4195 12.527000

[...]

> I want to split the file and get multiple files like A1980JE39300007.txt
> and A1980KK18700010.txt, where each file will contain column2, 3 and 4.


Are you just excited and want to tell everyone, or do you actually have a
question?

Have you tried to write some code, or do you just expect others to do
your work for you?

If so, I see that your expectation was correct.



--
Steven
 
 
David Hutto
Guest
Posts: n/a
 
      10-24-2012
On Wed, Oct 24, 2012 at 3:52 AM, Steven D'Aprano
<(E-Mail Removed)> wrote:
> On Tue, 23 Oct 2012 20:01:03 -0700, satyam wrote:
>
>> I have a text file like this
>>
>> A1980JE39300007 2732 4195 12.527000

> [...]
>
>> I want to split the file and get multiple files like A1980JE39300007.txt
>> and A1980KK18700010.txt, where each file will contain column2, 3 and 4.

>
> Are you just excited and want to tell everyone, or do you actually have a
> question?
>
> Have you tried to write some code, or do you just expect others to do
> your work for you?
>
> If so, I see that your expectation was correct.
>
>
>
> --
> Steven


Some people learn better from a full example than from a small challenge
thrown in at certain times.

I think it should be a little of both, especially if you (an
algorithmist for the OP) only have enough time to throw out untested
pseudocode.

--
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com
 
 
Peter Otten
Guest
Posts: n/a
 
      10-24-2012
satyam wrote:

> I have a text file like this
>
> A1980JE39300007 2732 4195 12.527000
> A1980JE39300007 3465 9720 22.000000
> A1980JE39300007 1853 3278 12.500000
> A1980JE39300007 2732 2732 187.500000
> A1980JE39300007 19 4688 3.619000
> A1980JE39300007 2995 9720 6.667000
> A1980JE39300007 1603 9720 30.000000
> A1980JE39300007 234 4195 42.416000
> A1980JE39300007 2732 9720 18.000000
> A1980KK18700010 130 303 4.985000
> A1980KK18700010 7 4915 0.435000
> A1980KK18700010 25 1620 1.722000
> A1980KK18700010 25 186 0.654000
> A1980KK18700010 50 130 3.199000
> A1980KK18700010 186 3366 4.780000
> A1980KK18700010 30 186 1.285000
> A1980KK18700010 30 185 4.395000
> A1980KK18700010 185 186 9.000000
> A1980KK18700010 25 30 3.493000
>
> I want to split the file and get multiple files like A1980JE39300007.txt
> and A1980KK18700010.txt, where each file will contain column2, 3 and 4.
> Thanks Satyam


import os
from itertools import groupby
from operator import itemgetter

get_key = itemgetter(0)
get_value = itemgetter(1)

output_folder = "tmp"
with open("infile.txt") as instream:
    pairs = (line.split(None, 1) for line in instream)
    for key, group in groupby(pairs, key=get_key):
        path = os.path.join(output_folder, key + ".txt")
        with open(path, "a") as outstream:
            outstream.writelines(get_value(line) for line in group)

If you are running the code more than once make sure that you remove the
files from the previous run first.
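
One way to do that cleanup is sketched below. It assumes the output folder holds nothing but the generated `.txt` files, since it deletes every `.txt` file it finds there; the helper name `remove_old_outputs` and the temporary demo folder are illustrative only.

```python
import glob
import os
import tempfile


def remove_old_outputs(output_folder):
    """Delete every .txt file in output_folder; return the names removed."""
    removed = []
    for path in glob.glob(os.path.join(output_folder, "*.txt")):
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)


# Demo: create two stale files in a temp folder, then clear them.
output_folder = tempfile.mkdtemp()
for name in ("A1980JE39300007.txt", "A1980KK18700010.txt"):
    with open(os.path.join(output_folder, name), "w") as f:
        f.write("stale\n")

removed = remove_old_outputs(output_folder)
print(removed)
```

Calling such a helper before the `with open("infile.txt")` block would let the "a" mode start from empty files on every run.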

 