program to copy files - problems - unix ksh to java

 
 
kaeli
      02-09-2004

Hi all,

I've got a shell script (ksh) that does some file copying and deleting.
I'm running into some problems with it that I'm wondering if I can
solve. Since I plan on re-writing it with java (1.4), I figure I might
as well do it right this time.

Here's the drill:
Cron runs this code every 5 minutes.
Program looks on one machine, uses ssh to copy a file to another
machine, changes the filename, the owner and permissions (chmod), and
then deletes the file from the source machine.
Sounds simple enough...

Problems:
Large files (we're talking gigabytes) take more than 5 minutes to copy.
Something is causing the file to be deleted before it has finished
copying. We lose the whole file, as it doesn't show up on either
machine.
Program gets called again while an instance is running, so it tries to
copy files that are currently being copied.

I was going to solve this with the usual .running type fix, but we
really need the program to actually run every 5 minutes (more than one
instance will be needed). If an instance is already copying the file,
the file should just be ignored. The file should not be deleted if the
copy hasn't finished.

Does anyone know of any system stuff I should be looking at for java?
Specifically, Unix Solaris interface so I can tell if a file is in use?
Also, how can I make sure that the copy was finished before deleting the
source? I expected the script to wait for the copy to finish before
deleting, but it appears that it is not doing that. Should I use threads
for this?

Thanks for any ideas, input, etc...

--
--
~kaeli~
Bakers trade bread recipes on a knead-to-know basis.
http://www.ipwebdesign.net/wildAtHeart
http://www.ipwebdesign.net/kaelisSpace

 
hiwa
      02-10-2004
kaeli wrote:
> Hi all,
>
> I've got a shell script (ksh) that does some file copying and deleting.
> I'm running into some problems with it that I'm wondering if I can
> solve. Since I plan on re-writing it with java (1.4), I figure I might
> as well do it right this time.
>
> Here's the drill:
> Cron runs this code every 5 minutes.
> Program looks on one machine, uses ssh to copy a file to another
> machine, changes the filename, the owner and permissions (chmod), and
> then deletes the file from the source machine.
> Sounds simple enough...
>
> Problems:
> Large files (we're talking gigabytes) take more than 5 minutes to copy.
> Something is causing the file to be deleted before it has finished
> copying. We lose the whole file, as it doesn't show up on either
> machine.
> Program gets called again while an instance is running, so it tries to
> copy files that are currently being copied.
>
> I was going to solve this with the usual .running type fix, but we
> really need the program to actually run every 5 minutes (more than one
> instance will be needed). If an instance is already copying the file,
> the file should just be ignored. The file should not be deleted if the
> copy hasn't finished.
>
> Does anyone know of any system stuff I should be looking at for java?
> Specifically, Unix Solaris interface so I can tell if a file is in use?
> Also, how can I make sure that the copy was finished before deleting the
> source? I expected the script to wait for the copy to finish before
> deleting, but it appears that it is not doing that. Should I use threads
> for this?
>
> Thanks for any ideas, input, etc...
>
> --

Are you syncing or flushing with proper synchronization?
 
Harald Kirsch
      02-10-2004
kaeli wrote:
> Cron runs this code every 5 minutes.
> Program looks on one machine, uses ssh to copy a file to another
> machine, changes the filename, the owner and permissions (chmod), and
> then deletes the file from the source machine.
> Sounds simple enough...
>
> Problems:
> Large files (we're talking gigabytes) take more than 5 minutes to copy.
> Something is causing the file to be deleted before it has finished
> copying. We lose the whole file, as it doesn't show up on either
> machine.


Hard to believe on a *NIX machine, as the system allows files held
open by other processes to be deleted without disturbing them. It may
be a problem if the file is on NFS. In any case, it
sounds like the first process, when finished, deletes the file,
while the second copy process, started while the first was still
running, then gets into trouble and messes things up.

> Program gets called again while an instance is running, so it tries to
> copy files that are currently being copied.


A solution might be to rename the file locally *before*
copying. This way the process starting 5 minutes later will
not pick up the same file again. If this is not an option,
create an empty file with a different extension from the
big file, as a marker that the file is being worked on.

> Also, how can I make sure that the copy was finished before deleting the
> source? I expected the script to wait for the copy to finish before
> deleting, but it appears that it is not doing that.


Assuming that you do *not* start the copy process in
the background (&), the script does wait. You have to
look for a different reason why the file is deleted
too early, maybe as I suggested above.

There is no need to solve this task in Java.

Harald.
 
kaeli
      02-10-2004
hiwa enlightened us with...
> Are you syncing or flushing with proper synchronization?


It's currently shell (ksh). So, no. It's basically

cp file1 file2
rm file1


--
--
~kaeli~
Never say, "Oops!"; always say, "Ah, interesting!"
http://www.ipwebdesign.net/wildAtHeart
http://www.ipwebdesign.net/kaelisSpace

 
kaeli
      02-10-2004
Harald Kirsch enlightened us with...
>
> Hard to believe on a *NIX machine as the system allows to delete
> files held open by other processes. It may be a possible problem
> if the file is on NFS. In any case it
> sounds like the first process, when finished, deletes the file,
> while the 2nd copy process, started while the first was still
> running, then gets in trouble and messes things up.
>


That's probably it.

> > Program gets called again while an instance is running, so it tries to
> > copy files that are currently being copied.

>
> A solution might be to rename the file locally *before*
> copying. This way the process starting 5 minutes later will
> not pick up the same file again. If this is not an option,
> create an empty file with another extension than the
> big file as a mark that the file is being worked on.
>


That won't help.
The code copies any files in a directory on one machine to a directory
on another. So it will still grab the file. The code would have to move
the file, which is already the problem.

>
> There is no need to solve this task in Java.


I need threading (I think) because right now, the solution is to not run
two instances of the code at the same time. That is not a good solution.
We need code that runs almost continuously, looking in directories and
copying and deleting the files.
(it's a DMZ, in case that helps you see why this needs to be done - it
takes files people uploaded and moves them to a machine inside our
firewall)

So, as far as I see, I need C or Java, and I've not coded C in over a
year.

We want a process that runs pretty much all the time. I'm thinking a
program that looks in directories over and over. When it finds a file,
it starts a thread that copies it then deletes it. As part of the
thread, it can put the name of the file in a vector. Any new threads
check that vector before bothering a file...

I dunno, am I way off on that?

--
--
~kaeli~
Never say, "Oops!"; always say, "Ah, interesting!"
http://www.ipwebdesign.net/wildAtHeart
http://www.ipwebdesign.net/kaelisSpace

 
Thomas Weidenfeller
      02-11-2004
kaeli wrote:
> I've got a shell script (ksh) that does some file copying and deleting.
> I'm running into some problems with it that I'm wondering if I can
> solve. Since I plan on re-writing it with java (1.4), I figure I might
> as well do it right this time.


There are several ways to fix this (to a "good enough" level), without
using Java. One example:

> Here's the drill:
> Cron runs this code every 5 minutes.
> Program looks on one machine,


Check for the particular file. If found, rename the file to something
temporary - all in one operation. E.g. (Bourne-Shell syntax):

if mv "$file" "$file.$$" ; then
    # Found file, renamed it.
    # Can start copying.
    # A second invocation will not find
    # this file any more, and leave it alone.
> uses ssh to copy a file to another

fi

> machine, changes the filename, the owner and permissions (chmod), and
> then deletes the file from the source machine.


Delete the renamed file instead.

> Sounds simple enough...


It is. You might want to add a sanity check which e.g. runs once a day
and checks whether there are old renamed files lying around, and either
collects them or deletes them.
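
A rough sketch of such a sanity check, assuming the renamed files end in
a dot followed by the process ID as in the mv example above. The function
name list_stale is made up for illustration, the name pattern is loose
(it also matches other names containing a dot followed by a digit), and
-mtime looks at the last write to the file, not at the time of the rename:

```shell
#!/bin/sh
# Hypothetical helper: list regular files under directory $1 whose name
# contains ".<digit>" (e.g. "upload.dat.1234") and whose last
# modification is at least 24 hours old.
list_stale() {
    find "$1" -type f -name '*.[0-9]*' -mtime +0 -print
}
```

It only prints candidates. Whether to retry the copy or delete them is a
decision better left to a human or a separate step.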

Other solutions include setting empty files as markers to indicate that a
file is already being copied. But this can result in a race condition if you
start the script multiple times at the same time:

if [ ! -f "$file.mark" ] ; then
# race condition can happen here

# place a mark
touch "$file.mark"
# now copy

# after copy, delete
rm "$file" "$file.mark"
fi

Instead of setting the marker on the remote machine, you could also set
the marker on the local machine, but you would have to add the remote
host name in order to distinguish the markers.
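
One way to narrow the race in the marker approach is to let a single
command do both the test and the marking. A minimal sketch, assuming a
local file system (NFS may behave differently): mkdir fails if the
directory already exists, so only one invocation can create the marker.
The names claim_file and release_file and the ".lock" suffix are made up
for illustration, and the directory scan would have to skip "*.lock"
entries.

```shell
#!/bin/sh
# Hypothetical helpers: use a marker directory as a per-file lock.
# mkdir either creates the directory or fails because it already
# exists, so the check and the marking cannot be separated by another
# invocation of the script.
claim_file() {
    mkdir "$1.lock" 2>/dev/null
}

# Remove the marker once the copy and the delete are done.
release_file() {
    rmdir "$1.lock"
}

# Intended use (not executed here):
#   if claim_file "$file" ; then
#       ... copy, check the exit code, delete the source ...
#       release_file "$file"
#   fi
```

A marker left behind by a crashed run still has to be cleaned up by hand
or by a periodic check.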

Another solution would be to separate the script into two scripts. One
doing the copying, and another one checking if there is already a
copying script running for a particular remote machine. Have fun with ps
or pgrep.

> Problems:
> Large files (we're talking gigabytes) take more than 5 minutes to copy.
> Something is causing the file to be deleted before it has finished
> copying.


There is something else wrong. Try to find this "something" first. Most
likely it is the application writing the file, or there is something
wrong in your script. Unix is robust when it comes to the deletion of
files which are currently in use. A deletion during a copy should not
affect the copy.

> I was going to solve this with the usual .running type fix, but we
> really need the program to actually run every 5 minutes (more than one
> instance will be needed). If an instance is already copying the file,
> the file should just be ignored. The file should not be deleted if the
> copy hasn't finished.


You do check the exit code of the copy command, don't you?

> Does anyone know of any system stuff I should be looking at for java?


There is absolutely no need for Java. In fact, you will find that you
gain nothing by using Java, but that you will e.g. get problems in
setting the file owner and mode. You would have to invoke the Unix
commands from Java via exec(), or the system calls via JNI.

> Specifically, Unix Solaris interface so I can tell if a file is in use?


Java has no public platform interface, not even on Sun. You would have
to use exec() or JNI.

> Also, how can I make sure that the copy was finished before deleting the
> source?


Check the return code of the copy command.
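
A minimal sketch of what that could look like. The function name
copy_then_delete and the COPY_CMD variable are made up for illustration;
cp stands in for whatever copy command the real script uses (scp, for
example), and the sketch assumes that command returns a non-zero exit
code when the copy fails:

```shell
#!/bin/sh
# Hypothetical helper: copy $1 to $2 and remove the source only if the
# copy command reported success. COPY_CMD defaults to cp.
copy_then_delete() {
    src=$1
    dest=$2
    if ${COPY_CMD:-cp} "$src" "$dest" ; then
        rm "$src"
    else
        echo "copy of $src failed, keeping source" >&2
        return 1
    fi
}
```

Because the copy runs in the foreground, the rm is not reached until the
copy command has returned.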

May I suggest a good book for learning Unix scripting and a lot of other
Unix command-line tricks? "Unix Power Tools" by Peek, O'Reilly, and
Loukides.

> I expected the script to wait for the copy to finish before
> deleting, but it appears that it is not doing that. Should I use threads
> for this?


You already have concurrency problems, and you want to use threads to
move your concurrency problems to another level? I would not do this.

/Thomas

 
Harald Kirsch
      02-12-2004
kaeli wrote:
> Harald Kirsch enlightened us with...
>
> > > Program gets called again while an instance is running, so it tries to
> > > copy files that are currently being copied.

> >
> > A solution might be to rename the file locally *before*
> > copying. This way the process starting 5 minutes later will
> > not pick up the same file again. If this is not an option,
> > create an empty file with another extension than the
> > big file as a mark that the file is being worked on.
> >

>
> That won't help.
> The code copies any files in a directory on one machine to a directory
> on another. So it will still grab the file. The code would have to move
> the file, which is already the problem.


I am still not convinced that renaming would not work. Isn't there
a directory on the source machine which does not have to be copied?
You 'mv' (rename) the files to be copied to this directory and
then copy them to their destination from there in the background.

> We want a process that runs pretty much all the time. I'm thinking a
> program that looks in directories over and over. When it finds a file,
> it starts a thread that copies it then deletes it. As part of the
> thread, it can put the name of the file in a vector.


Don't forget to delete the file name from the vector, once it is
done. And a Set would actually be more appropriate than a Vector.
And if you go for Java 1.5, you'll find BlockingQueue which is
what you really want.

Harald.
 
kaeli
      02-12-2004
Harald Kirsch enlightened us with...
>
> I am still not convinced that renaming would not work. Isn't there
> a directory on the source machine which does not have to be copied.
> You 'mv' (rename) the files to be copied to this directory and
> then copy them to their destination from there in the background.
>


Same issue. What if, in the middle of the move to the other directory,
the cron calls the code again? It still sees the file in DIR_A, even
though it's currently being copied to DIR_B. It starts to move it, but
in the middle of that move, the first invocation finishes its move,
deleting the file from DIR_A. The first invocation may then copy to the
other machine, I suppose, but what happens when the second invocation
tries to move a file that no longer exists? Or even worse, overwrites
the destination on the new machine with an empty or half-empty file?

Currently, this problem is being handled with lockfiles. We don't like
that way if we can find another.

--
--
~kaeli~
The secret of the universe is @*&^^^ NO CARRIER
http://www.ipwebdesign.net/wildAtHeart
http://www.ipwebdesign.net/kaelisSpace

 
Harald Kirsch
      02-13-2004
kaeli wrote:
> Harald Kirsch enlightened us with...
> >
> > I am still not convinced that renaming would not work. Isn't there
> > a directory on the source machine which does not have to be copied.
> > You 'mv' (rename) the files to be copied to this directory and
> > then copy them to their destination from there in the background.
> >

>
> Same issue. What if in the middle of the move to the other directory,
> the cron calls the code again. It still sees the file in DIR_A, even


If the two directories are on the same file system, moving always
takes the same time, independent of file size. It may take a few
milliseconds and I cannot imagine a scenario where it takes
5 minutes, except if the whole machine (OS/hardware) is in
deep trouble anyway.
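
A rough sketch of the idea, assuming the incoming directory and the work
directory are on the same file system, so that mv is a plain rename. The
function name claim_into_workdir is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: claim a file by renaming it out of the scanned
# directory. The rename either happens completely or not at all, so a
# second invocation that tries the same file finds nothing to move and
# gets a non-zero exit code.
claim_into_workdir() {
    file=$1       # path of the file in the incoming directory
    workdir=$2    # directory on the same file system, never scanned
    mv "$file" "$workdir/" 2>/dev/null
}

# Intended use (not executed here):
#   if claim_into_workdir "$f" "$WORK" ; then
#       ... copy "$WORK/$(basename "$f")" to the other machine ...
#   fi
```

The invocation that loses the race simply skips the file, and only the
winner ever copies it, so the destination is not overwritten.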

Harald.
 