Velocity Reviews

GTN170777 06-27-2008 11:00 AM

Extracting data from a document
 
Hi Guys,

This isn't a problem with my code, but a feature I would like to add (ASP
VBScript). At the moment I have a form where a user uploads their details,
including a document (Doc, PDF, TXT, Docx). The document is uploaded to a
folder on the server, its address is stored in the database, and I'm
tracking the user id through sessions.

What I would like to do after the upload is redirect to a blank page, where
a script extracts the data from the document and inserts it into another
field in the database associated with the user id. I think this may be called
parsing, but I'm at a complete loss. I don't suppose you guys have any ideas
on this, do you?

I think this would probably make a really neat little extension also...

Look forward to your responses.

G

Mike Brind [MVP] 06-27-2008 01:32 PM

Re: Extracting data from a document

Given that Classic ASP is no longer being developed, you are unlikely to get
MS to consider any extensions to the framework. Also, how you obtain the
contents of the file will differ enormously depending on the file type. A
simple text file is easy: you just use the FileSystemObject to gain access to
the text. A PDF is totally different, and there are a number of third party
components available for messing around with PDFs. Microsoft haven't even
provided a native way to deal with PDFs in the .NET framework, which is the
technology they are now devoting all their development time to. You have to
dig around for third party stuff there too.

We use a number of third party components for text parsing, and some
conditional code to identify the filetype, and then choose the component
accordingly. However, they wouldn't be of any use to you as they are
employed in a Delphi forms app.
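
To make the "conditional code" idea a bit more concrete, here is a minimal
sketch of dispatching on the file extension. It is written in Python rather
than VBScript or Delphi purely so that it is self-contained, and it is not the
code we use. The names extract_text, extract_plain_text and EXTRACTORS are
made up for illustration, and only plain text is handled. A PDF or Word
handler would have to wrap whichever third party component you settle on.

```python
import os


def extract_plain_text(path):
    """Return the full contents of a plain-text file as a string."""
    # errors="replace" keeps the read from failing on odd bytes in uploads.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read()


# Hypothetical registry mapping a lower-case extension to its extractor.
# Only .txt is filled in; .pdf, .doc and .docx would each need their own
# handler built on a suitable component.
EXTRACTORS = {
    ".txt": extract_plain_text,
}


def extract_text(path):
    """Pick an extractor from the file extension and run it."""
    extension = os.path.splitext(path)[1].lower()
    extractor = EXTRACTORS.get(extension)
    if extractor is None:
        raise ValueError(f"No extractor registered for '{extension}' files")
    return extractor(path)
```

The page that runs after the upload would call extract_text with the stored
file path and write the result to the user's record, treating the ValueError
as "this file type is not supported yet".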

--
Mike Brind
Microsoft MVP - ASP/ASP.NET



Old Pedant 06-27-2008 09:19 PM

RE: Extracting data from a document
 
In addition to what Mike Brind said...

You *can* use ASP/VBScript to "script" MS Word and then you can use various
scripted commands within Word to locate specific text, etc.

To say that's a pain in the neck is a gross understatement. The docs for
doing this are poor, the inherent problems manifold. [Perhaps the easiest
way to do this would be to open a document with Word and then ask to do a
"Save as..." to a ".txt" file and then parse the resultant all-text file.]

You'd probably be better off with PDF, thanks to a third party component
named "AspPDF", but be forewarned that it's not cheap and that it, too, has a
fairly steep learning curve.

You are after one of the holy grails of database developers: The ability to
do "data mining" on non-database, non-text files. And each file type has to
be approached separately, using different tools, it seems. People make good
money producing tools to do this stuff, and generally they don't sell the
tools--they just sell the [expensive] service of doing the data mining for
you.
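
To give a feel for how format-specific this gets, here is a rough sketch
(Python, standard library only) for just one of the formats mentioned, .docx.
It assumes the file is a normal Office Open XML package, i.e. a zip archive
whose main body lives in word/document.xml, and it only collects the text runs
paragraph by paragraph. The function name extract_docx_text is hypothetical,
and none of this helps with the older binary .doc format or with PDF.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Headers, footers, footnotes, tabs and explicit line breaks are ignored,
    so treat the result as a rough text dump, not a faithful conversion.
    """
    with zipfile.ZipFile(path) as package:
        xml_bytes = package.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Each w:t element holds one run of text; join the runs of a paragraph.
        runs = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Even this "easy" case leaves you with an unstructured blob of text. Picking
out specific fields from it is the data mining part, and that is where the
real work starts.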

In short, if you are a newbie programmer, this probably isn't a project you
want to try tackling, yet.


Mike Brind [MVP] 06-28-2008 08:11 PM

Re: Extracting data from a document

The Delphi bods here use Word as a COM object and cause anything that isn't
a PDF to open in Word. That's ok on a desktop, where the user is able to
dismiss any dialogue or message boxes that might be instantiated, thus
allowing the app to close, but you can imagine what will happen if these
message boxes open on a web server (on Rack #364 in some unmanned room deep
in the bowels of some Data Centre God knows where...). That's one of the
primary reasons MS advise against automating Word in web applications.

--
Mike Brind
Microsoft MVP - ASP/ASP.NET



GTN170777 06-28-2008 10:07 PM

Re: Extracting data from a document
 
Thanks for your input guys, you've made me rethink the idea! I guess for
the project that we're working on it would be a nice add-on. I'm sure the
geniuses at MS will come up with something that makes the process a little
less hair-pulling in a couple of years or so, and that will be the time to
add it. Till then it's a nice add-on that can wait.

Thanks both...

GTN

Bob Barrows [MVP] 06-28-2008 10:20 PM

Re: Extracting data from a document
 
GTN170777 wrote:
> Thanks for our input guys, you've made me re think the idea!!!, I
> guess for the project that we're working on it would be a nice add
> on.... I'm sure the geniuses at MS will come up with something that
> makes the process a little less hair pulling in a couple of years or
> so,

Don't count on it. They've had 30+ yrs now ...
--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"



