Extracting data from a document
Hi Guys,
Not a problem with my code, but something I would like to add (ASP VBScript). At the moment I have a form where a user uploads their details, including a document (DOC, PDF, TXT, DOCX). The document is uploaded to a folder on the server, with the address stored in the database, and I'm tracking the user ID through sessions.

What I would like to do after the upload is redirect to a blank page, where some script extracts the data from the document and inserts it into another field in the database associated with the user ID. I think this may be called parsing, but I'm at a complete loss. I don't suppose you guys have any ideas on this, do you?

I think this would probably make a really neat little extension also...

Look forward to your responses.

G
Re: Extracting data from a document
"GTN170777" <GTN170777@discussions.microsoft.com> wrote in message news:4D3C2998-DBCB-4406-8803-93DC1905F2C1@microsoft.com...
> I think this would probably make a really neat little extension also...

Given that Classic ASP is no longer being developed, you are unlikely to get MS to consider any extensions to the framework.

Also, how you obtain the contents of the file will differ enormously by file type. A simple text file is easy: you just use the FileSystemObject to gain access to the text. A PDF is totally different, and there are a number of third party components available for messing around with PDFs. Microsoft haven't even provided a native way to deal with PDFs in the .NET framework, which is the technology they are now devoting all their development time to. You have to dig around for third party stuff there too.

We use a number of third party components for text parsing, and some conditional code to identify the filetype and then choose the component accordingly. However, they wouldn't be of any use to you as they are employed in a Delphi forms app.

--
Mike Brind
Microsoft MVP - ASP/ASP.NET
RE: Extracting data from a document
In addition to what Mike Brind said...
You *can* use ASP/VBScript to "script" MS Word and then you can use various scripted commands within Word to locate specific text, etc.

To say that's a pain in the neck is a gross understatement. The docs for doing this are poor, the inherent problems manifold. [Perhaps the easiest way to do this would be to open a document with Word and then ask to do a "Save as..." to a ".txt" file and then parse the resultant all-text file.]

You'd probably be better off with PDF, thanks to a third party component named "AspPDF", but be forewarned that it's not cheap and it also has a fairly steep learning curve.

You are after one of the holy grails of database developers: the ability to do "data mining" on non-database, non-text files. And each file type has to be approached separately, using different tools, it seems.

People make good money producing tools to do this stuff, and generally they don't sell the tools--they just sell the [expensive] service of doing the data mining for you.

In short, if you are a newbie programmer, this probably isn't a project you want to try tackling yet.
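A rough sketch of the "Save as..." route mentioned above, written as a standalone .vbs script to run on a desktop machine with Word installed, not inside an ASP page. The two paths are hypothetical placeholders, and DisplayAlerts does not suppress every dialog Word can raise, so treat this as an illustration only.

```vbscript
' Hypothetical paths, for illustration only.
Const SRC_PATH = "C:\uploads\example.doc"
Const TXT_PATH = "C:\uploads\example.txt"

Const wdFormatText = 2      ' Word's plain-text save format
Const wdAlertsNone = 0      ' ask Word to suppress its prompts where it can

Dim word, doc
Set word = CreateObject("Word.Application")
word.Visible = False
word.DisplayAlerts = wdAlertsNone

' Open(FileName, ConfirmConversions, ReadOnly)
Set doc = word.Documents.Open(SRC_PATH, False, True)

' Write a plain-text copy that can then be parsed like any .txt file.
doc.SaveAs TXT_PATH, wdFormatText

doc.Close False             ' close without saving changes to the source
word.Quit

Set doc = Nothing
Set word = Nothing
```

The resulting .txt file can then be read with the FileSystemObject like any other text file.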
Re: Extracting data from a document
"Old Pedant" <OldPedant@discussions.microsoft.com> wrote in message news:709B4388-F7F8-4AD3-964E-D584B23F2E04@microsoft.com...
> To say that's a pain in the neck is a gross understatement. The docs for
> doing this are poor, the inherent problems manifold. [Perhaps the easiest
> way to do this would be to open a document with Word and then ask to do a
> "Save as..." to a ".txt" file and then parse the resultant all-text file.]

The Delphi bods here use Word as a COM object and cause anything that isn't a PDF to open in Word. That's OK on a desktop, where the user is able to dismiss any dialogue or message boxes that might pop up, thus allowing the app to close, but you can imagine what will happen if these message boxes open on a web server (on Rack #364 in some unmanned room deep in the bowels of some Data Centre God knows where...). That's one of the primary reasons MS advise against automating Word in web applications.

--
Mike Brind
Microsoft MVP - ASP/ASP.NET
Re: Extracting data from a document
Thanks for your input guys, you've made me rethink the idea! I guess for the project that we're working on it would be a nice add-on... I'm sure the geniuses at MS will come up with something that makes the process a little less hair-pulling in a couple of years or so, and that will be the time to add it. Till then it's a nice add-on that can wait.

Thanks both...

GTN
Re: Extracting data from a document
GTN170777 wrote:
> I'm sure the geniuses at MS will come up with something that
> makes the process a little less hair pulling in a couple of years or
> so,

Don't count on it. They've had 30+ yrs now ...

--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I don't check it very often. If you must reply off-line, then remove the "NO SPAM"