Velocity Reviews - Computer Hardware Reviews

Decoding html pages

 
 
Spamless
 
      10-24-2008
On 2008-10-24, Thomas 'PointedEars' Lahn <> wrote:
> Spamless wrote:
>> Thomas 'PointedEars' Lahn wrote:
>>> Spamless wrote:
>>>> On 2008-10-24, Santander <> wrote:
>>>>> just one method for javascript:
>>>>> http://code.google.com/p/turbojs/wiki/ClosedSourceJS
>>>>>
>>>>> (I do not fully understand how it works, and it requires a few dummy js
>>>>> files for a few javascripts)
>>>> That isn't a javascript trick, but php blocking of access to the file.
>>>> It is a server side trick to prevent one from simply accessing the
>>>> *.js file - it has to be loaded by a "proper" page. [...]
>>> Nonsense. You are making the false assumption that accessing the generated
>>> source code requires another request. It doesn't. One does not even need
>>> Firebug to see it, although it helps.

>>
>> That is why the dummy.js file is at the end to unset the session state
>> allowing one to get the "protected Javascript" (it would be pointless to
>> block it for everything for then it could not be gotten the first time) when
>> the page is started then unsetting the server side session variable. [...]

>
> The state of the server-side session does not matter at all when (*not* if)
> no further request is necessary to get at the code.


Send a request for index.html and you do NOT get the (client side) included
Javascript modules. Another GET request is required. If, somehow, when your
browser sends a
GET /index.php HTTP/1.1
Host: somesite.com
back comes the index.php *and* the (client side) included Javascript file,
then you have a magic browser.

On the other hand, with HTTP/1.1 and "keep-alive" one does not need a new
TCP connection, that is true; on the other hand, a GET request for another
file will still cause another call to the PHP engine to parse that file.

On the fourth hand, a site could be set up so that the index.php file uses
a server-side PHP inclusion of the code, so that the PHP engine puts the
actual Javascript code into the index.php page rather than using a
client-side inclusion
[script type="text/javascript" src="js/closedsource.js"][/script]
as is done here (requiring a new GET request).

The page does not include the code on the server side (it could have
used PHP code to write the contents of some Javascript file into the
page itself before sending out the HTML page, with the Javascript in
it, rather than writing out a [script src=...] tag).

It doesn't. It sends HTML code to have the browser load the *.js file
separately. That requires another GET request.

Look ... I didn't write it. Send a note to Google telling them that their
programmers are incompetent. Go to
http://code.google.com/p/turbojs/wiki/ClosedSourceJS
and use the link at the bottom to add a comment on how wrong their code is.
 
 
 
 
 
Spamless
 
      10-24-2008
On 2008-10-24, Spamless <> wrote:
> On 2008-10-24, Santander <> wrote:
>> just one method for javascript:
>> http://code.google.com/p/turbojs/wiki/ClosedSourceJS
>>
>> (I do not fully understand how it works, and it requires a few dummy js
>> files for a few javascripts)

>
> That isn't a javascript trick, but php blocking of access to the file.
> It is a server side trick to prevent one from simply accessing the
> *.js file - it has to be loaded by a "proper" page.


Let me write up something in a bit more detail (as this is not Javascript
but PHP) to explain what the Google example shows.

It is *not* Javascript one can use on one's pages. It is a PHP method one
can use on a server with support for embedded PHP code.

To get around the stateless nature of HTTP, one can associate a web session
with server-side state saved in some variable, say an array such as
$_SESSION. To recognize that a visitor is getting a new page in the same web
session, one may set a cookie, say PHPSESSID, to a random value with a short
TTL (or as a session cookie), resetting and updating the TTL as other pages
are loaded. The session variable, an associative array (hash) or object, may
have readable and writable values associated with keys or properties.
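As a rough illustration of the paragraph above, here is a minimal in-memory sketch in Javascript (run under Node.js) of a session store keyed by a random cookie value. It only mimics the idea behind PHP's PHPSESSID cookie and $_SESSION array; the names `sessions` and `startSession` are hypothetical, and expiry (the TTL) is left out.

```javascript
// Minimal in-memory sketch of cookie-keyed server-side sessions.
// This mimics the idea behind PHPSESSID/$_SESSION; it is not PHP's code.
const crypto = require("crypto");

// Hypothetical session store: session ID -> object holding that session's data.
const sessions = new Map();

// Return [sessionId, data]. A missing or unknown ID (first visit, expired
// or forged cookie) gets a fresh random ID and an empty data object.
function startSession(cookieId) {
  if (cookieId && sessions.has(cookieId)) {
    return [cookieId, sessions.get(cookieId)];
  }
  // 16 random bytes, hex-encoded: a 32-character cookie value.
  const id = crypto.randomBytes(16).toString("hex");
  const data = {};
  sessions.set(id, data);
  return [id, data];
}

// Usage: the first request carries no cookie; the next one sends it back.
const [id, data] = startSession(undefined);
data.visits = 1;
const [sameId, sameData] = startSession(id);
sameData.visits += 1;
console.log(sameId === id, sessions.get(id).visits); // true 2
```

The second call finds the same data object, which is all "state" means here: the server looks up what it stored under the ID the browser sends back.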

Using PHP, one can set the web server NOT to send off the file index.html
when a request for index.html arrives, but instead send the index.html page
to the PHP engine along with the session data/variable/object/array and let
that programme return data to the web server which passes it along to the
visitor's browser as the content returned for the request for "index.html."
In that case, the index.html file need not be pure HTML but has to be
something the PHP engine understands and can use to create valid HTML to
pass along to the web server to pass along to the visitor.

For example, the PHP engine might check for a variable named image_count in
the session data. If not there, set it to zero and add it to the session
data. Next, return as the first part of the HTML code it generates an
[img src=...] tag to display the first banner if image_count is 0, the
second banner if image_count is 1, ... the fifth banner if image_count is 4.
Have it check image_count and if it is zero write out the HTML content for
the first advertisement, if it is one write out the HTML content for the
second ad, etc. and finally increase image_count by 1 mod 5.
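The banner rotation just described can be sketched as a function of the session data (Javascript under Node.js here rather than PHP; the banner file names and the `renderPage` function are made up for illustration):

```javascript
// Sketch of the banner-rotation example: identical requests for the same URL
// produce different HTML, depending on a counter kept in the session data.
const BANNERS = 5;

function renderPage(session) {
  // First page in this session: add the counter, starting at zero.
  if (session.image_count === undefined) {
    session.image_count = 0;
  }
  const n = session.image_count;
  // Hypothetical banner/ad content for slot n (0..4).
  const html =
    `<img src="banner${n + 1}.png">\n` +
    `<p>Advertisement ${n + 1}</p>`;
  // Advance to the next banner, wrapping around after the fifth (mod 5).
  session.image_count = (n + 1) % BANNERS;
  return html;
}

// Six "reloads" in one session: banners 1 to 5, then banner 1 again.
const session = {};
for (let i = 0; i < 6; i++) {
  console.log(renderPage(session).split("\n")[0]);
}
```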

If you visit the site you see the first banner and ad. Reload the same page
and see the second. Reload the same page and see the third, etc. You get
different results sending the exact same request data to the same server for
the same page/URL (but never see the original, unchanging "raw" file).

You never see the raw index.html *file* with its embedded PHP code but only
the HTML code that the PHP engine produces *from* the raw html code and that
returned HTML code depends on the current state/session data. This is all
done server side and the visitor does not see the server-side state data
which determines which page he/she gets.

Since the exchange is no longer stateless, one can use the state data
in the session variables/session array/session object to change responses or
access depending upon the state. Using PHP and tracking the session/state to
allow or block access to a (in this case Javascript) file is what the example
at Google shows.

The page at http://code.google.com/p/turbojs/wiki/ClosedSourceJS shows a way
to block access to a Javascript file depending on the state and how to set
and unset state on a page.

To block access to a Javascript file except while a page using it is
loading, one can set a session variable at the top of the page to allow
loading the Javascript file and have it reset after the page has loaded. One
cannot simply reset it with PHP code at the end of the (PHP-parsed) raw HTML
page itself: that code runs when the PHP engine first parses the page, before
the page is even sent out and so before the browser has had a chance to
request the Javascript. Instead, the page has to include something that is
requested after the Javascript has been loaded, and the PHP engine resets the
accessibility variable when that item is requested.
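The ordering problem can be made concrete with a small simulation. The sketch below (Javascript under Node.js) models each HTTP request as a function call on one shared session object, in the order the server receives them. The function names are hypothetical stand-ins for the PHP-parsed files, and it assumes the browser requests the scripts in document order.

```javascript
// Simulation of why the access flag cannot be cleared at the end of the page's
// own server-side code. Each function stands in for one PHP-parsed file and
// each call for one HTTP request, in the order the server receives them.

// Naive page: sets and clears the flag in a single pass, so the flag is
// already gone before the browser has even received the HTML.
function servePageNaive(session) {
  session.js_turbo01 = "show";
  const html = '<script src="closedsource.js"></script>';
  delete session.js_turbo01;
  return html;
}

// Page with a trailing dummy.js: the flag stays set until dummy.js is requested.
function servePageWithDummy(session) {
  session.js_turbo01 = "show";
  return '<script src="closedsource.js"></script>\n' +
         '<script src="dummy.js"></script>';
}

// closedsource.js: answers according to the flag at request time.
function serveClosedSource(session) {
  return session.js_turbo01 === "show" ? "/* real code */" : "/* denied */";
}

// dummy.js: requested after closedsource.js, it clears the flag.
function serveDummy(session) {
  delete session.js_turbo01;
  return "";
}

const naive = {};
servePageNaive(naive);
console.log(serveClosedSource(naive)); // /* denied */ (the page's own request fails)

const withDummy = {};
servePageWithDummy(withDummy);
console.log(serveClosedSource(withDummy)); // /* real code */
serveDummy(withDummy);
console.log(serveClosedSource(withDummy)); // /* denied */ (later direct requests fail)
```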

These are PHP session variables, not Javascript. They enable stateful
data to be used for a web session. They can be, and are, used for lots of things.

The code at http://code.google.com/p/turbojs/wiki/ClosedSourceJS uses a PHP
session variable (the key or property 'js_turbo01' of the $_SESSION
associative array/object) as the Javascript accessibility variable. The first
thing the sample shows is the directive telling the web server *not* to simply
send a visitor's browser a *.js file, but instead to pass it to the PHP engine
for parsing and to return the results produced by the PHP engine:
AddType application/x-httpd-php .js

At the start of the "proper page" which uses the Javascript, the
accessibility variable is set ("show" is the value used to indicate it is
set to allow access) and at the end of the page an item to unset the
accessibility variable (when it is accessed and parsed by the PHP engine) is
added (the "dummy.js" file in the sample at
http://code.google.com/p/turbojs/wiki/ClosedSourceJS).

When one tries to access the "protected" code, closedsource.js, in the
example, the raw original file is not returned but it is passed along to the
PHP engine which checks the state (is access allowed?) and if so it returns
the real code and if not it may return something else (in the sample shown,
it just returns text indicating an error but the PROTIP at the bottom
suggests returning code different from the "real" code when accessed without
the accessibility variable properly set).
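The PROTIP variant can be sketched as follows (Javascript under Node.js, not the PHP from the wiki page): a request without the flag gets syntactically valid code instead of an error message. `REAL_CODE`, `DECOY_CODE` and `handleClosedSourceJs` are made-up placeholders; only the flag name 'js_turbo01' and its value "show" follow the sample.

```javascript
// Sketch of the PROTIP variant of the closedsource.js handler: without the
// session flag the visitor still receives valid Javascript, just not the real code.
// Both code strings are hypothetical placeholders.
const REAL_CODE = 'function work() { return "real"; }';
const DECOY_CODE = 'function work() { return "harmless"; }';

function handleClosedSourceJs(session) {
  const allowed = session.js_turbo01 === "show";
  return {
    // Same content type either way, so the response itself gives nothing away.
    contentType: "text/javascript",
    body: allowed ? REAL_CODE : DECOY_CODE,
  };
}

// A direct request (no flag in the session) gets the decoy, which still runs.
const direct = handleClosedSourceJs({});
console.log(new Function(direct.body + "\nreturn work();")()); // harmless
```

Because both answers are valid Javascript served with the same content type, someone fetching the URL directly has no obvious sign that anything was withheld.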
 
 
 
 
 
Thomas 'PointedEars' Lahn
 
      10-24-2008
Spamless wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Spamless wrote:
>>> Thomas 'PointedEars' Lahn wrote:
>>>> Spamless wrote:
>>>>> On 2008-10-24, Santander <> wrote:
>>>>>> just one method for javascript:
>>>>>> http://code.google.com/p/turbojs/wiki/ClosedSourceJS
>>>>>>
>>>>>> (I do not fully understand how it works, and it requires a few dummy js
>>>>>> files for a few javascripts)
>>>>> That isn't a javascript trick, but php blocking of access to the file.
>>>>> It is a server side trick to prevent one from simply accessing the
>>>>> *.js file - it has to be loaded by a "proper" page. [...]
>>>> Nonsense. You are making the false assumption that accessing the generated
>>>> source code requires another request. It doesn't. One does not even need
>>>> Firebug to see it, although it helps.
>>> That is why the dummy.js file is at the end to unset the session state
>>> allowing one to get the "protected Javascript" (it would be pointless to
>>> block it for everything for then it could not be gotten the first time) when
>>> the page is started then unsetting the server side session variable. [...]

>> The state of the server-side session does not matter at all when (*not* if)
>> no further request is necessary to get at the code.

>
> Send a request for index.html and you do NOT get the (client side) included
> Javascript modules.


What part of "no *further* request" did you not get?

A script-capable Web browser will have to request the "modules" and download
them in order to compile and execute them, and so the source code in
question can be retrieved at least from the browser's cache.

> [...]
> Look ... I didn't write it.


But you are promoting it here, cluelessly.

> Send a note to google telling them that their programmers are incompetent.


That much would largely appear to be self-evident, but what you also did not
get is that this code was _not_ written by Google programmers. Google Code
is a public software repository. Incidentally, you can find Firebug, which
among other things allows you to get the source code without further request
in Firefox, there, too:

<http://code.google.com/p/fbug/>


PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>
 
 
Spamless
 
      10-25-2008
On 2008-10-24, Thomas 'PointedEars' Lahn <> wrote:
>
> A script-capable Web browser will have to request the "modules" and download
                                            ^^^^^^^
Oh, it will require a second request?

> them in order to compile and execute them, and so the source code in
> question can be retrieved at least from the browser's cache.


In the browser's cache? That sounds familiar. I believe that was one of the
ways I suggested to recover the Javascript after loading the page (along
with a recursive wget that fetches the "proper page" so that the session
value is set when it fetches the closedsource.js file, though that may fail
if wget is set not to accept cookies; the same goes for a browser). The
method does not prevent one from getting the *.js file. It only requires that
the file be fetched while the session data is set, i.e. right after fetching
the "proper" starting page, and not after the value has been reset once that
page has loaded. That's all it does.

It can be effective against someone who knows that mycode.js is on the server
and tries to get it directly (without getting the "proper page"): as the
PROTIP at the bottom of the page suggests, one then gets valid Javascript
that is totally different from the real code. If one knows that this trick is
being used and knows the URL of the starting page, it is not difficult to get
the Javascript. If one does not know that requesting the *.js file directly
returns different data (e.g. when the real script is a malware exploit that
loads a binary) and just fetches the exploit script ... oh ... an innocent
site! The reports of malware installations must be wrong!
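The work-around described above (fetch the "proper page" first, keep its session cookie, then fetch the script in the same session) is what a recursive wget with cookies does. The sketch below expresses the same idea in Javascript for Node.js 18 or later, where `fetch` is built in (a browser would refuse to set a Cookie header by hand). `fetchGuardedScript` and the URLs it is given are hypothetical, and it assumes the server sets a single session cookie.

```javascript
// Sketch: fetch the "proper page" first, then the guarded script with the
// session cookie the page set. fetchFn is any fetch-compatible function,
// e.g. the global fetch of Node.js 18+.
async function fetchGuardedScript(fetchFn, pageUrl, scriptUrl) {
  const page = await fetchFn(pageUrl);
  // Keep only "name=value" from a header such as "PHPSESSID=abc; path=/".
  // Assumes exactly one Set-Cookie header.
  const setCookie = page.headers.get("set-cookie") || "";
  const cookie = setCookie.split(";")[0];
  // Read the page body; a fuller version could parse it for script URLs.
  await page.text();
  const script = await fetchFn(scriptUrl, {
    headers: cookie ? { Cookie: cookie } : {},
  });
  return script.text();
}

// Usage (hypothetical URLs):
// fetchGuardedScript(fetch, "http://example.com/index.php",
//                    "http://example.com/js/closedsource.js").then(console.log);
```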
 
 
Spamless
 
      10-25-2008
On 2008-10-24, Thomas 'PointedEars' Lahn <> wrote:

> Incidentally, you can find Firebug, which among other things allows you to
> get the source code without further request in Firefox, there, too:
>
><http://code.google.com/p/fbug/>


How well does it work? Have you seen (I have) javascript which deletes
itself from the page (or other Javascript)? It runs some code then looks
for script elements and ... uses the DOM to remove them. I've seen that a
time or two in exploit scripts (whose authors really don't want you to see
what is going on). Will firebug show the code which was on the page?
 
 
Spamless
 
      10-25-2008
On 2008-10-25, Conrad Lender <> wrote:
> On 2008-10-25 02:37, Spamless wrote:
>> How well does it work? Have you seen (I have) javascript which
>> deletes itself from the page (or other Javascript)? It runs some code
>> then looks for script elements and ... uses the DOM to remove them.
>> I've seen that a time or two in exploit scripts (whose authors really
>> don't want you to see what is going on). Will firebug show the code
>> which was on the page?

>
> Yes.
> http://groups.google.com/group/comp....5f059ad2b00e20


I think not.

It leaves a reference but does not show the code.

With an HTML page,

<html><head></head><body onload="goaway()">
<script src=go.js></script>
All gone!
</body></html>

and a javascript file, go.js

function goaway() {
alert("Be GONE!");
}


loading the page and then running Firebug to examine it shows the
source and even the code from the loaded Javascript file
(there is a "+" sign next to the
<body onload="goaway()">
section which can be used to expand it and find the
<script src=go.js></script> inclusion, which can
be expanded to show the source creating the alert box).


Changing go.js to

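function goaway() {
  alert("Be GONE!");
  // childNodes[0] is the line-break text node after <body>, so
  // childNodes[1] is the <script src=go.js> element itself
  var togo = document.body.childNodes[1];
  document.body.removeChild(togo);
}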


and reloading the page removes
<script src=go.js></script>
from the DOM (as one can see by using the DOM tool,
TOOLS|DOM_INSPECTOR).

That is now gone (the + sign next to the <body> tag
is gone - no expansion to find the source file or
the code creating the alert box).

Actually, opening Firebug FIRST and then loading the page
apparently shows the code while the alert box is on
screen, before the script element is removed from
the page: the plus sign next to the body tag is there
for me to expand to show the code, but the alert box is
modal and I can't expand anything until I close it.
As soon as I close the alert box, the plus sign
indicating that I can get to the Javascript code
disappears and the code is no longer available.
Perhaps you get a different result and Firebug
does show you the Javascript for the alert box for
the second version of go.js.
 
 
Gregor Kofler
 
      10-25-2008
Spamless wrote:

[lengthy explanation snipped]

> You never see the raw index.html *file* with its embedded PHP code but only
> the HTML code that the PHP engine produces *from* the raw html code and that
> returned HTML code depends on the current state/session data. This is all
> done server side and the visitor does not see the server-side state data
> which determines which page he/she gets.


So what? That's the case with practically any PHP "page" (or any page
generated by a server-side script, for that matter).

[snip]

It's all very simple and "standard": one can prevent direct access to
resources on the server (dynamically generated images - think CAPTCHA -,
PDFs in "hidden" directories, etc.). However, once something is delivered to
the client, it's there. Fully inspectable. So what's this whole discussion
about?

Gregor
 
 
Gregor Kofler
 
      10-25-2008
Spamless wrote:

> With an HTML page,
>
> <html><head></head><body onload="goaway()">
> <script src=go.js></script>
> All gone!
> </body></html>


> function goaway() {
> alert("Be GONE!");
> togo=document.body.childNodes[1];
> document.body.removeChild(togo);
> }
>
> Actually, opening firebug FIRST and loading the page
> apparently shows the code (while the alert box is on
> screen and before the javascript section is removed from
> the page) (the plus sign next to the body tag is there
> for me to expand to show the code - if I could, but the
> alert box is modal and I can't expand it to see the code
> until I close the alert box) but as soon as I close the
> alert box, the plus sign indicating that I can get to
> the javascript code disappears and the code is not
> available. Perhaps you get a different result and firebug
> does show you the javascript for the alert box for
> the second version of go.js.


Have a breakpoint at the first line of goaway() and reload the page. Doh!

Gregor
 
 
Spamless
 
      10-25-2008
On 2008-10-25, Gregor Kofler <> wrote:
> Spamless wrote:
>
> [lengthy explanation snipped]
>
>> You never see the raw index.html *file* with its embedded PHP code but only
>> the HTML code that the PHP engine produces *from* the raw html code and that
>> returned HTML code depends on the current state/session data. This is all
>> done server side and the visitor does not see the server-side state data
>> which determines which page he/she gets.

>
> So what? That's the case with practically any PHP "page" (or any by a
> server-side script generated page for that matter).


True, but this is a Javascript group and at least the person who saw the
original file knew some Javascript but apparently did not recognize how the
embedded PHP code works. It was intended to be an elementary explanation.

The closedsource code presented at
http://code.google.com/p/turbojs/wiki/ClosedSourceJS
was simply to prevent one from getting the *.js file except when the "proper
page" is loaded, to keep someone from just harvesting *.js files (and, if
they try, to be able to give them a bogus script that they may not realize
is not the real code).

Of course the script does load when you load the proper page (else it would
be pointless), and you do have it somewhere, for example in your browser's
cache. The "[script src ...]" element might have been removed from the page
using the DOM, so it does not appear in Firebug and you don't have the file
name - but in Firefox, View|Page_Source still shows that inclusion and the
file name to search the cache for.

Tell someone that the interesting script is at
http://someplace.com/interesting.js
and, if they attempt to get it without knowing the page URL that must be used
to actually get the code, they may find a totally different script.

If you know that you have to load a particular HTML page (or load a
particular image, or have seen a particular ad, or ...) to get a particular
script, you can do so and get the script. It does put limits (of which the
remote visitor is unaware) on how and when a particular script can be
accessed.
 
 
Spamless
 
      10-25-2008
On 2008-10-25, Gregor Kofler <> wrote:
> Spamless wrote:
>
>> With an HTML page,
>>
>> <html><head></head><body onload="goaway()">
>> <script src=go.js></script>
>> All gone!
>> </body></html>

>
>> function goaway() {
>> alert("Be GONE!");
>> togo=document.body.childNodes[1];
>> document.body.removeChild(togo);
>> }
>>
>> Actually, opening firebug FIRST and loading the page
>> apparently shows the code (while the alert box is on
>> screen and before the javascript section is removed from
>> the page) (the plus sign next to the body tag is there
>> for me to expand to show the code - if I could, but the
>> alert box is modal and I can't expand it to see the code
>> until I close the alert box) but as soon as I close the
>> alert box, the plus sign indicating that I can get to
>> the javascript code disappears and the code is not
>> available. Perhaps you get a different result and firebug
>> does show you the javascript for the alert box for
>> the second version of go.js.

>
> Have a breakpoint at the first line of goaway() and reload the page. Doh!


Ahem ... if this is on a remote server where you cannot modify the script,
then it seems to vanish. Of course, one could find the script name, not in
Firebug, but using View|Page_Source, and get the script go.js directly,
unless something like the code at
http://code.google.com/p/turbojs/wiki/ClosedSourceJS
is used to force you to get the including page in order to get the actual
script ... or check your browser's cache for the script and modify it
to pause or stop before removing itself. If you have the script and want to
examine it, however, you could just load it into Notepad instead of modifying
it and loading the page that includes it so that it can be seen in Firebug.
Of course, if the script is obfuscated and you don't notice that, when
loaded, it checks the current URL and, if the protocol is "file:" (used for
local pages), redirects to about:blank (that has been done), you may wonder
why the local copy does not work. You could, though, set up a local web
server and access it at 127.0.0.1, so that the page is loaded through an
http: URL via the local web server, or ...
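The "no local copies" trick mentioned above might look roughly like the following; it is a hypothetical reconstruction, not code taken from any actual exploit. Taking the location object as a parameter (in a page it would be `window.location`) lets the sketch run outside a browser.

```javascript
// Sketch of a script that refuses to run from a saved local copy.
// In a page it would be called as bailOutIfLocal(window.location).
function bailOutIfLocal(loc) {
  if (loc.protocol === "file:") {
    // Navigate away from the local copy without adding a history entry.
    loc.replace("about:blank");
    return true;
  }
  return false; // not a local file, carry on
}

// Usage in a page:
// if (!bailOutIfLocal(window.location)) { /* rest of the script */ }
```

This is also why serving the saved copy from a local web server at 127.0.0.1 gets around it: `location.protocol` is then "http:", not "file:".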

Nothing can prevent your access to code that your browser gets (after all, it
gets it - I usually use tcpdump, a packet capture tool, and tcpflow to extract
the data from the TCP streams rather than check my cache later [I use Linux],
but a Windows port, WinDump I think, is available - open source/free), but
various tricks can be used to limit it or misdirect you if you are unaware of
them.

They are tricks and if you know their secrets, they can be gotten around.
Blocking access except when loading a certain page, removing references and
code from a page, blocking the use of local copies - they are all just
tricks and the original question (well, after the decoding of the original
encoded page that started it all - the question as to what the
ClosedSourceJS material at Google did and how it worked) just pointed to
another trick that can be used (and of which one should be aware if one is
getting script which doesn't appear to be what it should be).
 