# New MS RAM diagnostic tool

Discussion in 'NZ Computing' started by Greg Townsend, Jul 31, 2003.

1. ### Ben Perston (Guest)

Jay wrote:
> Ben Perston wrote:
>
>
>>Jay wrote:
>>
>>>Ben Perston wrote:

>>
>>>>What do you get if you calculate the probability of this diagnostic
>>>>(taking 50k of memory) failing to spot an error on say a system with 256
>>>>MB? I.e. that the dodgy bit will be used by the programme and that it
>>>>will cause this specific scenario?
>>>
>>>
>>>A probability greater than zero.

>>
>>Any idea of its magnitude?
>>
>>Other events with probabilities greater than zero include the failure of
>>an otherwise perfect memory tester due to cosmic ray strike, after all...

>
>
> It depends on the RAM failure rate.
> What is the RAM failure rate?

Well, what makes RAM fail? I don't know very much about how RAM works.
Does a faulty RAM module usually have some bits with a much higher
probability of failing than the rest, or just an overall probability,
high enough to be observed, of any event going wrong?
If the former, I think your problem must be irrelevant...

> If it is 'r' bits per second then the probability for a 100k program in 32MB
> memory running for t seconds would be r * t * 0.1/32 wouldn't it?

Well, 0.098... /32, but you're close enough. You have also missed, as
another factor, the proportion of errors that don't cause the programme
to fail in the manner Nicholas described. Any idea what that would be?
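The back-of-envelope formula being argued over can be sketched numerically. This is a minimal illustration only: the error rate `r` and runtime `t` below are made-up placeholder values, not measurements from the thread.

```python
# Probability that a memory error lands inside a 100 kB program sitting
# in 32 MB of RAM during a run of t seconds, per the thread's estimate:
#   p ~ r * t * (program size / total RAM)
# r and t are illustrative assumptions, not real failure-rate data.

r = 1e-9          # assumed RAM error rate in errors per second (illustrative)
t = 3600.0        # assumed run time: one hour
program_kb = 100.0
ram_mb = 32.0

# 100 kB / 32 MB = (100/1024) / 32, i.e. the "0.098.../32" correction above
fraction = (program_kb / 1024.0) / ram_mb
p = r * t * fraction

print(f"fraction of RAM occupied: {fraction:.6f}")
print(f"probability estimate:     {p:.3e}")
```

With these placeholder numbers the fraction comes out at roughly 0.003, which is the point of the correction: the 100k-in-32MB term alone already makes the event quite unlikely per run.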

> Can't you figure that out?
> Or is multiplication a bit too difficult for you?

I was just wondering if you could produce any estimates based on a
decent understanding of how RAM works as to whether the problem you're
talking about is genuine or something that would not be observed in any
practical experiment.

Ben Perston, Aug 6, 2003

2. ### Jay (Guest)

Ben Perston wrote:

> Jay wrote:
>> Ben Perston wrote:
>>
>>
>>>Jay wrote:
>>>
>>>>Ben Perston wrote:
>>>
>>>>>What do you get if you calculate the probability of this diagnostic
>>>>>(taking 50k of memory) failing to spot an error on say a system with
>>>>>256
>>>>>MB? I.e. that the dodgy bit will be used by the programme and that it
>>>>>will cause this specific scenario?
>>>>
>>>>
>>>>A probability greater than zero.
>>>
>>>Any idea of its magnitude?
>>>
>>>Other events with probabilities greater than zero include the failure of
>>>an otherwise perfect memory tester due to cosmic ray strike, after all...

>>
>>
>> It depends on the RAM failure rate.
>> What is the RAM failure rate?

>
> Well, what makes RAM fail? I don't know very much about how RAM works.
> Does a faulty RAM module usually have some bits with a much higher
> probability of failing than the rest, or just an overall probability,
> high enough to be observed, of any event going wrong?
> If the former, I think your problem must be irrelevant...

Single-bit errors are the most common.

>
>> If it is 'r' bits per second then the probability for a 100k program in
>> 32MB memory running for t seconds would be r * t * 0.1/32 wouldn't it?

>
> Well, 0.098... /32, but you're close enough. You have also missed, as
> another factor, the proportion of errors that don't cause the programme
> to fail in the manner Nicholas described. Any idea what that would be?

Simple branch instructions are among the most common instructions.
I would estimate they amount to about 7% of all instructions
in a typical program. They might be short (2 bytes) or longer.
So add another factor of, say, 0.07.
Are they relatively innocuous? Well, the program usually expects
the branch to be taken or not taken (that is why it is a conditional
branch). So a "jc" changing into a "jnz", or even a "jo", might
be relatively harmless.

BTW, my factor of 0.1/32 isn't quite right, because for each opcode
there are a number of 1-bit-different opcodes that are similar.
Most other flips will probably result in a bad-opcode fault, because either
the mutated opcode has a different length (so the following opcode is garbage)
or the opcode itself is garbage.
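The point about 1-bit-different opcodes can be made concrete by enumerating the eight single-bit flips of the x86 short `jc` opcode (0x72). The table below only covers the 0x70-0x7F short conditional-jump range and is a simplified sketch, not a full opcode map:

```python
# Short Jcc opcodes occupy 0x70-0x7F on x86; any flip of one of the
# low four bits of 0x72 (jc) yields another conditional jump, while
# flips of the high four bits leave that range entirely.
COND_JUMPS = {  # x86 short conditional jumps, 0x70-0x7F
    0x70: "jo", 0x71: "jno", 0x72: "jc",  0x73: "jnc",
    0x74: "jz", 0x75: "jnz", 0x76: "jbe", 0x77: "ja",
    0x78: "js", 0x79: "jns", 0x7A: "jp",  0x7B: "jnp",
    0x7C: "jl", 0x7D: "jge", 0x7E: "jle", 0x7F: "jg",
}

jc = 0x72
for bit in range(8):
    mutated = jc ^ (1 << bit)
    name = COND_JUMPS.get(mutated, "not a conditional jump")
    print(f"bit {bit}: 0x{mutated:02X} -> {name}")
```

Running this shows that four of the eight possible single-bit flips of `jc` (including the `jo` case mentioned above) stay within the conditional-jump range, i.e. mutate into a superficially similar instruction; the other four land on unrelated or invalid byte sequences.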

>> Can't you figure that out?
>> Or is multiplication a bit too difficult for you?

>
> I was just wondering if you could produce any estimates based on a
> decent understanding of how RAM works as to whether the problem you're
> talking about is genuine or something that would not be observed in any
> practical experiment.

Because there is a large number of quite different opcodes, it would
take some effort to work out. But whether a mutation into a different
opcode is (superficially) relatively benign is more difficult
to estimate.

Anyhow, the best memory testing uses known-good RAM or ROM.

Unfortunately, even the best memory tests do not generate the inductive
fields and sudden DC supply surges that peripherals connected to your
computer (including hard disks, monitors, etc.) can produce. Not to
mention mains earth-loop effects, stray electromagnetic fields, and so on.

The best way to test memory is to swap it.

Jay, Aug 6, 2003
