Velocity Reviews > Newsgroups > Programming > Java > Creating a byte[] of long size

Creating a byte[] of long size

 
 
Eric Sosman
 
      07-09-2010
On 7/9/2010 4:06 PM, Daniel Pitts wrote:
>
> It is my opinion that (given perfect hindsight), the Collections API
> should have included several interfaces for comparing items, not just one.
>
> interface Ordering<T> {
>     enum Order {
>         LESSER, EQUAL, GREATER
>     }
>
>     Order compare(T left, T right);
> }


The only difference I see between this and Comparator<T> is
the use of the enum instead of an int. The advantages are not
immediately obvious. Meanwhile, instead of `comp(o1,o2) <= 0' one
would write `comp(o1,o2) != Order.GREATER', which reads even less
smoothly than `!(comp(o1,o2) > 0)'.

> interface Hasher<T> {
>     long hash(T value);
> }


A 64-bit hashCode() would be of little use until you got to
more than 2^32 hash buckets. Just saying.

> interface Equivalence<T> {
>     boolean equal(T left, T right);
> }


I don't get it: Why not just use equals()? I guess a class
could choose not to implement Equivalence at all (and thus make
itself unusable in whatever framework relies on Equivalence), but
is that an advantage? Also, you could get a compile-time error
instead of a run-time `false' for trying to call equal() on
references of dissimilar classes; again, where's the benefit?

> Then, all the appropriate Collection code could use those interfaces.
> There should also be the obvious default implementations.


It might be helpful to give some examples of the "appropriate"
uses, and of the "obvious" defaults. For example, how does a HashMap
make use of a key that implements Hasher? Does it reflect on each
key it's given and make a run-time choice between using hash() and
hashCode()? I don't get it ...

> Not to mention that iterators should have separate methods for advancing
> and retrieving,


I guess this would allow `iter.advance(666666)', with a possible
savings if the Iterator were backed by a disk file or something, where
you could just skip over two thirds of a million items without looking
at them. But, but, but!!! this only makes sense for a sequential data
structure like a List -- and for List you've already *got* access by
index without using an Iterator at all! I'm beginning to sound like a
broken record[*], I know, but what's the benefit?
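For concreteness, a split "advance vs. retrieve" cursor along the lines Daniel describes might look like this (a hypothetical interface and names of my own, not part of the JDK):

```java
import java.util.List;

// Hypothetical split-cursor interface: advancing and retrieving are
// separate operations, unlike java.util.Iterator's combined next().
interface Cursor<T> {
    boolean advance(long n);  // skip n elements; false once the end is passed
    T current();              // retrieve the current element without moving
}

// A trivial List-backed implementation, just to show the shape.
class ListCursor<T> implements Cursor<T> {
    private final List<T> list;
    private long pos = -1;    // starts before the first element

    ListCursor(List<T> list) { this.list = list; }

    public boolean advance(long n) {
        pos += n;
        return pos < list.size();
    }

    public T current() { return list.get((int) pos); }
}
```

A disk-backed implementation could honor advance(666666) by seeking, which is where the idea would pay off; the List version above, as Eric notes, adds nothing over plain indexed access.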

> and options for non-fail-fast for certain kinds of
> collections (linked lists shouldn't invalidate any iterator unless the
> item itself is removed).


Not so sure about that. The notion of "what's the next item?"
can certainly change if things are added or removed "near" the point
where an Iterator is active. Or, in the `iter.advance(666666)' case,
you may find yourself at an unexpected place if somebody else has
inserted 555555 items between where you are and where you're going ...

See also ListIterator.
[*] How many programmers nowadays have ever heard a broken record?
Or dialed a telephone, or seen a core? Gad, I'm feeling as old as
Methuselah.[**]

[**] Before your time, laddie, before your time.

--
Eric Sosman
(E-Mail Removed)
 
 
 
 
 
Roedy Green
 
      07-09-2010
On Thu, 8 Jul 2010 23:22:03 +0100, "Boris Punk" <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>
>Is there no BigList/BigHash in Java?


The tricky thing is, you have no giant arrays to use in the
implementation.

If you want BigHash, I could write you one for a fee. It would use
arrays of arrays for addressing inside.
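A rough sketch of the arrays-of-arrays addressing Roedy describes (the class name, chunk size, and API here are my assumptions, not his design):

```java
// Sketch of long-indexed byte storage backed by arrays of arrays.
// Names and chunk size are illustrative; this is not a real library class.
public class BigByteArray {
    private static final int CHUNK_BITS = 30;             // 1 GiB chunks
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final byte[][] chunks;

    public BigByteArray(long size) {
        int n = (int) ((size + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new byte[n][];
        long remaining = size;
        for (int i = 0; i < n; i++) {
            chunks[i] = new byte[(int) Math.min(remaining, CHUNK_SIZE)];
            remaining -= chunks[i].length;
        }
    }

    public byte get(long index) {
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    public void set(long index, byte value) {
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }
}
```

Each long index splits into an outer chunk index (high bits) and an offset within the chunk (low bits), so no single Java array ever needs more than Integer.MAX_VALUE elements.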
--
Roedy Green Canadian Mind Products
http://mindprod.com

You encapsulate not just to save typing, but more importantly, to make it easy and safe to change the code later, since you then need change the logic in only one place. Without it, you might fail to change the logic in all the places it occurs.
 
 
 
 
 
Roedy Green
 
      07-09-2010
On Fri, 09 Jul 2010 08:49:27 -0400, Eric Sosman
<(E-Mail Removed)> wrote, quoted or indirectly quoted
someone who said :

>> Arrays can only be indexed by ints, not longs. Even if they were,
>> even Bill Gates could not afford enough RAM for an array of bytes, one
>> for each possible long.

>
> True, but not especially relevant: You'll hit the int limit long
>before running out of dollars. $50US will buy more RAM than a Java
>byte[] can use.


I don't think our two posts conflict.
--
Roedy Green Canadian Mind Products
http://mindprod.com

 
 
Arne Vajhøj
 
      07-10-2010
On 09-07-2010 02:07, Mike Schilling wrote:
> "Arne Vajhøj" <(E-Mail Removed)> wrote in message
> news:4c366580$0$280$(E-Mail Removed)...
>> On 08-07-2010 17:15, Lew wrote:
>>> From the JLS, which I strongly urge you to study:

>>
>> Unless the poster has a solid programming experience,
>> then the JLS may not be the best to study.
>>
>> Sure it is by definition correct,

>
> mod typos and misstatements, of course.


What counts: "what is written in the spec" or "what
should have been written in the spec" ?

Arne
 
 
Arne Vajhøj
 
      07-10-2010
On 08-07-2010 22:40, Wayne wrote:
> On 7/8/2010 5:35 PM, Boris Punk wrote:
>> Integer.MAX_VALUE = 2147483647
>>
>> I might need more items than that. I probably won't, but it's nice to have
>> extensibility.

>
> To me, it is unlikely your system will run well if this one data structure
> consumes 2G of memory. (You didn't really state the application or system;
> certainly there are exceptions to the rule.) I would suggest you use a
> more flexible system, where you keep the data on storage (disk) and use
> memory as a cache. Perhaps an ArrayList of soft references would work well.
> It might even be possible in your particular case to run a daemon thread
> that pre-fetches items into the cache.


You mean reinvent what the virtual memory system is already
providing for free?

> Keep in mind a modern general-purpose computer will use virtual memory,
> typically with 4kiB pages. Any data structure larger than that will
> likely end up swapped to disk anyway.


Absolutely not.

Lots of systems do not page much, even with huge data.

Arne
 
 
Arne Vajhøj
 
      07-10-2010
On 09-07-2010 12:21, Wayne wrote:
> On 7/9/2010 12:31 AM, Patricia Shanahan wrote:
>> Wayne wrote:
>>> On 7/8/2010 5:35 PM, Boris Punk wrote:
>>>> Integer.MAX_VALUE = 2147483647
>>>>
>>>> I might need more items than that. I probably won't, but it's nice to have
>>>> extensibility.
>>>
>>> To me, it is unlikely your system will run well if this one data structure
>>> consumes 2G of memory. (You didn't really state the application or system;
>>> certainly there are exceptions to the rule.) I would suggest you use a
>>> more flexible system, where you keep the data on storage (disk) and use
>>> memory as a cache. Perhaps an ArrayList of soft references would work well.
>>> It might even be possible in your particular case to run a daemon thread
>>> that pre-fetches items into the cache.

>>
>> What's the difference between one data structure occupying over 2 GB and a set of
>> data structures that use that much space?
>>
>> Certainly, given enough memory, Java can support total data structure sizes well over
>> 2 GB without excessive paging.

>
> A reduction in the number of page faults. There was an interesting article about
> this topic in this month's Communications of the ACM, by Poul-Jenning Kamp,


Poul-Henning Kamp often abbreviated PHK.

> who
> was one of the lead developers of the FreeBSD kernel.


He still contributes to FreeBSD.

> He applied his insight
> to a web proxy replacement for Squid called Varnish, and was able to replace
> 12 Squid machines with 3 Varnish ones. It used a modified binary heap he called
> a B-heap, which respected the page size of memory. The article was titled
> "You're Doing It Wrong". The message I came away with was, don't ignore the
> fact that computers use paging when designing large data structures. I was
> thinking that lesson might apply to the OP's situation.


I assume you are talking about this article:

http://queue.acm.org/detail.cfm?id=1814327

He is not suggesting any custom swap to disk or anything, but just
noting that it is beneficial to keep stuff together to minimize
paging.

In what may actually be an earlier version of the same story:

http://www.version2.dk/artikel/13201...-et-moderne-os

translated:

http://translate.google.com/translat...os&sl=da&tl=en

he explicitly states:

'A classic example is that you stand and move things between disk and
memory at all times. But the operating system kernel already does that
itself. You just have to put it somewhere in virtual memory, then it
does it by itself, and it is much better at it. But people are not
aware of that,' reads the assessment from Poul-Henning Kamp.

Arne


 
 
Arne Vajhøj
 
      07-10-2010
On 09-07-2010 13:24, Boris Punk wrote:
> "Kevin McMurtrie"<(E-Mail Removed)> wrote in message
>> 24GB of RAM is a standard server configuration this year. Even my
>> laptop has 8GB and can only run 64 bit Java. A Java array indexing
>> limit of 2147483647 is a growing problem, not a future problem.

> Is it not as simple as assigning int as 64 bit and long as 128 bit in newer
> versions?


No.

Lots of code would break.

Code that relies on overflow.

Code that relies on bit masks.

Arne

 
 
Arne Vajhøj
 
      07-10-2010
On 09-07-2010 13:46, Lew wrote:
> Boris Punk wrote:
>> Is it not as simple as assigning int as 64 bit and long as 128 bit in newer
>> versions

>
> That depends. For whom would it be simple, and for whom not?
>
> How much legacy code are you willing to break?
>
> What about folks who suddenly experience performance degradation
> because their existing structures suddenly don't fit in on-chip cache
> any more, whose programs are suddenly twice as large, and for whom
> memory accesses overall suddenly take twice as long?
>
> Whether they need the extra capacity or not?


That type of optimization would break anyway over time.

Code lives a lot longer than hardware.

It is not realistic to assume identical or almost identical
performance characteristics over the lifetime of code.

One of the reasons why micro optimization is rather pointless.

> Even today, as pointed out upthread, Java is perfectly capable of
> handling data structures that exceed 2 GB in size, just not in a
> single array or int-based structure (or in 32-bit mode). You have to
> go with multi-dimensional arrays or Collection<Collection> types of
> structures, but they will work.
>
> I predict that even when the need for multi-gigabyte arrays becomes
> widely acknowledged that the actual use cases for them will remain
> quite specialized. I venture to say further that the OP doesn't need
> one yet.


I agree with that one.

Such huge arrays seem mostly a scientific computing wish.

Arne

 
 
Arne Vajhøj
 
      07-10-2010
On 09-07-2010 02:05, Mike Schilling wrote:
> "Arne Vajhøj" <(E-Mail Removed)> wrote in message
> news:4c3655fd$0$283$(E-Mail Removed)...
>> On 08-07-2010 18:22, Boris Punk wrote:
>>> Is there no BigList/BigHash in Java?

>>
>> No.
>>
>> But you can have a List<List<X>> which can then
>> store 4*10^18 X'es.
>>

>
> Or you could pretty easily build a class like
>
> public class BigArray<T>
> {
>     T get(long index);
>     void set(long index, T value);
> }
>
> backed by a two-dimensional array.[1] The reason I prefer to say "array"
> rather than "List" is that random access into a sparse List is a bit
> dicey, while arrays nicely let you set index 826727 even if you haven't
> touched any of the earlier indices yet, and will tell you that the entry
> at 6120584 is null, instead of throwing an exception.


Sure.

But the collections have the advantage of dynamic size.

I think the general trend is from arrays to collections.

With scientific computing as a major exception.
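One way the BigArray<T> sketch quoted above might be fleshed out, backed by a two-dimensional array with lazily allocated chunks (the chunk size and lazy allocation are assumptions on my part, not Mike's stated design):

```java
// Illustrative sketch only: a sparse, long-indexed array of T,
// backed by a two-dimensional Object array.
public class BigArray<T> {
    private static final int CHUNK_BITS = 20;            // 2^20 elements per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final Object[][] chunks;

    public BigArray(long size) {
        int n = (int) ((size + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new Object[n][];                        // inner chunks allocated lazily
    }

    @SuppressWarnings("unchecked")
    public T get(long index) {
        Object[] chunk = chunks[(int) (index >>> CHUNK_BITS)];
        return chunk == null ? null : (T) chunk[(int) (index & CHUNK_MASK)];
    }

    public void set(long index, T value) {
        int outer = (int) (index >>> CHUNK_BITS);
        if (chunks[outer] == null) {
            chunks[outer] = new Object[CHUNK_SIZE];      // sparse: allocate on first write
        }
        chunks[outer][(int) (index & CHUNK_MASK)] = value;
    }
}
```

This keeps the sparse-array behavior Mike describes: set(826727, x) works without touching any earlier index, and get(6120584) returns null rather than throwing.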

Arne
 
 
Arne Vajhøj
 
      07-10-2010
On 09-07-2010 10:31, Patricia Shanahan wrote:
> On 7/9/2010 5:15 AM, Eric Sosman wrote:
>> On 7/8/2010 9:11 PM, Patricia Shanahan wrote:
>>> Arne Vajhøj wrote:
>>>> On 08-07-2010 17:35, Boris Punk wrote:
>>>>> Integer.MAX_VALUE = 2147483647
>>>>>
>>>>> I might need more items than that. I probably won't, but it's nice to
>>>>> have
>>>>> extensibility.
>>>>
>>>> It is a lot of data.
>>>>
>>>> I think you should assume YAGNI.
>>>
>>> Historically, each memory size has gone through a sequence of stages:
>>>
>>> 1. Nobody will ever need more than X bytes.
>>>
>>> 2. Some people do need to run multiple jobs that need a total of more
>>> than X bytes, but no one job could possibly need that much.
>>>
>>> 3. Some jobs do need more than X bytes, but no one data structure could
>>> possibly need that much.
>>>
>>> 4. Some data structures do need more than X bytes.
>>>
>>> Any particular reason to believe 32 bit addressing will stick at stage
>>> 3, and not follow the normal progression to stage 4?

>>
>> None. But Java's int isn't going to grow wider, nor will the
>> type of an array's .length suddenly become non-int; too much code
>> would break. When Java reaches the 31-bit wall, I doubt it will
>> find any convenient door; Java's descendants may pass through, but
>> I think Java will remain stuck on this side.
>>
>> In ten years, we'll all have jobs converting "legacy Java code"
>> to Sumatra.

>
> I don't think the future for Java is anywhere near as bleak as you paint
> it.
>
> The whole collections issue could be handled by creating a parallel
> hierarchy based on java.util.long_collections (or something similar for
> those who don't like separating words in package names). It would
> replicate the class names in the java.util hierarchy, but with long
> replacing int wherever necessary to remove the size limits. It could be
> implemented, using arrays of arrays where necessary, without any JVM
> changes.
>
> To migrate a program to the new collections one would first change the
> import statements to pick up the new packages, and then review all int
> declarations to see if they should be long. Many of the ones that need
> changing would show up as errors.


Collections is certainly solvable.

> Arrays are a worse problem, requiring JVM changes. The size field
> associated with an array would have to be long. There would also need to
> be a new "field" longLength. Attempts to use arrayRef.length for an
> array with more that Integer.MAX_VALUE elements would throw an
> exception. arrayRef.length would continue to work for small arrays for
> backwards compatibility.
>
> I suspect Eclipse would have "Source -> Long Structures" soon after the
> first release supporting this, and long before most programs would need
> to migrate.


It is not a perfect solution.

When calling a library, some arrays would have to be marked
as @SmallArray to indicate that you cannot call it with a
big array, because the method reads length.

There may be other problems that I cannot think of.

Arne

 
 
 
 





