Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > 'ArrayList' allowing more than 2G entries.

Reply
Thread Tools

'ArrayList' allowing more than 2G entries.

 
 
Tom McGlynn
Guest
Posts: n/a
 
      06-18-2009
Occasionally I need to store information in arrays which can get
larger than 2G. Since I can't make an array that long, I'd like to
have a backup in these large cases where I use something that looks
like an ArrayList except that it allows >2G entries. So it won't be a
real ArrayList, e.g., size() will return a long, but should look very
similar. This isn't too difficult to implement, but I was curious if
anyone knew of existing libraries that might already have addressed
this issue.

Thanks,
Tom McGlynn

 
Reply With Quote
 
 
 
 
Lord Zoltar
Guest
Posts: n/a
 
      06-18-2009
On Jun 18, 2:10*pm, Tom McGlynn <(E-Mail Removed)> wrote:
> Occasionally I need to store information in arrays which can get
> larger than 2G. *Since I can't make an array that long, I'd like to
> have a backup in these large cases where I use something that looks
> like an ArrayList except that it allows >2G entries. *So it won't be a
> real ArrayList, e.g., size() will return a long, but should look very
> similar. *This isn't too difficult to implement, but I was curious if
> anyone knew of existing libraries that might already have addressed
> this issue.
>
> Thanks,
> Tom McGlynn


2G... do you mean 2 billion?
 
Reply With Quote
 
 
 
 
markspace
Guest
Posts: n/a
 
      06-18-2009
Lord Zoltar wrote:

>
> 2G... do you mean 2 billion?



I'm sure from the context of his post me means greater than
Integer.MAX_VALUE.
 
Reply With Quote
 
Tom McGlynn
Guest
Posts: n/a
 
      06-19-2009
On Jun 18, 4:50 pm, Patricia Shanahan <(E-Mail Removed)> wrote:
> Tom McGlynn wrote:
> > Occasionally I need to store information in arrays which can get
> > larger than 2G. Since I can't make an array that long, I'd like to
> > have a backup in these large cases where I use something that looks
> > like an ArrayList except that it allows >2G entries. So it won't be a
> > real ArrayList, e.g., size() will return a long, but should look very
> > similar. This isn't too difficult to implement, but I was curious if
> > anyone knew of existing libraries that might already have addressed
> > this issue.

>
> If you decide to do something ArrayList-like, be careful about the data
> structure design and algorithms. ArrayList's algorithms for insert and
> delete may not be suitable.
>
> If you do not need ArrayList-like properties, just an array, consider a
> memory mapped temporary file.
>
> Patricia


You're right that I'm mostly interested in the array aspects. The
sizes rarely change once the arrays are allocated (or I'd have been
using ArrayList for the more tractable sizes too). I had wanted to
emulate the ArrayList interface so that the code would seem idiomatic
Java but I wouldn't typically be resizing. Essentially
all I really need is:

LongList ll = new LongList(longSize);
val = ll.get();
ll.put(longIndex, val);

With regard to using temporary files: Curiously -- I've never gotten
a decent explanation from the Sysadmins on this -- the work systems I
mostly run on have very small /tmp areas (~500 MB) so I often have
much more space in memory than in the temporary file area. Most of
the time I could ask the user where to put the backing file though...
Spud seems to indicate that that won't really work anyway. I'll take
a look.

With regard to some of the comments about the size of all of this:
Generally these are arrays of primitives or simple FlyWeights where
the state of the object is externalized to one or two primitive
arrays. So the number of actual Java objects that needs to be created
can be quite small and I can run into limits even when the total
storage requirement only slightly exceeds 2 GB. As the commentators
note though,it's esily possible for my requirements to exceed the
memory sizes that I can expect a machine to have. True enough but I
do want to be able to use the memory I do have. Machines with <~ 10 GB
physical memory aren't too hard to come by. It looks like 4 GB is now
pretty standard for a $500 machine at BestBuy.

The best solution for me would be for Java 7 to come out with long
array indices -- but I don't believe that's in the cards.

Regards,
Tom McGlynn
 
Reply With Quote
 
Andreas Leitgeb
Guest
Posts: n/a
 
      06-19-2009
Tom McGlynn <(E-Mail Removed)> wrote:
> Essentially all I really need is:
> LongList ll = new LongList(longSize);
> val = ll.get();
> ll.put(longIndex, val);


If all you need is put and get, then make a small class that
contains an array of arrays, and use the upper n bits for
indexing the outer array and the lower n bits on indexing
the inner ones. (lazily creating inner arrays on "put" only)

I would make the size (2^n) of the arrays much smaller than
2GB: perhaps even just 32Mill (2^25) items or so. That
way you don't waste too much on allocating only full inner
arrays, and from your description, the resulting 50 bits
maximum size may still be large enough (a million billion
Items) for practical purposes on machines in foreseeable
future. (no "640kB is large enough" reference here)

PS: I think such a class is faster self-written, reviewed and
tested, than a foreign library evaluated for usability.

 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      06-19-2009
On 19.06.2009 15:21, Tom McGlynn wrote:
> You're right that I'm mostly interested in the array aspects. The
> sizes rarely change once the arrays are allocated (or I'd have been
> using ArrayList for the more tractable sizes too). I had wanted to
> emulate the ArrayList interface so that the code would seem idiomatic
> Java but I wouldn't typically be resizing. Essentially
> all I really need is:
>
> LongList ll = new LongList(longSize);
> val = ll.get();
> ll.put(longIndex, val);


Do you really need to fill the array to that size or do you need the
largish array in order to allow for long indexes? In the latter case
with sparse filling you might simply use a Map<Long,YourType>.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Reply With Quote
 
Martin Gregorie
Guest
Posts: n/a
 
      06-19-2009
On Fri, 19 Jun 2009 06:21:53 -0700, Tom McGlynn wrote:

> You're right that I'm mostly interested in the array aspects. The sizes
> rarely change once the arrays are allocated (or I'd have been using
> ArrayList for the more tractable sizes too). I had wanted to emulate
> the ArrayList interface so that the code would seem idiomatic Java but I
> wouldn't typically be resizing. Essentially all I really need is:
>
> LongList ll = new LongList(longSize); val = ll.get();
> ll.put(longIndex, val);
>

Some questions:
- when you say a 2G array, is that the number of elements or
the total data size?
- do the elements have a key?
- is the index also a 'key' and if so, how sparse is the array?
- is the array always guaranteed to fit into the process memory
without paging?


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
 
Reply With Quote
 
markspace
Guest
Posts: n/a
 
      06-19-2009
Robert Klemme wrote:

> Do you really need to fill the array to that size or do you need the
> largish array in order to allow for long indexes? In the latter case
> with sparse filling you might simply use a Map<Long,YourType>.


Interesting idea! Untested:


public class LongList<T> {

private static final int BUF_SIZE = 2048;

private HashMap<Long,T[]> store = new HashMap<Long,T[]>();

public T get( long index ) {
T[] buf = store.get(index/BUF_SIZE);
if( buf == null ) {
return null;
}
return buf[ (int)(index % BUF_SIZE) ];
}

public void put( long index, T value ) {
T[] buf = store.get(index/BUF_SIZE);
if( buf == null ) {
buf = (T[]) new Object[BUF_SIZE];
store.put( index/BUF_SIZE, buf);
}
buf[(int)(index%BUF_SIZE)] = value;
}

}
 
Reply With Quote
 
Tom McGlynn
Guest
Posts: n/a
 
      06-19-2009
On Jun 19, 12:48 pm, Martin Gregorie <(E-Mail Removed)>
wrote:
> On Fri, 19 Jun 2009 06:21:53 -0700, Tom McGlynn wrote:
> > You're right that I'm mostly interested in the array aspects. The sizes
> > rarely change once the arrays are allocated (or I'd have been using
> > ArrayList for the more tractable sizes too). I had wanted to emulate
> > the ArrayList interface so that the code would seem idiomatic Java but I
> > wouldn't typically be resizing. Essentially all I really need is:

>
> > LongList ll = new LongList(longSize); val = ll.get();
> > ll.put(longIndex, val);

>
> Some questions:
> - when you say a 2G array, is that the number of elements or
> the total data size?


The number of elements.
> - do the elements have a key?


I'm not quite sure what you mean by this... Generally I'm only
interested accessing the data through the index.

> - is the index also a 'key' and if so, how sparse is the array?


The arrays are dense, i.e., I will have an element for every index
less than the maximum.

> - is the array always guaranteed to fit into the process memory
> without paging?
>


Hmmm... In the underlying system no. I'm basically writing code to
interpret a particular file format and the format makes no real limit
on the size of the files. However I support both streaming and in-
memory access access to the data. At some point streaming will be
required for all users. Right now for many users that point is being
set because certain arrays are reaching the 2 GB limit even though a
larger array would fit within their memory could it be constructed.
The in-memory modes are much easier for the user and more flexible.
So in the context of what I'm trying to do it may be a reasonable
assumption that paging is not an issue. If their system thrashes for
in-memory access, then they've passed the limit where they need to go
to the streaming mode.


Regards,
Tom McGlynn
 
Reply With Quote
 
Roedy Green
Guest
Posts: n/a
 
      06-19-2009
On Thu, 18 Jun 2009 11:10:02 -0700 (PDT), Tom McGlynn
<(E-Mail Removed)> wrote, quoted or indirectly quoted
someone who said :

>Occasionally I need to store information in arrays which can get
>larger than 2G. Since I can't make an array that long, I'd like to
>have a backup in these large cases where I use something that looks
>like an ArrayList except that it allows >2G entries. So it won't be a
>real ArrayList, e.g., size() will return a long, but should look very
>similar. This isn't too difficult to implement, but I was curious if
>anyone knew of existing libraries that might already have addressed
>this issue.


If you are doing any inserting and deleting, the slides on a giant
64-bit JNI array would be prohibitive. I thought of using a HashMap,
but it is just array inside too, and would have the same size limit
problems.

For this, you need to figure out exactly which properties of ArrayList
you need, then look for a minimal Collection that satisfies those
needs. You many not need all the bells of an long ArrayList.

Even Collection has the int limit on size(). Looks like Sun was not
thinking ahead to 64-bit Collections.

I suspect you will end up cooking up your own that is implemented
inside with [][]

It really rather odd to have a 64-bit JVM without a 64-bit indexed
array.
--
Roedy Green Canadian Mind Products
http://mindprod.com

If everyone lived the way people do in Vancouver, we would need three more entire planets to support us.
~ Guy Dauncey
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Like all great travelers, I have seen more than I remember andremember more than I have seen. shenrilaa@gmail.com Java 0 03-06-2008 08:11 AM
Like all great travelers, I have seen more than I remember andremember more than I have seen. shenrilaa@gmail.com C++ 0 03-05-2008 08:41 AM
Like all great travelers, I have seen more than I remember andremember more than I have seen. shenrilaa@gmail.com C Programming 0 03-05-2008 03:26 AM
<script> allowing more than one URI reference in SRC attribute? Axel Dahmen HTML 2 10-25-2007 12:18 PM
Text box allowing language other than English? siliconmike HTML 3 01-31-2005 08:33 AM



Advertisments