Read in & count characters from a text file

 
 
Jay Cee
Guest
Posts: n/a
 
      08-04-2007
Hi All,
Relatively new to Java (ex VB) and could do with some help.
I need to read a text file character by character (can do),
and count each character as it appears, i.e.
"A small sample text file" would have 1-A, 2-s, 2-m, etc., and output
the results.

I have a few issues which I cannot seem to solve easily,
1/
I thought it would be a good idea to save the characters in a HashMap as
name-value pairs as they are read: map.put(tempStr,"1" )
I found I had to convert the character to a string before it would save to
the map. Ideally I would like to save it as a character.

2/
Before adding each character to the map
check first if it already exists
and if found increment the value portion of the name value pair
else
if not found insert into map with value of 1.

My problem seems to be that I cannot "check the map" to see if the character
exists, and if it does exist, how do I get at the value to increment it?

Here is what I have so far,

import java.io.*;
import java.util.*;

class TextTest
{
    public static Map map = new HashMap();
    private static TreeMap treeMap;

    public static void main(String[] args) throws IOException
    {
        FileInputStream in = new FileInputStream("textfile.txt");
        int ch;
        int total = 0;
        int count = 1;

        while ((ch = in.read()) != -1)
        {
            total++;
            // Only way to save the "char" in the map was to convert it to a string.
            String tempStr = (Integer.toString(ch));
            System.out.print((char)ch);

            if (map.containsKey(tempStr))
            {
                // How can I extract the value, increment it and save back to the map?
                map.put(tempStr, "value");
            }
            else
            {
                // I need to save the integer 1 here in the value part of the map
                map.put(tempStr, "value");
            }
        }
        treeMap = new TreeMap(map); // sort the map
        System.out.println("Total =" + total);
        System.out.print(treeMap);
    }
}




 
 
 
 
 
Stefan Ram
Guest
Posts: n/a
 
      08-04-2007
"Jay Cee" <(E-Mail Removed)> writes:
>My problems seems to be I cannot "check the map" if the
>character exists and if it does exist how do I get at the value
>to increment it.


You might use something like

class NumericMapUtils
{ public static <D> void addTo /* autovivificate the value to 0 */
( final java.util.Map<D,java.lang.Integer> map, final D d, final int i )
{ map.put( d, i +( map.containsKey( d )? map.get( d ): 0 )); }}

and a sorted map

java.util.TreeMap<java.lang.Character,java.lang.Integer> map;

then add each text like

NumericMapUtils.<java.lang.Character>addTo( map, 'a', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'c', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'n', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'x', 1 );

(I have not tested this.)

Then iterate: »for( final java.lang.Character key: map.keySet() )«

For files of arbitrary size, use java.math.BigInteger instead
of java.lang.Integer.

 
 
 
 
 
Stefan Ram
Guest
Posts: n/a
 
      08-04-2007
Supersedes: <(E-Mail Removed)-berlin.de>

"Jay Cee" <(E-Mail Removed)> writes:
>My problems seems to be I cannot "check the map" if the
>character exists and if it does exist how do I get at the value
>to increment it.


You might use something like

class NumericMapUtils
{ public static <D> void addTo /* autovivificate the value to 0 */
( final java.util.Map<D,java.lang.Integer> map, final D d, final int i )
{ map.put( d, i +( map.containsKey( d )? map.get( d ): 0 )); }}

and a sorted map

java.util.TreeMap<java.lang.Character,java.lang.Integer> map;

then add each text like

NumericMapUtils.<java.lang.Character>addTo( map, 'a', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'c', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'n', 1 );
NumericMapUtils.<java.lang.Character>addTo( map, 'x', 1 );

(I have not tested this. Possibly, the "<java.lang.Character>"
type argument can be omitted.)

Then iterate: »for( final java.lang.Character key: map.keySet() )«

For files of arbitrary size, use java.math.BigInteger instead
of java.lang.Integer.
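A compilable sketch of the idea above, for anyone who wants to try it. The class name NumericMapUtilsSketch, the tally() helper and the sample text are assumptions made for illustration; the explicit type argument is left out because the compiler can infer D from the arguments.

```java
import java.util.Map;
import java.util.TreeMap;

class NumericMapUtilsSketch {
    /** Adds i to the count stored under key d, treating a missing entry as 0. */
    static <D> void addTo(Map<D, Integer> map, D d, int i) {
        map.put(d, i + (map.containsKey(d) ? map.get(d) : 0));
    }

    /** Hypothetical helper: counts every char of text into a map sorted by character. */
    static TreeMap<Character, Integer> tally(String text) {
        TreeMap<Character, Integer> map = new TreeMap<>();
        for (char c : text.toCharArray()) {
            addTo(map, c, 1); // D is inferred as Character; c is autoboxed
        }
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Character, Integer> map = tally("A small sample text file");
        // Iterate the keys in sorted order, as suggested in the post.
        for (Character key : map.keySet()) {
            System.out.println("'" + key + "' -> " + map.get(key));
        }
    }
}
```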


 
 
Eric Sosman
Guest
Posts: n/a
 
      08-04-2007
Jay Cee wrote:
> Hi All,
> Relatively new to java (ex VB) and could do with some help.
> I need to read a text file character by character (can do),
> and count each character as it appears, i.e
> "A small sample text file" would have 1-A , 2-s, 2-m ,etc etc. and output
> the results.
>
> I have a few issues which I cannot seem to solve easily,
> 1/
> I thought it would be a good idea to save the characters in a hashmap in
> name-value pairs as they are read , map.put(tempStr,"1" )
> I found I had to convert the character to a string before it would save to
> the map.Ideally I would like to save as a character.


Maps (all Collections, in fact) deal only with objects,
so you cannot store primitive values like char in them. But
you can use a Character object, which expresses your intent
more directly than a String does.

Similarly, the mapped values must also be objects. I
think an Integer would be a better choice than a String; if
you expect counts greater than two billion use a Long.

> 2/
> Before adding each character to the map
> check first if it already exists
> and if found increment the value portion of the name value pair
> else
> if not found insert into map with value of 1.
>
> My problems seems to be I cannot "check the map" if the character exists and
> if it does exist how do I get at the value to increment it.


The map has a containsKey() method that tells you whether
there is or isn't an entry for a key you're interested in.

If you're using an Integer (or Long) as the counter, you
can't just increment it: like String, an Integer cannot be
changed once it's created. Instead, you need to retrieve the
existing Integer from the map and replace it with a larger one.

... and since you need to retrieve the Integer anyhow, the
containsKey() method doesn't seem worth while: Just ask the map
for the Integer corresponding to such-and-such a Character. If
there is one, replace it. If there's not, you'll get a null
back from the map and this can be your signal to start a new
counter at unity:

Character key = Character.valueOf( (char)ch );
Integer val = (Integer)map.get(key);
if (val == null)
    val = Integer.valueOf(1);
else
    val = Integer.valueOf(val.intValue() + 1);
map.put(key, val);
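With a typed map and autoboxing, the same retrieve-and-replace logic needs no cast and no explicit valueOf()/intValue() calls. This is only a sketch: the class name GetThenPutSketch and the count() helper are made up for illustration, and the diamond operator assumes Java 7 or later.

```java
import java.util.HashMap;
import java.util.Map;

class GetThenPutSketch {
    /** Counts each char of text using a single get() followed by a put(). */
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> map = new HashMap<>();
        for (int i = 0; i < text.length(); i++) {
            Character key = text.charAt(i);   // autoboxed char
            Integer val = map.get(key);       // null when the key is absent
            map.put(key, val == null ? 1 : val + 1);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(count("A small sample text file"));
    }
}
```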

Another approach would be to invent your own Counter class
that looks a lot like an Integer but is mutable: it has methods
like set() or increment() that change its value. Then the code
might look like

Character key = Character.valueOf( (char)ch );
Counter cnt = (Counter)map.get(key);
if (cnt == null) {
    cnt = new Counter(); // initial value zero
    map.put(key, cnt);
}
cnt.increment();
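One possible shape for such a Counter is sketched below. The class and its get() accessor are hypothetical (the post only names increment() and set()), and CounterSketch is just a wrapper so the example runs on its own. Note that the new Counter has to be assigned to the local variable as well as stored in the map, otherwise increment() would be called on null.

```java
import java.util.HashMap;
import java.util.Map;

class CounterSketch {
    /** Hypothetical mutable counter; starts at zero. */
    static class Counter {
        private int value;
        void increment() { value++; }
        int get() { return value; }
    }

    static Map<Character, Counter> count(String text) {
        Map<Character, Counter> map = new HashMap<>();
        for (char c : text.toCharArray()) {
            Counter cnt = map.get(c);
            if (cnt == null) {
                cnt = new Counter();  // keep a reference for the increment below
                map.put(c, cnt);
            }
            cnt.increment();          // mutates the object already in the map
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Character, Counter> map = count("A small sample text file");
        for (Map.Entry<Character, Counter> e : map.entrySet()) {
            System.out.println("'" + e.getKey() + "' -> " + e.getValue().get());
        }
    }
}
```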

> Here is what I have so far,
>
> import java.io.*;
> import java.util.*;
> class TextTest
> {
> public static Map map = new HashMap();
> private static TreeMap treeMap;
> public static void main(String[] args) throws IOException
> {
>
> FileInputStream in = new FileInputStream("textfile.txt");


A word of warning: This is legal, but may not be what you
intend. InputStreams are for files made of bytes; Readers are
for files made of characters. If an InputStream encounters a
character that has been encoded in several bytes, it will deliver
those bytes to you individually. If a Reader encounters such a
thing, it will decode the multi-byte sequence and deliver you
the single corresponding character.
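The difference is easy to see with a small in-memory experiment. The sketch below assumes UTF-8 encoded input and uses made-up names (BytesVersusCharsSketch, bytesRead, charsRead); the sample word "café" has four characters but takes five bytes in UTF-8, because 'é' is encoded in two bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BytesVersusCharsSketch {
    /** Number of successful read() calls on a raw InputStream (one per byte). */
    static int bytesRead(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of successful read() calls on a Reader decoding UTF-8 (one per char). */
    static int charsRead(byte[] data) {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (in.read() != -1) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("InputStream reads: " + bytesRead(data)); // 5
        System.out.println("Reader reads:      " + charsRead(data)); // 4
    }
}
```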

By the way, this sort of code is fine if your objective is
to learn about Maps and the like. But if your goal is really
to count char values (or byte values), an array of 65536 (or
256) ints or longs will be easier:

counts[ch]++;

--
Eric Sosman
 
 
Patricia Shanahan
Guest
Posts: n/a
 
      08-04-2007
Jay Cee wrote:
> Hi All,
> Relatively new to java (ex VB) and could do with some help.
> I need to read a text file character by character (can do),
> and count each character as it appears, i.e
> "A small sample text file" would have 1-A , 2-s, 2-m ,etc etc. and output
> the results.
>
> I have a few issues which I cannot seem to solve easily,
> 1/
> I thought it would be a good idea to save the characters in a hashmap in
> name-value pairs as they are read , map.put(tempStr,"1" )
> I found I had to convert the character to a string before it would save to
> the map.Ideally I would like to save as a character.

....

Although it can certainly be done with a map, I might not use one for
this. There are only 65,536 possible values for a Java char, so why not
an array?

char[] counts = new char[Character.MAX_VALUE+1];
....
counts[ch]++;
....

Patricia
 
 
Jay Cee
Guest
Posts: n/a
 
      08-04-2007
Hi Patricia
Yours seems the simplest way to go forward with this, but do I have to
iterate through the array each time I read in a character? This will
probably be OK for this instance, but if I wanted to do a character count on
a large document (a book?), surely this would be slower than a hashmap. Is
there an array that can hold [char,integer]? I will have to do some
more research.

Eric, thank you for the explanation and the "word of warning". I have been
getting this issue of more than one char being read in and I was wondering
why. I wonder no longer.

Stefan, thank you for the swift reply. I will have to do some reading on
NumericMapUtils and "autovivificate", which is not a word I had come across
in my life until today!!


Jay


"Patricia Shanahan" <(E-Mail Removed)> wrote in message
news:f92rr1$2r7l$(E-Mail Removed)...
> Jay Cee wrote:
>> Hi All,
>> Relatively new to java (ex VB) and could do with some help.
>> I need to read a text file character by character (can do),
>> and count each character as it appears, i.e
>> "A small sample text file" would have 1-A , 2-s, 2-m ,etc etc. and output
>> the results.
>>
>> I have a few issues which I cannot seem to solve easily,
>> 1/
>> I thought it would be a good idea to save the characters in a hashmap in
>> name-value pairs as they are read , map.put(tempStr,"1" )
>> I found I had to convert the character to a string before it would save
>> to the map.Ideally I would like to save as a character.

> ...
>
> Although it can certainly be done with a map, I might not use one for
> this. There are only 65,536 possible values for a Java char, so why not
> an array?
>
> char[] counts = new char[Character.MAX_VALUE+1];
> ...
> counts[ch]++;
> ...
>
> Patricia



 
 
Patricia Shanahan
Guest
Posts: n/a
 
      08-04-2007
Jay Cee wrote:
> Hi Patricia
> Yours seems the simplest way to go forward with this but do I have to
> iterate through the array each time I read in a character ? This will
> probably be ok for this instance but if I wanted to do a character count on
> a large document(a book?) surely this would be slower than a hashmap. Is
> there an array that can hold [char,integer] , I will have to do some
> more research.


Sorry, I made a mistake making it a char[], which confuses matters.

You want an array type that is big enough for each element to hold the
maximum number of instances of any one character you expect to see in
the input. Since you are using an int for the total, int must be good
enough:

int[] counts = new int[Character.MAX_VALUE+1];

Each character has its very own entry. For example, decimal 65
corresponds to 'A', so if you see an 'A' in the input, counts[65] would
increment by one. Use element 65 for 'A' regardless of what has happened
before.

The only time you need to iterate through the array is at the end, to
report the non-zero counts.

for (int i = 0; i < counts.length; i++) {
    if (counts[i] > 0) {
        char ch = (char)i;
        System.out.println("character " + ch + " count " + counts[i]);
    }
}
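Put together, the counting loop and the report might look like the sketch below. It is an illustration under assumptions rather than a finished program: the class name ArrayCountSketch and the count() helper are invented, and a StringReader stands in for the file so the example is self-contained (a FileReader or InputStreamReader would take its place for real input).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ArrayCountSketch {
    /** Tallies every char delivered by the reader; the char value is the index. */
    static int[] count(Reader in) {
        int[] counts = new int[Character.MAX_VALUE + 1];
        try {
            int ch;
            while ((ch = in.read()) != -1) {
                counts[ch]++;   // direct indexing, no search through the array
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count(new StringReader("A small sample text file"));
        // The only full pass over the array happens once, at the end.
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                System.out.println("character " + (char) i + " count " + counts[i]);
            }
        }
    }
}
```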

Patricia
 
 
Roedy Green
Guest
Posts: n/a
 
      08-05-2007
On Sat, 4 Aug 2007 22:09:10 +0100, "Jay Cee"
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>I thought it would be a good idea to save the characters in a hashmap in
>name-value pairs as they are read , map.put(tempStr,"1" )


You would use HashMap<String,Integer>. You have to keep creating new
Integer objects, each one bigger than the last. It is rather clumsy and
slow, though probably quite adequate to the task.

Chances are your file contains some limited set of chars, likely only
chars 0..255. So instead you could use an int[256] to store the
counts. You index by character. You simply use the ++ operator. It
is quite a bit simpler. In the worst case you need an array of 65536
elements if you have no control over the chars.
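A minimal sketch of the 256-element variant follows. The class name Latin1CountSketch is made up, and silently skipping chars above 255 is an assumption added here so that an unexpected character does not cause an ArrayIndexOutOfBoundsException; whether to skip, report or widen the array is a design choice.

```java
class Latin1CountSketch {
    /** Counts chars 0..255 only; anything larger is ignored. */
    static int[] count(String text) {
        int[] counts = new int[256];
        for (char c : text.toCharArray()) {
            if (c < counts.length) {
                counts[c]++;   // the char is promoted to int and used as the index
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count("A small sample text file");
        System.out.println("count of 'l' = " + counts['l']);
    }
}
```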
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
 
Roedy Green
Guest
Posts: n/a
 
      08-05-2007
>Yours seems the simplest way to go forward with this but do I have to
>iterate through the array each time I read in a character ?

You index. Most people don't know you can index by chars,
e.g. int x = count[ 'A' ]; is legit Java. The char gets promoted to
the corresponding Unicode int.
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
 
cyprian
Guest
Posts: n/a
 
      08-22-2007
On Aug 4, 11:51 pm, Roedy Green <(E-Mail Removed)>
wrote:
> >Yours seems the simplest way to go forward with this but do I have to
> >iterate through the array each time I read in a character ?

>
> You index. Most people don't know you can index by chars
> e.g. int x = count[ 'A' ]; is legit java. The char gets promoted to
> the corresponding Unicode int.
> --
> Roedy Green Canadian Mind Products
> The Java Glossary
> http://mindprod.com


To do a character count on a text file, try reading it in through a
stream: buffer the stream and call read() on the buffered stream. With no
arguments, read() returns the next character as an int (or -1 at the end
of the input); the read(char[]) overload returns the number of characters
read. Either way you get UTF-16 char units, not code points. Then try
doing your map thing on it. I was counting some words myself recently:
http://genericjava.blogspot.com/2007...ays-let-me.htm
On the other hand, you could call readLine() on the buffered stream,
append the result to a StringBuffer and play with the StringBuffer
directly. Try a regexp construct if possible. Use the StringBuffer as the
framework for mapping characters into your map: go through it char by
char, making the count the value for each character key.
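A rough sketch of the readLine() variant is below. It works on each line directly instead of collecting everything in a StringBuffer, and it uses Map.merge, which assumes Java 8 or later; the class name LineByLineSketch is made up. Keep in mind that readLine() strips line terminators, so newline characters are not counted with this approach.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

class LineByLineSketch {
    /** Counts the chars of every line; line terminators are not included. */
    static Map<Character, Integer> count(BufferedReader in) {
        Map<Character, Integer> map = new TreeMap<>();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                for (int i = 0; i < line.length(); i++) {
                    // Insert 1 for a new key, otherwise add 1 to the old count.
                    map.merge(line.charAt(i), 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(
                new StringReader("A small sample\ntext file"));
        System.out.println(count(in));
    }
}
```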

 