Best way to tokenize in String

 
 
sravanreddy001
 
      09-10-2011
Hi,
What is an efficient way to tokenize a string, splitting it into
words based on a delimiter?

I've looked at strtok() in string.h.
Is there a better way to do this?
A more efficient one?
 
 
 
 
 
Ian Collins
 
      09-10-2011
On 09/10/11 02:25 PM, sravanreddy001 wrote:
> Hi,
> what is the efficient way to tokenize the string and splitting into
> words based on the delimiter?


The simplest way is to create an istringstream from the string and
stream out the words.
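
For illustration, a minimal sketch of the istringstream approach. The helper name split_words is hypothetical (it is not from this thread), and the sketch assumes the tokens are separated by whitespace only:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this sketch): split on runs of
// whitespace by streaming words out of an istringstream.
std::vector<std::string> split_words(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;

    // operator>> skips leading whitespace, then reads up to the next
    // whitespace character; the loop ends when no more words remain.
    while (in >> word)
    {
        words.push_back(word);
    }
    return words;
}
```

Calling split_words("the quick\tbrown\n fox") yields the four words "the", "quick", "brown" and "fox". Runs of whitespace collapse, so this approach never produces empty tokens.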

--
Ian Collins
 
 
 
 
 
Paul
 
      09-10-2011
On Sep 10, 4:14 am, Ian Collins <(E-Mail Removed)> wrote:
> On 09/10/11 02:25 PM, sravanreddy001 wrote:
>
> > Hi,
> > what is the efficient way to tokenize the string and splitting into
> > words based on the delimiter?

>
> The simplest way is to create an istringstream from the string and
> stream out the words.
>

But is that the most efficient way?

However, simplicity is sometimes preferred over efficiency, and the stream
method is quite handy. It is described here, at the bottom of the page:
http://www.oopweb.com/CPP/Documents/...g-HOWTO-7.html

Is there a way to use this stream method when the delimiter is not
whitespace?


 
 
red floyd
 
      09-10-2011
On 9/9/2011 10:30 PM, Paul wrote:
> On Sep 10, 4:14 am, Ian Collins<(E-Mail Removed)> wrote:
>> On 09/10/11 02:25 PM, sravanreddy001 wrote:
>>
>>> Hi,
>>> what is the efficient way to tokenize the string and splitting into
>>> words based on the delimiter?

>>
>> The simplest way is to create an istringstream from the string and
>> stream out the words.
>>

> But is that the most efficient way?


Screw "efficiency". The amount of time parsing should be minimal
compared to either your I/O or processing time. Until you benchmark
and show that the tokenizing is a bottleneck, go for the simplicity.


 
 
Ben Cottrell
 
      09-10-2011
Paul wrote:
> On Sep 10, 4:14 am, Ian Collins <(E-Mail Removed)> wrote:
>
>>On 09/10/11 02:25 PM, sravanreddy001 wrote:
>>
>>
>>>Hi,
>>>what is the efficient way to tokenize the string and splitting into
>>>words based on the delimiter?

>>
>>The simplest way is to create an istringstream from the string and
>>stream out the words.
>>

>
> But is that the most efficient way?
>
> However simplicty is sometime preferred over efficiency and the stream
> method is quite handy, it is described here, bottom of page:
> http://www.oopweb.com/CPP/Documents/...g-HOWTO-7.html
>
> Is there a way to use this stream method when the delimiter is not
> whitespace?
>
>

Yes, there's a technique called the "whitespace redefinition approach"
which will let you do that:

https://groups.google.com/group/comp...in&hl=de&pli=1


Personally I like the ability to use 'cin >> foo >> bar' style to read
delimited data from a stream, although most C++ programmers I know don't
really know/care much about locales, ctypes and facets, and
unfortunately I would expect they'd probably consider it to be "a bit
too weird" to use in their code.
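
As a rough sketch of what such a "whitespace redefinition" can look like (this is not the code from the linked post; the names delimiter_table, delimiter_ctype and split_with_facet are made up for illustration): a ctype<char> facet is built whose classification table marks the extra delimiter characters as whitespace, and the stream is imbued with a locale that carries it.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for the classification table. It is a base class so
// that it is constructed before std::ctype<char> receives the pointer.
struct delimiter_table
{
    std::vector<std::ctype_base::mask> masks;

    explicit delimiter_table(const std::string& delims)
        : masks(std::ctype<char>::classic_table(),
                std::ctype<char>::classic_table() + std::ctype<char>::table_size)
    {
        // Start from the classic "C" table and flag each delimiter as space.
        for (unsigned char c : delims)
        {
            masks[c] |= std::ctype_base::space;
        }
    }
};

// Hypothetical facet: a ctype<char> that treats the delimiters as whitespace.
class delimiter_ctype : private delimiter_table, public std::ctype<char>
{
public:
    explicit delimiter_ctype(const std::string& delims)
        : delimiter_table(delims), std::ctype<char>(masks.data())
    {
    }
};

// Hypothetical helper: tokenize with operator>> after imbuing the facet.
std::vector<std::string> split_with_facet(const std::string& text,
                                          const std::string& delims)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new delimiter_ctype(delims)));

    std::vector<std::string> tokens;
    std::string token;
    while (in >> token)
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

With this sketch, split_with_facet("a,b;c d", ",;") gives "a", "b", "c" and "d". Ordinary whitespace still separates tokens because the table starts from the classic one, and consecutive delimiters collapse, so empty fields are dropped.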




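Another one which I like, using TR1/Boost RegEx (std::regex as of C++11):

    #include <iostream>
    #include <regex>
    #include <string>

    int main()
    {
        std::string str = "the\t quick brown\n-\n- fox"
                          " jumped..over,the,lazy,.dog";

        // std::tr1::regex became std::regex in C++11. The hyphen is moved
        // to the end of the bracket expression so it is a literal; directly
        // after \s some implementations reject it as an invalid range.
        std::regex re("[\\s,.-]+");

        // Submatch index -1 selects the text between delimiter matches.
        std::sregex_token_iterator
            iter(str.begin(), str.end(), re, -1),
            end;

        while (iter != end)
        {
            std::cout << *iter++ << '\n';
        }
    }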

 
 
Asger-P
 
      09-10-2011

Hi sravanreddy001

On 10 September 2011 at 04:25, sravanreddy001 wrote:

> Hi,
> what is the efficient way to tokenize the string and splitting into
> words based on the delimiter?


You wrote "delimiter" and not "delimiters". Is that because you
have only one delimiter to consider?

If that's the case, or if you have only a few known delimiters,
then you can do it a lot faster than strtok if you write
your own tokenizer.
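
A hand-written tokenizer for the single-delimiter case might look something like the sketch below. The name split_on is hypothetical, and whether it actually beats strtok would need to be measured on your own data:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split on one known delimiter character.
// Unlike strtok, it leaves the input untouched and keeps empty fields.
std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;

    for (;;)
    {
        const std::string::size_type pos = text.find(delim, start);
        if (pos == std::string::npos)
        {
            // No further delimiter: the rest of the string is the last token.
            tokens.push_back(text.substr(start));
            break;
        }
        tokens.push_back(text.substr(start, pos - start));
        start = pos + 1; // continue just past the delimiter
    }
    return tokens;
}
```

For example, split_on("a,b,,c", ',') returns "a", "b", an empty string, and "c". Keeping empty fields is a design choice of this sketch; strtok and the stream-based approaches skip them.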


Best regards
Asger-P
 
 
Paul
 
      09-10-2011
On Sep 10, 1:05 pm, Ben Cottrell <(E-Mail Removed)> wrote:
> Paul wrote:
> > On Sep 10, 4:14 am, Ian Collins <(E-Mail Removed)> wrote:

>
> >>On 09/10/11 02:25 PM, sravanreddy001 wrote:

>
> >>>Hi,
> >>>what is the efficient way to tokenize the string and splitting into
> >>>words based on the delimiter?

>
> >>The simplest way is to create an istringstream from the string and
> >>stream out the words.

>
> > But is that the most efficient way?

>
> > However simplicty is sometime preferred over efficiency and the stream
> > method is quite handy, it is described here, bottom of page:
> >http://www.oopweb.com/CPP/Documents/...Programming-HO...

>
> > Is there a way to use this stream method when the delimiter is not
> > whitespace?

>
> Yes, there's a technique called the "whitespace redefinition approach"
> which will let you do that:
>
> https://groups.google.com/group/comp...e2e4eb8e0ba?ou...
>
> Personally I like the ability to use 'cin >> foo >> bar' style to read
> delimited data from a stream, although most C++ programmers I know don't
> really know/care much about locales, ctypes and facets, and
> unfortunately I would expect they'd probably consider it to be "a bit
> too weird" to use in their code.

This looks pretty useful, especially if your source comes from a file.
I have to admit I don't know much about locales and facets, as I've
never really delved into that part of the language. It looks quite
advanced, and I think it might take a couple of days of study to get a
good understanding of it all. I'll add it to my list of things to do.

>
> Another one which I like, using TR1/Boost RegEx:
>
> #include <string>
> #include <regex>
> #include <iostream>
>
> int main()
> {
>     std::string str = "the\t    quick  brown\n-\n- fox"
>                       " jumped..over,the,lazy,.dog";
>     std::tr1::regex re("[\\s-,.]+");
>
>     std::tr1::sregex_token_iterator
>         iter(str.begin(), str.end(), re, -1),
>         end;
>
>     while(iter != end)
>     {
>         std::cout << *iter++ << std::endl;
>     }
>
>
>

This looks very tidy too, and more useful if the input is in the form of a
string. Again, I've never really delved into Boost RegEx; another one for
the list.
 
 
Moshbear dot Net
 
      09-11-2011
On Sep 9, 10:25 pm, sravanreddy001 <(E-Mail Removed)> wrote:
> Hi,
> what is the efficient way to tokenize the string and splitting into
> words based on the delimiter?
>
> i've looked at the strtok() in string.h
> is there any other better way to do this?
> A more efficient one?


I've had good results with boost.tokenizer.
 