Velocity Reviews - Computer Hardware Reviews

Byte Offsets of Tokens, Ngrams and Sentences?

 
 
Muhammad Adeel
      08-06-2010
Hi,

Does anyone know how to tokenize a string in Python so that it returns
the tokens together with their byte offsets? Similarly, is there a
sentence splitter that returns the sentences with their byte offsets?
And finally, how can I get n-grams returned with byte offsets?

Input:
This is a string.

Output:
This 0
is 5
a 8
string. 10


thanks
 
Gabriel Genellina
      08-06-2010
On Fri, 06 Aug 2010 06:07:32 -0300, Muhammad Adeel <(E-Mail Removed)>
wrote:

> Does any one know how to tokenize a string in python that returns the
> byte offsets and tokens? Moreover, the sentence splitter that returns
> the sentences and byte offsets? Finally n-grams returned with byte
> offsets.
>
> Input:
> This is a string.
>
> Output:
> This 0
> is 5
> a 8
> string. 10


Like this?

py> import re
py> s = "This is a string."
py> for g in re.finditer(r"\S+", s):
...     print(g.group(), g.start())
...
This 0
is 5
a 8
string. 10

--
Gabriel Genellina
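One caveat about the finditer approach above: in Python 3, start() on a match against a str is a character (code point) index, which equals the byte offset only for ASCII text. A minimal sketch, assuming UTF-8 is the encoding whose byte offsets you want, is to run the same \S+ pattern over the encoded bytes instead. The helper name token_byte_offsets is hypothetical. In a bytes pattern \s matches only ASCII whitespace, and UTF-8 multi-byte sequences contain no ASCII bytes, so a token is never cut in the middle of a character (non-ASCII spaces such as U+00A0 are simply not treated as separators).

```python
import re


def token_byte_offsets(text):
    """Return (token, byte_offset) pairs for whitespace-delimited tokens.

    Hypothetical helper (assumes UTF-8): the regex runs over the encoded
    bytes, so m.start() is a byte offset rather than a character index.
    """
    data = text.encode("utf-8")
    # In a bytes pattern \S means "not ASCII whitespace"; multi-byte UTF-8
    # sequences contain no ASCII bytes, so every match decodes cleanly.
    return [(m.group().decode("utf-8"), m.start())
            for m in re.finditer(rb"\S+", data)]


for token, offset in token_byte_offsets("café au lait"):
    print(token, offset)
# café 0
# au 6
# lait 9
```

For pure ASCII input such as "This is a string." the byte offsets are identical to the character offsets in the session above; they only diverge once a multi-byte character like "é" appears earlier in the text.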

 
Muhammad Adeel
      08-06-2010
On Aug 6, 10:49 am, "Gabriel Genellina" <(E-Mail Removed)>
wrote:

Hi,

Thanks. Can you please tell me how to do the same for n-grams and
sentences?
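One possible sketch for the follow-up (not an answer given in the thread) extends the same regex-over-bytes idea. It assumes UTF-8, whitespace tokenization, and a deliberately naive sentence rule: a sentence ends at a run of ".", "!" or "?" followed by whitespace or the end of the text. That rule is an assumption for illustration and will wrongly split after abbreviations such as "Mr."; a proper sentence tokenizer is needed for real text. Each n-gram is reported as the original text spanning its n tokens, at the byte offset of its first token. The names sentence_byte_offsets and ngram_byte_offsets are hypothetical.

```python
import re

# Naive (assumed) sentence rule: start at a non-space byte, stop after a run
# of . ! ? that is followed by whitespace or end of text, or stop where only
# trailing whitespace remains. Not a real sentence tokenizer.
SENTENCE_RE = re.compile(rb"\S.*?(?:[.!?]+(?=\s|\Z)|(?=\s*\Z))", re.DOTALL)
TOKEN_RE = re.compile(rb"\S+")


def sentence_byte_offsets(text):
    """Return (sentence, byte_offset) pairs, offsets into the UTF-8 bytes."""
    data = text.encode("utf-8")
    return [(m.group().decode("utf-8"), m.start())
            for m in SENTENCE_RE.finditer(data)]


def ngram_byte_offsets(text, n):
    """Return (ngram, byte_offset) pairs over whitespace tokens.

    Each n-gram is the slice of the original bytes from the start of its
    first token to the end of its last token, so the original spacing is
    kept and the offset is that of the first token.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    data = text.encode("utf-8")
    spans = [m.span() for m in TOKEN_RE.finditer(data)]
    result = []
    for i in range(len(spans) - n + 1):
        start, end = spans[i][0], spans[i + n - 1][1]
        result.append((data[start:end].decode("utf-8"), start))
    return result


for sentence, offset in sentence_byte_offsets("This is a string. And another one!"):
    print(sentence, offset)
# This is a string. 0
# And another one! 18

for gram, offset in ngram_byte_offsets("This is a string.", 2):
    print(gram, offset)
# This is 0
# is a 5
# a string. 8
```

Slicing the original bytes, rather than joining tokens with a single space, keeps each n-gram's text consistent with its offset even when the input contains runs of spaces or newlines between tokens.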
 