splitting strings with python

 
 
sbucking@gmail.com
 
      06-09-2005
I'm trying to split a string of this form (the string is from a
Japanese dictionary file with multiple definitions in English for each
Japanese word):


str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /


The variables I need are str* and def*.

Sometimes the (1) and (2) are not included; they appear only if
the word has two different meanings.


"..." means that there are sometimes more than two definitions per
meaning.


I'm trying to use the re.split() function, but with no luck.

Is this possible with Python, or am I dreaming?

All the best,

..

 
 
 
 
 
inhahe
 
      06-09-2005



I don't think you can do it with string.split. You could probably do it
with re.split, but I think it's easier to use re.findall:

import re
re.findall(r"[a-zA-Z][ a-zA-Z0-9]*", inputstring)

That should work, at least for the ASCII parts of the line.
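
To see what that pattern actually returns, here is a small sketch that runs it against the template line from the question. The variable names are only for illustration. Because the character class includes a space, matches can carry trailing whitespace, so each one is stripped.

```python
import re

# The template line from the question, used verbatim as sample input.
template = "str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /"

# A token starts with an ASCII letter and continues over letters,
# digits and spaces; strip the trailing space some matches pick up.
tokens = [t.strip() for t in re.findall(r"[a-zA-Z][ a-zA-Z0-9]*", template)]

print(tokens)
# ['str1', 'str2', 'def1', 'def2', 'def3', 'def4', 'def5']
```

Note that the result is a flat list: it does not record which definitions belong to sense (1) or (2), and the pattern only matches ASCII letters, so it will skip non-ASCII text and cut a definition short at any punctuation.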




 
 
 
 
 
sbucking@gmail.com
 
      06-09-2005
One problem is that str1 is Unicode (Japanese kanji), and str2 is
Japanese kana.

Can I still use re.findall(~)?

Thanks for your help!

 
 
sbucking@gmail.com
 
      06-09-2005
Sorry, I should be more specific about the encoding.

It's EUC-JP.

I googled a little, and you can still use re.findall with the Japanese
kana, but I didn't find anything about kanji.
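
A minimal sketch of one way to handle this, assuming Python 3 (the thread itself predates it): decode the EUC-JP bytes to a str first, then run the regex on the decoded text. In Python 3, `\w` in a str pattern is Unicode-aware and matches kanji and kana as well as ASCII letters. The sample line below is hypothetical and is encoded only to simulate bytes read from the dictionary file.

```python
import re

# Hypothetical dictionary line; encoding it here stands in for bytes
# read from an EUC-JP file.
raw = "日本語 [にほんご] /(n) Japanese language/".encode("euc_jp")

# Decode to str before matching, so the regex sees characters, not bytes.
line = raw.decode("euc_jp")

# \w+ matches runs of word characters, including kanji and kana.
words = re.findall(r"\w+", line)

print(words[:2])
# ['日本語', 'にほんご']
```

When reading a real file, passing `encoding="euc_jp"` to `open()` does the decoding step for you.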

 
 
Kent Johnson
 
      06-09-2005
sbucking@gmail.com wrote:
> I'm trying to split a string of this form (the string is from a
> Japanese dictionary file with multiple definitions in English for each
> Japanese word):
>
>
> str1 [str2] / (def1, ...) (1) def2 / def3 / .... (2) def4/ def5 ... /
>
>
> The variables I need are str* and def*.


Could you post a few examples of real data and what you want to extract from it? The above raises a few questions:
- Are str* and def* single words, or can they include whitespace, commas, slashes, parentheses...?
- It is not clear what replaces the ... (or whether they are literal).

This might be a good job for pyparsing.

Kent
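
Until real data is posted, here is a rough sketch of how such a line might be taken apart. It uses the standard re module rather than pyparsing, and it rests on several assumptions: the headword contains no whitespace, the reading sits in square brackets, definitions are separated by slashes, and sense markers look like (1), (2). The function name `parse_entry` and the sample line are hypothetical.

```python
import re

# Assumed layout: "headword [reading] /definitions.../"
ENTRY_RE = re.compile(r"^(?P<word>\S+)\s+\[(?P<reading>[^\]]+)\]\s*/(?P<body>.*)$")
# Assumed sense markers: "(1)", "(2)", ...
SENSE_RE = re.compile(r"\(\d+\)")


def parse_entry(line):
    """Return (word, reading, groups); groups is a list of definition lists."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("line does not match the assumed layout: %r" % line)
    groups = []
    # Split the body at each sense marker, then split each piece on "/".
    for chunk in SENSE_RE.split(match.group("body")):
        defs = [d.strip() for d in chunk.split("/") if d.strip()]
        if defs:
            groups.append(defs)
    return match.group("word"), match.group("reading"), groups


# Hypothetical sample line in the shape described in the question.
sample = "明るい [あかるい] /(adj) (1) bright/light/(2) cheerful/sunny/"
word, reading, groups = parse_entry(sample)
print(groups)
# [['(adj)'], ['bright', 'light'], ['cheerful', 'sunny']]
```

Text before the first sense marker, such as a parenthesised tag, comes back as its own leading group. A line with no (1) or (2) markers yields a single group. If definitions can themselves contain slashes or numbered parentheses, this approach breaks down and a real grammar would be the safer choice.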

 