search and replace
[total novice here]
Hi,

I have a series of expressions like this (shortened from verbose xml)

=====================
[<text:sequence text:ref-name="refAutoNr0">1</text:sequence>
[<text:sequence text:ref-name="refAutoNr1">2</text:sequence>
[<text:sequence text:ref-name="refAutoNr2">3</text:sequence>
[<text:sequence text:ref-name="refAutoNr3">4</text:sequence>
=====================

I want to globally replace each such line with just

====================
\head
====================

followed by a line space so I get

====================
\head

\head

\head

\head
====================

etc.

I am modifying a script with lines like

====================
data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mois) do
'\starttext' + "\n" + $2 + "\n" + '\stoptext'
====================

and don't yet know enough to completely understand. Probably a few more hours/days of study will get me there but I need this urgently so...

THNX in advance

Best
Idris
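For anyone reading along, here is one way to read the quoted script line, as a minimal sketch on a made-up input. The method name `wrap_body` and the sample string are hypothetical, and the `o` and `s` flags are left out on purpose: in Ruby `o` only matters when the pattern contains interpolation, and `s` selects a Japanese character encoding rather than Perl's "single line" mode.

```ruby
# Sketch of what the quoted gsub! line does, on a tiny made-up string.
def wrap_body(data)
  # m: "." also matches newlines; i: ignore case.
  # (office:text) is group 1, reused as \1 to find the matching close tag.
  # (.*?) is group 2: everything between the opening and closing tag.
  # The leading .*? and trailing .* swallow whatever surrounds the element.
  data.gsub(/.*?<(office:text).*?>(.*?)<\/\1>.*/mi) do
    # $2 holds group 2 of the match that is being replaced.
    '\starttext' + "\n" + $2 + "\n" + '\stoptext'
  end
end

# Hypothetical input, not the poster's real file.
sample = "<office:body><office:text>line one\nline two</office:text></office:body>"
puts wrap_body(sample)
```

So the whole document is replaced by the body of `office:text`, wrapped between `\starttext` and `\stoptext`.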
Re: search and replace
ishamid wrote:
> THNX in advance

Urght. *ducks*

Regexps and XML always tend to blow up for me. The pattern you're searching for seems to be a complete element, why not use <insert XML parser of choice> and XPath?

With REXML, it should be something like:

document.elements.each('//text:sequence') {|sequence|
  sequence.replace_with(REXML::Text.new("\\head\n", true))}

Substitute the XPath expression with one of desired precision. I'm a little unsure around how REXML treats namespaces in XPath and such, but if you know what prefix will be used in the document, that should work out.

The script might also require a little more massaging if you're outputting to plaintext, but treating XML like, well, XML might get the heavy lifting of searching for patterns in it done faster if you use a pattern language operating on the DOM structure directly.

David Vallner
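A slightly fuller sketch of the REXML idea, in case it helps. It is not tested against the real file: `replace_sequences`, `SAMPLE_XML` and `TEXT_NS` are made-up names, the wrapper element `doc` is invented, and the namespace URI is assumed to be the OpenDocument one, so check the root element of your own document. The prefix is passed to XPath explicitly here to sidestep the namespace question.

```ruby
require "rexml/document"

# Assumed namespace URI (OpenDocument text); verify against your own file.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical miniature of the data, wrapped in an invented root element.
SAMPLE_XML = <<~XML
  <doc xmlns:text="#{TEXT_NS}">
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0">1</text:sequence></text:p>
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr1">2</text:sequence></text:p>
  </doc>
XML

def replace_sequences(xml)
  doc = REXML::Document.new(xml)
  # Collect all matches first, then swap each element for a text node,
  # so the tree is not modified while it is being searched.
  REXML::XPath.match(doc, "//text:sequence", { "text" => TEXT_NS }).each do |sequence|
    sequence.replace_with(REXML::Text.new("\\head\n", true))
  end
  doc.to_s
end

puts replace_sequences(SAMPLE_XML)
```

The `[` before each sequence is ordinary text in the paragraph, so it stays; removing it would need a further step.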
Re: search and replace
Hi Paul,
On Dec 2, 10:56 am, Paul Lutus wrote:
> If you will post a short, complete data example, even just one record as it
> appears in your database, so we don't have to try to read between the
> lines, someone here will be happy to produce a way to filter the data in
> the way you want.

Ok, here are 4 bibliography entries. I just did a follow-up posting with more detail (including the full script I'm trying to modify) so you may prefer to respond to that one. Thank you very much for your help!

======================
<text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0" text:name="AutoNr" text:formula="ooow:AutoNr+1" style:num-format="1">1</text:sequence></text:p>
<text:p text:style-name="P6">'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span text:style-name="T4">., in, </text:span><text:span text:style-name="T3">"Schätze der Kalifen: Islamische Kunst zur Fatimidenzeit."</text:span><text:span text:style-name="T4">, Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien; Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID"><text:span text:style-name="T5">[</text:span><text:sequence text:ref-name="refAutoNr1" text:name="AutoNr" text:formula="ooow:AutoNr+1" style:num-format="1">2</text:sequence></text:p>
<text:p text:style-name="P8">'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span text:style-name="T6">La mosquée al-Azhar</text:span><text:span text:style-name="T7">., in, </text:span><text:span text:style-name="T6">"Trésors fatimides du Caire. Exposition présentée à l'Institut du Monde Arabe ... </text:span><text:span text:style-name="T8">1998."</text:span><text:span text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp. 147-149</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr2" text:name="AutoNr" text:formula="ooow:AutoNr+1" style:num-format="1">3</text:sequence></text:p>
<text:p text:style-name="Standard"><text:s/>'Amri, Husay 'Abdallah</text:p>
<text:p text:style-name="reference"><text:span text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar al-Maqbali (d. 1108/1728) Concerning the Legal Position of the Batiniyyah (Isma'iliyyah) of the People of Hamdan</text:span>., Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3" text:name="AutoNr" text:formula="ooow:AutoNr+1" style:num-format="1">4</text:sequence></text:p>
<text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
<text:p text:style-name="reference"><text:span text:style-name="T10">An Isma'ili Epistemology: The Case of Al-Da'i al-Mutlaq 'Ali b. Muhammad b. al-Walid</text:span>., <text:span text:style-name="Style2">Journal of Semitic Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
======================
Re: search and replace
Thank you, David, for your pointers. I'm still very much a novice (at the level of Chris Pine's Learn to Program) so I could not follow them all, but I do hope to learn more fast. I just sent a follow-up with more detail, including the script I'm trying to modify; I hope you have a chance to look at it...

Thank you again
Idris

On Dec 2, 11:15 am, David Vallner <d...@vallner.net> wrote:
> Regexps and XML always tend to blow up for me. The pattern you're
> searching for seems to be a complete element, why not use <insert XML
> parser of choice> and XPath?
>
> With REXML, it should be something like:
>
> document.elements.each('//text:sequence') {|sequence|
> sequence.replace_with(REXML::Text.new("\\head\n", true))}
>
> Substitute the XPath expression with one of desired precision. I'm a
> little unsure around how REXML treats namespaces in XPath and such, but
> if you know what prefix will be used in the document, that should work out.
>
> The script might also require a little more massaging if you're
> outputting to plaintext, but treating XML like, well, XML might get the
> heavy lifting of searching for patterns in it done faster if you use a
> pattern language operating on the DOM structure directly.
Re: search and replace
ishamid wrote:
> Hi Paul,
>
> On Dec 2, 10:56 am, Paul Lutus wrote:
> > If you will post a short, complete data example, even just one record
> > as it appears in your database, so we don't have to try to read between
> > the lines, someone here will be happy to produce a way to filter the
> > data in the way you want.
>
> Ok, here are 4 bibliography entries. I just did a follow-up posting
> with more detail (including the full script I'm trying to modify) so
> you may prefer to respond to that one. Thank you very much for your
> help!

puts DATA.read.gsub( %r{<(text:sequence)\s[^>]*>(.*?)</\1>}i,
  "\\starttext\n\\2\n\\stoptext" )

--- output -----
<text:p text:style-name="ID">[\starttext
1
\stoptext</text:p>
<text:p text:style-name="P6">'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span text:style-name="T4">., in, </text:span><text:span text:style-name="T3">"Schätze der Kalifen: Islamische Kunst zur Fatimidenzeit."</text:span><text:span text:style-name="T4">, Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien; Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID"><text:span text:style-name="T5">[</text:span>\starttext
2
\stoptext</text:p>
<text:p text:style-name="P8">'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span text:style-name="T6">La mosquée al-Azhar</text:span><text:span text:style-name="T7">., in, </text:span><text:span text:style-name="T6">"Trésors fatimides du Caire. Exposition présentée à l'Institut du Monde Arabe ... </text:span><text:span text:style-name="T8">1998."</text:span><text:span text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp. 147-149</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID">[\starttext
3
\stoptext</text:p>
<text:p text:style-name="Standard"><text:s/>'Amri, Husay 'Abdallah</text:p>
<text:p text:style-name="reference"><text:span text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar al-Maqbali (d. 1108/1728) Concerning the Legal Position of the Batiniyyah (Isma'iliyyah) of the People of Hamdan</text:span>., Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[\starttext
4
\stoptext</text:p>
<text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
<text:p text:style-name="reference"><text:span text:style-name="T10">An Isma'ili Epistemology: The Case of Al-Da'i al-Mutlaq 'Ali b. Muhammad b. al-Walid</text:span>., <text:span text:style-name="Style2">Journal of Semitic Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
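The snippet above wraps each number in `\starttext`/`\stoptext`, whereas the original question asked for a bare `\head` followed by a blank line. A possible variation, offered only as a sketch: `SEQUENCE_PATTERN` and `insert_heads` are made-up names, and the pattern is fitted to the four sample entries (including the second one, where the `[` sits inside its own `text:span`), so real data may need adjusting. It assumes the surrounding `text:p` tags are handled elsewhere in the script.

```ruby
# Sketch only: pattern fitted to the sample entries posted in this thread.
SEQUENCE_PATTERN = %r{
  (?:<text:span\b[^>]*>)?    # optional span opened around the bracket
  \[                         # the literal opening bracket
  (?:</text:span>)?          # optional close of that span
  <text:sequence\b[^>]*>     # opening tag with whatever attributes it has
  [^<]*                      # the entry number
  </text:sequence>           # closing tag
}x

def insert_heads(data)
  # Block form, so the backslash in \head is not read as a gsub escape.
  data.gsub(SEQUENCE_PATTERN) { "\\head\n\n" }
end

# Two shortened entries in the two shapes seen in the sample data.
sample = <<~'XML'
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0" text:name="AutoNr">1</text:sequence></text:p>
  <text:p text:style-name="ID"><text:span text:style-name="T5">[</text:span><text:sequence text:ref-name="refAutoNr1">2</text:sequence></text:p>
XML

puts insert_heads(sample)
```

Using the block form of `gsub` avoids having to think about which backslash sequences are special in a replacement string.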