2004-04-20
Haiku in English II
My own thought, lately, is this: Write 14 syllables. Place them on one line, and pencil in breaks four syllables from each end. Adjust the position of these breaks: If a break falls between words, shift it to either adjacent word boundary; if it falls within a word, shift it first to either boundary of that word, and then optionally one word further. One break needs to be a proper phrase/fragment break, where a major semantic juxtaposition occurs; the other can be prosodic or punctuational in nature.
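The mechanical part of that procedure can be sketched in a few lines of Python. This is only an illustration, assuming a crude vowel-group syllable counter (a real syllabifier would do better) and implementing just the "shift to the nearest word boundary" step; the semantic and prosodic judgment calls stay with the poet.

```python
import re

def count_syllables(word):
    # Crude heuristic: count runs of vowels. This miscounts silent-e
    # words and diphthong edge cases; it's a placeholder, not a rule.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def place_breaks(line):
    """Split a (nominally 14-syllable) line into three segments,
    with each break aimed four syllables from an end and shifted
    to the nearest word boundary."""
    words = line.split()
    sylls = [count_syllables(w) for w in words]
    total = sum(sylls)
    # Running syllable count at each word boundary.
    cumulative = []
    running = 0
    for s in sylls:
        running += s
        cumulative.append(running)
    def nearest_boundary(target):
        return min(range(len(cumulative)),
                   key=lambda i: abs(cumulative[i] - target))
    first = nearest_boundary(4)
    second = nearest_boundary(total - 4)
    return (' '.join(words[:first + 1]),
            ' '.join(words[first + 1:second + 1]),
            ' '.join(words[second + 1:]))
```

With an all-monosyllable line the breaks land exactly four syllables in: `place_breaks("dusk wind on flat sand a gull drifts by tall grass bows and nods")` gives `("dusk wind on flat", "sand a gull drifts by tall", "grass bows and nods")`.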
And I think that allows a good amount of content, maintains juxtaposition (and 'one-breath', if you want it), leaves three lines for the sticklers, and simulates the wonders of ku-matagari. Would be hard to get them out of sequential (IPP-style) collaborations though. It takes a fair bit of revising to get one to fall together.
Haiku in English
I'm not going to go into the issues here. Everybody has a unique take. All are agreed only that 5-7-5 is nothing like traditional Japanese Haiku. Google this string—haiku one line segment-straddling English—and you'll see what I mean. That being the case, I may try the zip.
In both zip forms, punctuation beyond question and exclamation marks is eschewed. Kigo and shasei are your own business; zip is a prosodic form, not a semantic school.
Zip long stanza: Fifteen syllables in two lines, each containing a triple-spaced caesura that pauses less strongly than the line break. Print with the caesurae aligned.
Zip short stanza: Eleven syllables in one line containing two double-spaced caesurae. (Optionally, especially in tanka, one of these may strengthen to a line break.)
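The layout step for the long stanza ("print with the caesurae aligned") is easy to mechanize. A minimal Python sketch, assuming the caesura is typed as exactly three spaces; the filler words below are placeholders, not a real zip:

```python
def align_zip(stanza_lines):
    """Align the triple-spaced caesurae of a zip long stanza.
    Each input line contains one '   ' (three-space) caesura;
    the shorter first half is padded so the caesurae line up."""
    halves = [line.split("   ", 1) for line in stanza_lines]
    width = max(len(first) for first, _ in halves)
    return [first.ljust(width) + "   " + rest for first, rest in halves]
```

For example, `align_zip(["old pond   a frog", "the water   sound"])` pads "old pond" so that both second halves begin in the same column.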
Google, Cyrillic, and Unicode
So I just tried a little experiment with Mashke's online Cyrillic Converter and Google. I delivered the same source text to the converter in every case, but had it output a different encoding each time. The output I copied and pasted (in IE 6 for Windows atop XP) into Google. Now, there are three explanations for what happened, only one of which is good. The following encodings all displayed as proper Cyrillic onscreen, and when pasted into Google gave the same result set: KOI8, DOS866, WIN1251, and MAC. ISO8859-5 was turned into gook by IE, and gave a different result set (which did, however, feature pages with the same gook on them).

So: The cool explanation is that Google, while clipping each page, converts all the idiosyncratic encodings to normalized Unicode, and does the same to queries before searching, but doesn't know about ISO8859-5. Think about that: not only does Google have a large portion of the Web in RAM, it has converted that large portion of the Web to Unicode!

The less-believable explanation is that IE had the smarts, when I copied that text out of its native encoding, to copy it as UTF-8, and that was what was submitted to Google (but IE didn't know about ISO8859-5). On this theory, Google's index is still in Unicode, but they're not transforming all queries. I don't believe it because, one, when has Microsoft been that smart and that concerned about internationalization; and two, parameters in the Google URL include ie=UTF-8&oe=UTF-8, which I'm betting stand for Input Encoding and Output Encoding. (These parameters appear even in the search that I gave in ISO8859-5.)

Then there's the really-unbelievable explanation, which is that all the URLs in that result set have clever servers behind them that spit out several encodings, and that these servers for whatever reason all gave Google every encoded version except ISO8859-5. Me, I like the first a lot, though the second is plausible.
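The normalization the first explanation hypothesizes is easy to demonstrate. In this Python sketch (my own test word, not from the experiment), the same Cyrillic word gets different bytes under each legacy encoding, but decoding each one yields a single canonical Unicode string, which is what would make the four queries equivalent:

```python
word = "пример"  # Russian for "example"

legacy = ["koi8_r", "cp866", "windows-1251", "mac_cyrillic", "iso8859_5"]
byte_forms = {enc: word.encode(enc) for enc in legacy}

def normalize(raw: bytes, declared_encoding: str) -> str:
    """Convert legacy-encoded bytes to Unicode text."""
    return raw.decode(declared_encoding)

# All the legacy byte streams collapse to the same Unicode string...
assert all(normalize(b, enc) == word for enc, b in byte_forms.items())
# ...even though the raw bytes themselves are not interchangeable:
assert byte_forms["koi8_r"] != byte_forms["iso8859_5"]
```

An index normalized this way matches a normalized query regardless of which encoding either side started in; a raw byte-for-byte index would not.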