% wknight8111 has joined #parrotsketch % wknight8111 has left wknight8111!~chatzilla@66.252.102.38 % wknight8111 has joined #parrotsketch % wknight8111_ has joined #parrotsketch % wknight8111 has left wknight8111!~chatzilla@66.252.102.38 % wknight8111_ is now known as wknight8111 % wknight8111 has left wknight8111!~chatzilla@66.252.102.37 % cognominal has left cognominal!~cognomina@82.67.232.89 % cognominal has joined #parrotsketch % cognominal has left cognominal!~cognomina@82.67.232.89 % cognominal has joined #parrotsketch % pmichaud has joined #parrotsketch I'm not going to make the meeting; I'll backlog like I always do. REPORT * Talked with japhb about ManagedStruct definition generation. That wasn't in my original plan, but it seems very useful. I'm going to try to fit it in instead of STDCALL and porting to OSX. STDCALL support is a lower-priority item than ManagedStruct support. * Parrot_jit_build_call_func is completely out of date. It's pre-PCC. It still uses hardcoded register numbers for argument and object passing. * I've rewritten most of the argument-passing portion of the build_call function and am now starting on handling return values from C functions. * The old generated JIT code called string_to_cstring to convert STRINGs to char *, but never freed the C strings. So I'm working on generating the necessary free calls in the JIT instruction stream. EOR Instead of saying pre-PCC, it's probably better to say it predates variable register counts, from when certain registers had particular calling-convention purposes. % jhorwitz has joined #parrotsketch % barney has joined #parrotsketch % cotto_work has joined #parrotsketch % Auzon has joined #parrotsketch % wknight8111 has joined #parrotsketch % allison has joined #parrotsketch % NotFound has joined #parrotsketch % DietCoke has joined #parrotsketch Hello, folks.
hello, coke hello % chromatic has joined #parrotsketch Hola hi morning good afternoon hi óla My report basically consists of fighting with svn merge for some time this week, and learning enough to drop my from-scratch attempt at Tcl on PCT. Why don't we go in "hello" order? Allison? - I've got the pdd25 branch down to 3 failing test files, and I've nearly finished debugging another. - Spent time talking to potential Parrot sponsors. - (Spent lots of time on OSCON.) EOR * my $real_job = I_got_a_new_job($me++) * Lots of GC debugging work, some nasty segfaults to track down * Progress slow but steady EOR (steady progress)++ Fixing bugs, applying patches, and working on pdb. I have two questions. EOR Traveling; not a lot of time. Helped Allison debug one of the remaining pdd25cx problems. Giving Andrew as much help as possible. chromatic++ Working on some things that weren't appropriate to land before the release. Will branch for the strings PDD shortly; going to pull in NotFound for that. EOR % jonathan has joined #parrotsketch Mostly PMC-related bugfixes and closing tickets in the queue. 1 question. eor I worked mostly on lexical issues this week, trying to understand them and coming up with a way that Parrot can handle them properly. I also tracked down the PGE bugs in the pdd25cx branch -- that appears to be a register alligator bug. did a little more work on getting HLL to work with PCT -- several of chromatic++'s fixes this past week help there. other than that, no formal report this week -- busy with $otherjob stuff and preparing for lots of trips. (register alligator)++ (alligator bug)-- Simplified and extended the Pipp grammar. Started support for $this in Pipp. Better support for quoted strings in Pipp. Fixed a bug with the constant table. Added some languages to Perl::Critic testing. Released Parrot 0.6.4. Registered for YAPC::EU. .eor davidfetter? If not, particle. particle * davidfetter slacker ~ tewk pasted his report earlier; see the logs.
~ meetings to discuss smoking Parrot with MS OSL keep getting canceled. hope the third time is a charm (post-OSCON) ~ Parrot Foundation setup continues (banking issues, website, donation software, charitable org listing, etc.) ~ talking to some potential Parrot Foundation donors tonight ~ need to finalize plans for OSCON travel, and quickly! .end Anyone else? Me! This week... * Nearly forgot Parrot Sketch! :-O * Spent my Rakudo day on Thursday mostly implementing Perl 6 enums; the main bits are done now. * Figured out along the way that anonymous classes should be relatively easy-ish * Tried to contribute to the lexicals discussion a bit, though I was struggling for brain cycles * Other random odds and ends too; I think I fixed a segfault due to an off-by-one... for some reason my brain is all hazy at the moment. * Will do Rakudo day on Friday this week, since it fits best EOR * jonathan needs to be afk for a bit now - sorry heh. Anyone else? ok. I think there were 2 folks with questions. NotFound? The first question is simple: a new name for pdb. The last proposal is pbc_debug +1 bytecode_debug Whose bytecode? pbc_debug +1 parrot_bytecode_debug pbc is cryptic parrot_bdb :-) parrot_debug pbcdb parrot_debug++ parrot_debugger++ or parrot_debug++ also: pdump -> pbc_dump ?? in theory, it can also debug pasm or pir, so parrot_ makes sense (I thought it was simple) I like parrot_debugger parrot_debugger +1 parrot_debugger +1 parrot_debugger +2 parrot_debugger wins! in that case, please update the other executable we just renamed to pbc_foo to parrot_foo. =-) barney: pdump -> parrot_dump? depends on the other executable. If it's only for bytecode files, then it should probably remain 'pbc' pmichaud: the debugger is only for pbc. =-) The second question is about literal strings in PIR. The spec says they can only contain ASCII chars, but there are tests with iso-8859-1 and utf8.
or parrot_ * barney agrees with pmichaud parrot_bytecode_, I meant to type NotFound: are those prefixed with unicode:'' or something similar? NotFound: that depends on the string metadata parrot_bytecode_ ++ And also, it's not clear whether the charset and encoding prefixes are intended for the generated string or for the contents of the literal. the literal strings in PIR are always specified using ASCII what they encode may be iso8859-1 or utf8 unicode:"\xbb" legal pmichaud: that's the way I understand the spec, but the current tests fail to meet it. "«" not legal NotFound: which test? Forgot my notes, one second... pmichaud: not according to the PDD. "docs/pdds/draft/pdd19_pir.pod" 1295 lines --14%-- 194,16 13% The PDD is more advanced than the current implementation I'm speaking only of the PDD, myself so there are really two levels of answer here: what does it do now, and what should it do? ok. the PDD shows a non-ASCII example. DietCoke: you're correct, I had not seen pdd19_pir.pod:194 % wknight8111_ has joined #parrotsketch (and we may be dealing with an inconsistently updated spec, here) % wknight8111 has left wknight8111!~chatzilla@66.252.102.38 % wknight8111_ is now known as wknight8111 t/op/stringu.t NotFound: where does the spec say that literal strings can only be in ASCII? line 129 "Only 7-bit ASCII is accepted in string constants; to use characters outside that range, specify an encoding in the way below." ... and then it goes on to show you below how to use something else. =-) NotFound: which test in t/op/stringu.t contains a non-ascii char? I'd add some verbiage like "unless you specify otherwise, as described below" line 129 of PDD 28? pdd 19 line 129 of pdd19 oh, that's easy, PDD 19 is wrong DietCoke: the description below is how to escape it, not how to write it directly. it's still in draft, you can't take it as authoritative yet NotFound: encoding != escaping. NotFound: which test in t/op/stringu.t contains a non-ascii char inside the "..." ?
pmichaud: test named "UTF8 as malformed ascii" well, yes, that's testing that it's in fact an error line 197 of pdd19 ? and "UTF8 literals" "UTF8 literals" also UTF8 literals I agree with, although that one matches the example given on pdd19:194 pmichaud: it's testing that it's an error when you specify ASCII encoding, but allowed as a string literal t/op/stringu.t:187ff so, what's the question again? (I think the answer is simply that pdd19 needs clarification.) remember, PDD 19 was pulled together from a pile of old documentation So what is the intention of the spec? Must they always be escaped or not? what no. :200ff disagrees: 'no escapes' (disagrees with pdd) if an encoding is given, no escaping. Tests and spec seem to be in line. Question is whether the spec is correct. if no encoding is given, then the contents of the "..." must be 7-bit ascii hang on... editing the text now note that unicode: is not an encoding, so unicode:"«" is not valid, although utf8:unicode:"«" is. Folks, I have to run. I do hope we can agree to name our executables in some consistent fashion. cotto_work still has a question when this question is resolved. See folks next week. % DietCoke has left DietCoke!coke@feather.perl6.nl bye And the second part is: when unicode: is given and no encoding is specified, is the default utf8 applicable only to the generated string, or also to the escapes in the content? edited result "The default encoding for a double-quoted string constant is 7-bit ASCII, other character sets and encodings must be marked explicitly using a charset or encoding flag." I think we need to also make it clear that 7-bit ascii is required when the encoding is not well, it's not exactly "required" I think 7-bit ascii is redundant. "Ascii extended" is not ascii. allison: how will the compiler know how to process the bytes in the "..." if the encoding isn't known? it throws an exception that was what the second test was checking argggh.
isn't "throw an exception" equivalent to "didn't meet the requirement"? but, you can enter escaped characters ...because an encoding wasn't given? if you enter characters that aren't ascii, it'll treat them as ascii if an encoding isn't given, it's the same as if you specified an encoding of "ascii:" exactly the same ascii or fixed8? ascii (at least, that's what it was) so, we treat "ascii" as specifying both an encoding and a charset? I think that not allowing any non-ASCII char would be cleaner. Compilers can explicitly say what they intend when generating PIR. ascii:"\xab" throws an exception? or is it "backslash, x, a, b" ? throws an exception '\xab' (single quotes) is backslash, x, a, b ascii:"\x0d" is a newline? pmichaud: no, it's cr sorry, cr pmichaud: should be utf8:unicode:"\x0d" is a newline? ucs2:"\x0d" is a newline? (or do we decide not to support ucs2?) it's only characters outside the ASCII range that throw an exception when the string is ASCII s/newline/cr/ By ASCII, you mean 7 bits? my point is that for some encodings we can't always decide if a backslash is an escape or part of the character being encoded NotFound: (I'm leaving the 7-bit in the PDD, because people always have that question) that's why lines 196-197 say that escapes are not honored when the encoding is specified pmichaud: a backslash is always an escape in a double-quoted string allison: agreed, but mentioning it once in a note would be enough. allison: fair enough; we then need to remove the mention that escape sequences are not honored when an encoding is specified. and we don't support encodings where backslash may be a valid byte in the current implementation, backslashes are honored even when another encoding is specified fair enough -- again, I've been restricting myself to spec. which is more useful? good question. consistency is valuable I think it's more useful to always restrict the "..." to ascii chars, personally -- with everything else escaped. I agree.
that's excessive that means you can never directly type a UTF 8 string in PIR code? allison: if not, we must take into account a lot of things, and we complicate the parsing. can never pass a UTF 8 string in from an HLL parser does it matter for PIR? we pass UTF-8 strings in all the time -- they get encoded by PCT it's also an unnecessary restriction the strings are just a series of bytes allison: yes, I think the HLL must be clear about its intention when generating PIR. there's no reason to restrict which bytes besides, aren't we moving _away_ from UTF8? (I know we'll always support it, but internally the strings will be something else...?) the restriction enters in from how you specify the encoding allison: that is not what the docs say about complete Unicode support. pmichaud: internally strings will always be stored in whatever their natural encoding is strings are just a blob of data so then an HLL will have to specify that encoding as part of the string constant anyway, yes? sorry, string literal how you read that data depends on the encoding and character set ...except in my GC, where strings are apparently always stored as a segfault wknight8111: heh :) string literals are fundamentally the same as regular strings, but not modifiable if my HLL has a utf-8 string, it needs to either (1) indicate in the PIR that the string is encoded as utf8, or (2) escape the non-ASCII chars it can't just stick the UTF-8 string inside of a pair of double quotes and expect PIR to know what to do with it (unless PIR is specified as defaulting to utf8) pmichaud: no, you have to specify an encoding pir should understand utf8:"a utf8 string here" what if my encoding has another meaning for backslash? if you specify no encoding or character set, parrot treats it as an ASCII string anyway, I'll stop here -- no matter what Parrot does, the HLL tools will be able to work with it.
pmichaud: that's a good point it just seems inconsistent that we allow ambiguous bytes in the string And if utf16 or ucs2 is specified, the string can contain 16- or 32-bit encoded Unicode chars inside an 8-bit encoded file? That could be a nightmare for text editors. for now, I'm not modifying PDD 19 where it says that specifying an encoding stops parrot from processing backslashes in strings so unicode:"«" is an error? no sorry, I said that wrong fixed8:"\x0a" is a 4-character string? but utf8:unicode:"\uwhatever" is an error no, not an error -- it should be backslash+u+whatever (not an error, but the backslash isn't treated as special) yes So the encoding part defines both the literal interpretation and the generated string? % Auzon has left #parrotsketch NotFound: that's the way I interpret it. it does mean that we can't specify, say, ucs2 literals NotFound: to be specific, the encoding and charset flags on a literal string specify the metadata on the literal string anything that the literal string is assigned to adopts that metadata from the literal string allison: that looks inconsistent to me. Utf8 must be literal but ucs2 must always be escaped. (not that it matters to me that we can't specify ucs2 literals :-) ? ah, we just need to add a ucs2 encoding flag if we intend it to be used on any regular basis there's no way to encode ucs2 literals containing double quotes There is no sane way to encode 16- or 32-bit chars in an 8-bit text file, except escaping. (even with a ucs2 encoding flag.) pmichaud: ucs2:'"' spinclad, okay, a string with both single and double quotes, then :-) ok also I can't see your null byte in there. * pmichaud looks carefully. And don't even talk about allowing ucs2 pir source files. I'll assume it's there. :-) no null byte.
counted string a) we would have to introduce a new quoting syntax, and b) presumably, if you're working with 16- or 32-bit chars, you aren't doing it in an 8-bit text file spinclad: in ucs2, a double quote is \x00\x22 but, really, ucs2 is not a high priority in win32, all files are stored as ucs2 by the os as long as there is some way to create ucs2 strings, we can call it good I'm speaking particularly of ucs2 literals pmichaud: if we have a demand for ucs2 literals that can't use escapes, we can do the work to add them ok, ucs2:'' allison: I'm saying that the spec should allow escapes should allow escapes in literal strings that specify an encoding? and not allow oddly-encoded strings in PIR source Then we must allow them in utf8, for consistency. or we can choose to be explicitly inconsistent I have no trouble with that, fwiw more accurately, when an encoding is specified, it should have the metadata to declare whether its strings process escapes I have no problem with saying that utf8:"..." allows utf8 encoded stuff inside the quotes, *and* processes escapes. I think the clean way is to always escape any non-ascii character. the only reliable way to represent any generic ucs2 literal is if we allow escapes. or if we separate the PIR encoding from the resulting literal okay, the answer for now (which is effectively what escapes do, but escaping every character is a bit much, I agree.) we allow escapes and non-ascii characters in double-quoted strings double-quoted strings are just blobs of data pmichaud: allowing mixing complicates the parsing for no real gain, IMO. NotFound: I'm not worried about the parsing as much as I am the result I'd much rather be able to produce my constant string in the .pbc output directly than have transcode operations at runtime because there wasn't a way to do it in the PIR originally. the transcode operations produce extra GC-able elements, which is bad.
the encoding and character set determine how the resulting data is treated pmichaud: but generating whatever encoding is wanted is not a problem, if the spec clearly states what it is. I think everyone has switched between all three positions during this conversation, so we'll have to call that good done NotFound: right now the encoding specifies both the interpretation of the double-quoted string and the encoding of the resulting string. But there are some encodings that we cannot represent in a double-quoted string without having an escaping mechanism. allison: there is a remaining problem: if unicode: is specified, how are the escapes interpreted? As 8-bit chars that form utf8, or as Unicode code points? unicode is not an encoding so it's a normal double-quoted string, where escapes are honored. No, but the spec says that the default is utf8. if there are any non-ASCII characters in the double quotes, they would need to be utf8 An easy question: Should \" be added to line 186 of PDD19? But the doubt is how to interpret the escaped ones. I have no doubt about how to interpret the escaped ones (for utf8) unicode:"\xab" unicode:"«" are the same. barney: Is \" processed as an escape? barney: yes, \" works in double quoted strings (it's absolutely critical, otherwise you can't enter a quote in a double-quoted string it doesn't in the current implementation yes, you can enter a quote in the double quoted string, it's \x22 fair enough pmichaud: that's reasonable, but the spec is not clear enough about that, IMO. NotFound: pdd 19 or pdd 28? NotFound: I don't disagree that the spec is unclear. I'm just saying that it's possible for us to have utf-8 encoding and escapes in a single string w/o it being ambiguous pdd 19 is certainly not clear yet allison: 19 allison: we're only talking about pdd19 here. I don't think pdd28 specifies anything about PIR representation of literals \" is specced in line 129 of PDD19 barney: aha. okay.
the current implementation doesn't allow \" barney: added to the spec sorry, I'm wrong, I typoed ignore me. pmichaud-- Is there a way to have a single quote in a single quoted string? \" works now. Yes, it should be added to 186 of pdd19. The problem I see with this approach is that generated PIR containing both utf8 and iso-8859-1 unescaped characters is bad for one's sanity when using a text editor not written specifically to handle PIR source. (1) if someone is editing generated pir, they need to be able to handle it (2) all of PCT's string generation in PIR converts non-ASCII to escapes. But just because PCT does it that way doesn't mean that we always want it done that way (3) If someone has string literals in a non-Western language, I don't know that I want the generated PIR to always be a bunch of escape characters. It would make sense to allow the utf8 directly in the string literals. (e.g., Chinese) okay, the escapes are not persistent in the string they're only a way of representing a character that can't otherwise be typed ...or parsed by PIR. Don't we need some sort of BOM or encoding marker at the start of the PIR file then? as soon as that literal is read into anything, there is no difference between the escape and the utf8 character ...isn't it "as soon as the literal is compiled, there is no difference..."? pmichaud: I also find it nice to be able to write my own name 'Julián' in PIR, but I'm not sure it's worth the price of supporting all that. .oO { do we need a BOM at the start of a ucs: string? } .pragma encoding utf8 ?? surely PIR doesn't store the escape sequences in the literals it produces. pmichaud: basically, there's only a difference in the source file right. How do we expect a random text editor to parse .pragma encoding utf8 ? so unicode:"«" and unicode:"\xab" would produce exactly the same result. even down to being the same .pbc output.
pmichaud: exactly bom is also ball of mud So unicode:"\xab" and utf8:unicode:"\xab" also produce the same result? I don't see a problem with that for utf8 No problem, I just want to be clear about that. NotFound: yes consistency++ consistency++ we'll have to figure out something to do for ucs2 and personally consistency++ I'd prefer it if unicode:"..." accepted utf8 strings in the PIR text but produced Parrot's default internal representation for the constant (i.e., the one in pdd28) couldn't parrot just parse ucs2: as utf16:? parrot doesn't have a default internal representation I think ucs2 or utf16 literals must be forbidden, at least in 8-bit encoded source files. Agreed. (the default internal representation was an idea from an earlier draft that didn't make it in the final cut) What is the specific problem with ucs2? we're not doing NFG? barney: the problem I see is that many people confuse it with utf16. not as a universal standard, no. NFG is just another additional encoding/charset % cotto_work has left #parrotsketch % cotto_work has joined #parrotsketch since (for speed reasons) I'm going to be converting a lot of things into NFG, there's no way for me to specify an NFG literal without escaping everything? the thing about string data is that you want to avoid transforming it whenever you can % coke has joined #parrotsketch escaping everything won't specify an NFG literal % coke is now known as DietCoke NFG is just a storage format okay, how do I specify an NFG literal? ... wow. haven't even gotten to cotto's question, have ya. =-) no or do my literals always get transcoded at runtime? or...? 'ball of mud' Ok. Don't forget cotto. heading back out. =-) % DietCoke has left DietCoke!coke@feather.perl6.nl (trying to decide if it's an encoding or charset flag) DietCoke: sorry, I imagined this would be a long discussion, but I think it's important to clarify these issues. ...
it's an encoding flag pdd28 says that nfg is always unicode codepoints nfg: yes, but they're stored differently (encoded differently) can we interrupt this endless discussion to give cotto his time, so he can get on with life? yes particle: no problem cotto: still around? yes still have a question? yes. It should be a quick one. The Array PMC's freeze/thaw/visit functions are broken. Are they worth fixing or should that RT ticket be rejected? (suggestion for string encoding: allison is undoubtedly busy with OSCON, and I don't think string parsing is a pressing issue. Can we save it until the post-OSCON hackathon?) cotto_work: they are worth fixing thanks. the one thing worth saving in Array pmc as far as i'm concerned is the sparse storage The urgent questions have been answered; the others can be delayed. pmichaud: also, a good bit will be worked out as we implement the strings PDD if that can be rolled into fixed/resizable pmc variants, maybe Array can go away okay. allison and I can review string literals and encodings wrt nfg at the OSCON hackathon. and yes, string pdd implementation will add more useful information particle, you mean sparseness? yes I just want to put a hook in that it would be good to have a way to specify literals in PIR that go directly to NFG without requiring an explicit transcode step at runtime. That is why I asked: we can't sanely work on strings without some clarity on these points. I will shut up now until cotto's question is finished. But as I said, the urgent ones have been cleared. I need to review the PIR PDD and launch it out of draft. That'll likely be my hackathon task (including some string conversation with pmichaud). cotto: is your question answered? Do I win the prize for the longest first question? ;) if sparseness is the only thing worth preserving about the Array, would it be better to make the other Array types sparse? NotFound: i give you an hour of my life as a prize.
NotFound: you win the prize :) (Never mind that it was really the second.) cotto_work: potentially, yes cotto_work: though, it's still worth making freeze/thaw/visit work meaning "if someone can find the tuits"? SparseResizablePMCArray ok. I can see how freeze/thaw/visit would be a step in the right direction eoq cotto_work: yes, if someone has time. it's not wasted, because they'll have to work for whatever sparse Array results okay, any other questions before we go? Where shall we have lunch? I will miss parrotsketch next week. Technically, that wasn't a question. (are we having parrotsketch next week?) should we skip parrotsketch next week for OSCON? Probably. then yes, no parrotsketch next week we'll resume on July 29th thanks everybody! EOPS % pmichaud has left #parrotsketch % cotto_work has left #parrotsketch % NotFound has left #parrotsketch % allison has left #parrotsketch % jonathan has left #parrotsketch % chromatic has left #parrotsketch % barney has left barney!~bernhard@p549A01E6.dip0.t-ipconnect.de % wknight8111 has left wknight8111!~chatzilla@66.252.102.37 % jhorwitz has left jhorwitz!~chatzilla@96.245.16.45 % davidfetter has left davidfetter!~davidfett@start.fetter.org