OCapN Pre-standardization Group
January 2024
- Chair: Jonathan Rees
- Scribe: Christine Lemmer-Webber
Present:
Alex M (independent)
Baldur (zarutian) (independent)
Christine Lemmer-Webber (cwebber) (Spritely)
David Thompson (Spritely)
Jessica Tallon (tsyesika) (Spritely)
Jonathan A. Rees (JAR)
Juli (Independent)
Kris Kowal (Agoric)
Mark Miller (Agoric)
Mathieu Hofman
Richard Gibson (Agoric)
Minutes
JAR: I didn't check for closed issues but I don't think there are any
#48 Poll on naming of bytearray/bytevector/bytestring
JAR: Since it's older business, I put poll of bytevector vs bytestring vs bytearray
MarkM: Poll on naming?
JAR: Yes
MarkM: 5 rocket ships, rocket ships mean byte array, we don't normally determine by vote except by consensus, but I believe we had consensus to speak by vote. In which case I believe ByteArray is the answer
JAR: In that case, that's the conclusion
#47 Strings on the wire and handling for example lone surrogate
JAR: Last two items are things to do if we have time. Next item is from Jessica about strings, this has been contentious or confusing as a topic and we should talk about it. There's already been some good discussion in the issue
tsyesika: I think we have broad agreement to support UTF-8 only, but there's an open issue on what to do with invalid strings. The two options are to validate and reject them or to round trip them. There was also some discussion from richard gibson on what json does and doesn't require. I think spritely had proposed validating them and rejecting them as the answer
MarkM: Richard, can you comment on both what the JSON standard and the ecmascript standard in regard to json operations, in both cases what they do with the invalid strings
Richard: They both align on this, JSON can only be communicated with interchange with UTF-8, but strings inside can contain arbitrary codepoints which are not expressible in utf-8. There are refinements such as ijson
Christine: Jessica suggested that spritely proposed we reject invalid strings. I think we proposed two things:
- Put an outgoing MUST have valid UTF-8, incoming SHOULD reject UTF-8 (stronger: outgoing)
- MUST on both outgoing & incoming.
MarkM: is there anyone here from CapN Proto? I remember that CapN Proto does not validate and that Ian had suggested that they are unlikely to agree to it. And how should we react to it, especially if we have anyone here from CapN Proto to confirm or deny
tsyesika: I also seem to remember from CapN proto that validation of strings won't be valid.
tsyesika: I have a question from Agoric's perspective, would limiting to what's representable in utf-8 in sending, would that cause an issue if we limited to utf-8 only outgoing
MarkM: I believe that it would not create a problem for Agoric, that Agoric could ensure that we're only sending valid strings. If that's the position we're taking, we'll probably ensure in our implementation that we'd validate incoming strings, one reason we would is that... and this is one reason I'm uncomfortable with the SHOULD rather than the MUST on incoming... if an implementation accepts an invalid string incoming, then an implementation that doesn't validate incoming would be obligated to detect and report an error if a string is outgoing which is often the case, which means you're still rejecting it but not at the earliest opportunity which seems unpleasant
MarkM: I will also remind everyone of Eich's law. Postel's is be conservative in what you send and liberal in what you accept. Eich's law is if you are liberal in what you accept, others will utterly fail to be conservative in what they send.
MarkM: So Agoric would either do MUST and MUST or we'd do neither. I think it's fine for us to do MUST and MUST
tsyesika: from Spritely's side we're also fine to do MUST and MUST. The MUST and SHOULD proposal was based on feedback from capn proto
Richard: I also am good with MUST and MUST
MarkM: In a recent tc39 meeting someone proposed adding a predicate to JS itself to determine whether a string was valid unicode
Richard: yes and believe it's at stage 3
MarkM: That would allow Agoric to efficiently implement then
kriskowal: It seems to me we have consensus, it's good to move forward with this even though it doesn't move with JSON's round tripping approach.
MarkM: yes, and being a superset of principles was a principle we want to honor. I think the compatibility cost of not being a superset of JSON in this regard... I'm making a judgement that it's okay.
Christine: I'm going to jump on this consensus and take it to an action. I'm going to note it's true that we don't have Cap'n Proto folks to say if this is easy or not. When Ian was here he was pushing to get agreement between Spritely <-> Agoric as it's uncertain if Cap'n Proto will be able to interop without a bridge. So based on that I'm going to propose:
PROPOSED: OCapN will specify that strings MUST be valid UTF-8 on both outgoing messages and on incoming (requiring a validation step).
+1 Mark Miller +1 Baldur +1 David Thompson +1 Christine Lemmer-Webber +1 Kris Kowal +1 Jessica Tallon +1 Alex M +1 JAR +1 Juli
ACCEPTED: OCapN will specify that strings MUST be valid UTF-8 on both outgoing messages and on incoming (requiring a validation step).
JAR: I just want to be clear, is that sufficient for covering strings?
How to sort strings
MarkM: A question, OCapN doesn't currently speak about sorting strings, we've agreed internally within Agoric to do proper unicode sorting by unicode's standard utf-8 codepoints, but so far we haven't discussed sorting
tsyesika: In which situations are we sorting? key names in smallcaps?
MarkM: Yes. I suppose this would only be controvercial if we were preserving utf-16 unpaired surrogate preservation. I propose that we move to utf-8 as the sorting mechanism
PROPOSED: Use sorting based on UTF-8's on-wire serialization bytearray representation within OCapN within any situation where an official form of string sorting is required. Because of the design of UTF-8, i.e. the same as lexicographic order of unicode strings by unicode code point.
+1 Christine Lemmer-Webber +1 Jessica Tallon +1 Richard Gibson +1 Juli +1 Baldur +1 Jonathan A. Rees +1 David Thompson +1 Mark Miller +1 Kris Kowal
RESOLVED: Use sorting based on UTF-8's on-wire serialization bytearray representation within OCapN within any situation where an official form of string sorting is required. Because of the design of UTF-8, i.e. the same as lexicographic order of unicode strings by unicode code point.
Review process for merging PRs
JAR: Next item is about merging PRs. We had this protocol which Dan proposed for issues, which was that anyone can propose an issue anyone can close the issue. I was wondering whether there's a similar question for commits to the repository but the only feedback I heard was that state of the repository is more important because readers might interpret that as canonical
cwebber: We haven't established an editor, I would feel more comfortable with this if the editor assessed there was consensus
tsyesika: if they are minor issues such as typos, that's fine, but if they're major issues, I feel there should be assessment from the group and voting by the group
MarkM: I want to agree with what Jessica said. At TC39 we have the idea of "needs consensus" PRs. If there's any change to the normative behavior of the standard, then we need to bring to plenary and get merged. Then the editors themselves in the PR can determine whether an issues is purely editorial. If there is any grey area, then you take it to plenary. I'm comfortable with that process.
JAR: Who decides that?
MarkM: The editors decide that.
tsyesika: I was going to add to the proposal of having editors, at least two, and they be co-editors, at least one from different organizations, eg Spritely and Agoric
Christine: I would feel very comfortable with Jessica Tallon and Kris Kowal as editors, who have both carefully reviewed for community consensus across groups
JAR: I think I'd like to get an assesment of whether or not everything should go through the group
Jessica: I agree with what Mark said. I support the idea of PRs which need consensus, and then the rest don't need to clog up the meeting with needless +1s
MarkM: And anything in a grey area, we bring towards the group
kriskowal: my understanding is that the role of the editor is to determine which changes are normative and which aren't, and to merge PRs
JAR: This sounds common-sensey and I think isn't normative in terms of the spec
cwebber: but it is in terms of process
JAR: Which indirectly applies. So I have heard some clear statements, but would like to see a concrete proposal
PROPOSED: The OCapN group shall establish editors who determine which proposed specification changes are editorial and do not require group consensus versus which require group input and consensus (i.e. "Needs Consensus" PRs). The editors also have the authority to merge such changes into the official specification repository. In case of doubt of whether a PR is editorials or normative, editors will err on the side of taking said changes to the group for review to gather consensus.
+1 Christine Lemmer-Webber +1 Mark Miller +1 Jessica Tallon +1 David Thompson +1 Baldur +1 Juli +1 Kris Kowal +1 Alex M
ACCEPTED: The OCapN group shall establish editors who determine which proposed specification changes are editorial and do not require group consensus versus which require group input and consensus (i.e. "Needs Consensus" PRs). The editors also have the authority to merge such changes into the official specification repository. In case of doubt of whether a PR is editorials or normative, editors will err on the side of taking said changes to the group for review to gather consensus.
JAR: to clarify, which documents
Christine: we left it broad, so I believe it covers all text unless we choose in the future to break it out
nominated editors: Kris Kowal and Jessica Tallon
kriskowal: I accept my nomination
tsyesika: I accept my nomination
PROPOSED: Accept Jessica Tallon and Kris Kowal as editors for OCapN's specifications.
+1 Christine Lemmer-Webber +1 Jessica Tallon +1 Juli +1 David Thompson +1 Mark Miller +1 Kris Kowal +1 Richard Gibson +1 Baldur
ACCEPTED: Accept Jessica Tallon and Kris Kowal as editors for OCapN's specifications.
Misc end of meeting business
JAR: Is there any objection to adding the implementation guide to the repository?
kriskowal: I would like to review the PRs out-of-band
JAR: Okay I will take that as a modification to what we're doing right now
JAR: Okay, so editors have sufficient guidance on this. Yes, some of this looks complicated. I'm happy to move on from this item.
JAR: I believe we have concluded the agenda. But I would like to suggest that people pay attention to the agenda so we don't postpone.
cwebber: When would we consider moving to a standards group? When we have interoperability demonstrated? And probably to the IETF?
MarkM: Are we already a standards group? Does it matter if it's government-approved?
JAR: It may matter for some people performing certain work which requires government-required standards.
MarkM: That's a good point