OCapN Pre-standardization Group

January 2024

Minutes

JAR: I didn't check for closed issues but I don't think there are any

#48 Poll on naming of bytearray/bytevector/bytestring

JAR: Since it's older business, I put poll of bytevector vs bytestring vs bytearray

MarkM: Poll on naming?

JAR: Yes

MarkM: 5 rocket ships, rocket ships mean byte array, we don't normally determine by vote except by consensus, but I believe we had consensus to speak by vote. In which case I believe ByteArray is the answer

JAR: In that case, that's the conclusion

#47 Strings on the wire and handling for example lone surrogate

JAR: Last two items are things to do if we have time. Next item is from Jessica about strings, this has been contentious or confusing as a topic and we should talk about it. There's already been some good discussion in the issue

tsyesika: I think we have broad agreement to support UTF-8 only, but there's an open issue on what to do with invalid strings. The two options are to validate and reject them or to round trip them. There was also some discussion from richard gibson on what json does and doesn't require. I think spritely had proposed validating them and rejecting them as the answer

MarkM: Richard, can you comment on both what the JSON standard and the ecmascript standard in regard to json operations, in both cases what they do with the invalid strings

Richard: They both align on this, JSON can only be communicated with interchange with UTF-8, but strings inside can contain arbitrary codepoints which are not expressible in utf-8. There are refinements such as ijson

Christine: Jessica suggested that spritely proposed we reject invalid strings. I think we proposed two things:

  1. Put an outgoing MUST have valid UTF-8, incoming SHOULD reject UTF-8 (stronger: outgoing)
  2. MUST on both outgoing & incoming.

MarkM: is there anyone here from CapN Proto? I remember that CapN Proto does not validate and that Ian had suggested that they are unlikely to agree to it. And how should we react to it, especially if we have anyone here from CapN Proto to confirm or deny

tsyesika: I also seem to remember from CapN proto that validation of strings won't be valid.

tsyesika: I have a question from Agoric's perspective, would limiting to what's representable in utf-8 in sending, would that cause an issue if we limited to utf-8 only outgoing

MarkM: I believe that it would not create a problem for Agoric, that Agoric could ensure that we're only sending valid strings. If that's the position we're taking, we'll probably ensure in our implementation that we'd validate incoming strings, one reason we would is that... and this is one reason I'm uncomfortable with the SHOULD rather than the MUST on incoming... if an implementation accepts an invalid string incoming, then an implementation that doesn't validate incoming would be obligated to detect and report an error if a string is outgoing which is often the case, which means you're still rejecting it but not at the earliest opportunity which seems unpleasant

MarkM: I will also remind everyone of Eich's law. Postel's is be conservative in what you send and liberal in what you accept. Eich's law is if you are liberal in what you accept, others will utterly fail to be conservative in what they send.

MarkM: So Agoric would either do MUST and MUST or we'd do neither. I think it's fine for us to do MUST and MUST

tsyesika: from Spritely's side we're also fine to do MUST and MUST. The MUST and SHOULD proposal was based on feedback from capn proto

Richard: I also am good with MUST and MUST

MarkM: In a recent tc39 meeting someone proposed adding a predicate to JS itself to determine whether a string was valid unicode

Richard: yes and believe it's at stage 3

MarkM: That would allow Agoric to efficiently implement then

kriskowal: It seems to me we have consensus, it's good to move forward with this even though it doesn't move with JSON's round tripping approach.

MarkM: yes, and being a superset of principles was a principle we want to honor. I think the compatibility cost of not being a superset of JSON in this regard... I'm making a judgement that it's okay.

Christine: I'm going to jump on this consensus and take it to an action. I'm going to note it's true that we don't have Cap'n Proto folks to say if this is easy or not. When Ian was here he was pushing to get agreement between Spritely <-> Agoric as it's uncertain if Cap'n Proto will be able to interop without a bridge. So based on that I'm going to propose:

PROPOSED: OCapN will specify that strings MUST be valid UTF-8 on both outgoing messages and on incoming (requiring a validation step).

+1 Mark Miller +1 Baldur +1 David Thompson +1 Christine Lemmer-Webber +1 Kris Kowal +1 Jessica Tallon +1 Alex M +1 JAR +1 Juli

ACCEPTED: OCapN will specify that strings MUST be valid UTF-8 on both outgoing messages and on incoming (requiring a validation step).

JAR: I just want to be clear, is that sufficient for covering strings?

How to sort strings

MarkM: A question, OCapN doesn't currently speak about sorting strings, we've agreed internally within Agoric to do proper unicode sorting by unicode's standard utf-8 codepoints, but so far we haven't discussed sorting

tsyesika: In which situations are we sorting? key names in smallcaps?

MarkM: Yes. I suppose this would only be controvercial if we were preserving utf-16 unpaired surrogate preservation. I propose that we move to utf-8 as the sorting mechanism

PROPOSED: Use sorting based on UTF-8's on-wire serialization bytearray representation within OCapN within any situation where an official form of string sorting is required. Because of the design of UTF-8, i.e. the same as lexicographic order of unicode strings by unicode code point.

+1 Christine Lemmer-Webber +1 Jessica Tallon +1 Richard Gibson +1 Juli +1 Baldur +1 Jonathan A. Rees +1 David Thompson +1 Mark Miller +1 Kris Kowal

RESOLVED: Use sorting based on UTF-8's on-wire serialization bytearray representation within OCapN within any situation where an official form of string sorting is required. Because of the design of UTF-8, i.e. the same as lexicographic order of unicode strings by unicode code point.

Review process for merging PRs

JAR: Next item is about merging PRs. We had this protocol which Dan proposed for issues, which was that anyone can propose an issue anyone can close the issue. I was wondering whether there's a similar question for commits to the repository but the only feedback I heard was that state of the repository is more important because readers might interpret that as canonical

cwebber: We haven't established an editor, I would feel more comfortable with this if the editor assessed there was consensus

tsyesika: if they are minor issues such as typos, that's fine, but if they're major issues, I feel there should be assessment from the group and voting by the group

MarkM: I want to agree with what Jessica said. At TC39 we have the idea of "needs consensus" PRs. If there's any change to the normative behavior of the standard, then we need to bring to plenary and get merged. Then the editors themselves in the PR can determine whether an issues is purely editorial. If there is any grey area, then you take it to plenary. I'm comfortable with that process.

JAR: Who decides that?

MarkM: The editors decide that.

tsyesika: I was going to add to the proposal of having editors, at least two, and they be co-editors, at least one from different organizations, eg Spritely and Agoric

Christine: I would feel very comfortable with Jessica Tallon and Kris Kowal as editors, who have both carefully reviewed for community consensus across groups

JAR: I think I'd like to get an assesment of whether or not everything should go through the group

Jessica: I agree with what Mark said. I support the idea of PRs which need consensus, and then the rest don't need to clog up the meeting with needless +1s

MarkM: And anything in a grey area, we bring towards the group

kriskowal: my understanding is that the role of the editor is to determine which changes are normative and which aren't, and to merge PRs

JAR: This sounds common-sensey and I think isn't normative in terms of the spec

cwebber: but it is in terms of process

JAR: Which indirectly applies. So I have heard some clear statements, but would like to see a concrete proposal

PROPOSED: The OCapN group shall establish editors who determine which proposed specification changes are editorial and do not require group consensus versus which require group input and consensus (i.e. "Needs Consensus" PRs). The editors also have the authority to merge such changes into the official specification repository. In case of doubt of whether a PR is editorials or normative, editors will err on the side of taking said changes to the group for review to gather consensus.

+1 Christine Lemmer-Webber +1 Mark Miller +1 Jessica Tallon +1 David Thompson +1 Baldur +1 Juli +1 Kris Kowal +1 Alex M

ACCEPTED: The OCapN group shall establish editors who determine which proposed specification changes are editorial and do not require group consensus versus which require group input and consensus (i.e. "Needs Consensus" PRs). The editors also have the authority to merge such changes into the official specification repository. In case of doubt of whether a PR is editorials or normative, editors will err on the side of taking said changes to the group for review to gather consensus.

JAR: to clarify, which documents

Christine: we left it broad, so I believe it covers all text unless we choose in the future to break it out

nominated editors: Kris Kowal and Jessica Tallon

kriskowal: I accept my nomination

tsyesika: I accept my nomination

PROPOSED: Accept Jessica Tallon and Kris Kowal as editors for OCapN's specifications.

+1 Christine Lemmer-Webber +1 Jessica Tallon +1 Juli +1 David Thompson +1 Mark Miller +1 Kris Kowal +1 Richard Gibson +1 Baldur

ACCEPTED: Accept Jessica Tallon and Kris Kowal as editors for OCapN's specifications.

Misc end of meeting business

JAR: Is there any objection to adding the implementation guide to the repository?

kriskowal: I would like to review the PRs out-of-band

JAR: Okay I will take that as a modification to what we're doing right now

JAR: Okay, so editors have sufficient guidance on this. Yes, some of this looks complicated. I'm happy to move on from this item.

JAR: I believe we have concluded the agenda. But I would like to suggest that people pay attention to the agenda so we don't postpone.

cwebber: When would we consider moving to a standards group? When we have interoperability demonstrated? And probably to the IETF?

MarkM: Are we already a standards group? Does it matter if it's government-approved?

JAR: It may matter for some people performing certain work which requires government-required standards.

MarkM: That's a good point