ChanServ changed the topic of #wayland to: https://wayland.freedesktop.org | Discussion about the Wayland protocol and its implementations, plus libinput
fmuellner has joined #wayland
fmuellner has quit [Ping timeout: 480 seconds]
Calandracas_ has quit [Remote host closed the connection]
coldfeet has quit [Remote host closed the connection]
kts has quit [Quit: Konversation terminated!]
coldfeet has joined #wayland
coldfeet has quit [Remote host closed the connection]
cool110 has joined #wayland
cool110 is now known as Guest2206
Tokoyami has quit [Quit: ~@~]
Tokoyami has joined #wayland
Guest2143 has quit [Ping timeout: 480 seconds]
karolherbst has quit [Remote host closed the connection]
karolherbst has joined #wayland
kts has joined #wayland
nerdopolis has joined #wayland
Moprius has joined #wayland
Moprius has quit [Quit: bye]
mclasen has quit [Quit: mclasen]
mclasen has joined #wayland
kts has quit [Ping timeout: 480 seconds]
tzimmermann has quit [Quit: Leaving]
<emersion>
kennylevinsen: the website doesn't contain the UTF-8 thing because that commit hasn't been released yet
<davidre>
Isn't that kinda an important break, if someone ferried non utf8 in a custom protocol
<kennylevinsen>
yeah, that makes sense
<mclasen>
kennylevinsen: no idea what you are talking about with differentiating utf8 encoded vs decodable
<mclasen>
either it is valid utf8, or it isn't
<mclasen>
that is a pretty black-and-white thing
<emersion>
davidre: some implementations already killed clients based on UTF-8 encoding
<emersion>
and we don't know of any such custom protocol
<kennylevinsen>
mclasen: no? any sequence of bytes &0x7F can decode as utf-8, but that doesn't mean it was ever utf-8
<emersion>
can always revert it if it makes sense to
<mclasen>
there is no difference between "can be decoded as utf8" and "is utf8"
<kennylevinsen>
Maybe I'm nitpicking, but it still feels like a bogus validation
<mclasen>
if it can be decoded as utf8, it is utf8
<mclasen>
there's no secret 'essence of utf8' that you could miss
<kennylevinsen>
mclasen: by that definition, /dev/random &0x7F is a UTF-8 string. I only define a UTF-8 string as "text that was *encoded* as UTF-8"
<psykose>
it would indeed be a utf8 string
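A minimal sketch of the point just made, in Python (not something posted in the channel; the helper name masked_random_bytes is hypothetical): random bytes with the high bit cleared always pass strict UTF-8 decoding, although nobody ever encoded them as text.

```python
import os

def masked_random_bytes(n: int) -> bytes:
    """Return n random bytes with the high bit cleared (each byte & 0x7F)."""
    return bytes(b & 0x7F for b in os.urandom(n))

data = masked_random_bytes(64)

# Strict decoding: this would raise UnicodeDecodeError on malformed input,
# but every byte in 0x00..0x7F is a complete one-byte UTF-8 sequence.
text = data.decode("utf-8")

# The round trip is lossless, so a validator cannot tell this apart from
# text that a client deliberately encoded.
assert text.encode("utf-8") == data
```

This only illustrates that validation checks format, not intent; it says nothing about how libwayland implements or would implement the check.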
<kennylevinsen>
and if you want to ensure that "clients are using the right charset", then you care about the latter
<mclasen>
whether decoding yields a meaningful Unicode string is an entirely different question
<kennylevinsen>
that's not how I read the spec: "the string must be encoded in UTF-8" - if you have to encode in UTF-8, it must yield a meaningful unicode string
<mclasen>
which spec is this?
<kennylevinsen>
that was just xdg_toplevel::set_title
<kennylevinsen>
(wording aside, I imagine everybody would agree that a client is buggy and not adhering to spec if what it sends does not actually decode as a meaningful unicode string)
rv1sr has joined #wayland
<vyivel>
did we run out of sheds to color
<kennylevinsen>
("meaningful" in this case meaning "the unicode string the client actually wanted the user to see")
<kennylevinsen>
vyivel: it's regarding a MR to add UTF-8 validation to libwayland
<vyivel>
i'm aware
<vyivel>
you can validate data to match a format, you can't validate the intent, doesn't make it less useful
<mclasen>
the spec could be more precise, for sure
<kennylevinsen>
My word isn't final anyway, I just don't see any point in having a utf-8 validator in libwayland and would just let it be up to compositors to decide if they want to strip or render mojibake :/
<mclasen>
for most people, "utf-8 encoded" is pretty clear
<kennylevinsen>
I wholeheartedly accept that I may at times be pedantic in non-standard ways, and don't always know when that is the case. :)
<mclasen>
you could say "utf-8 encoded Unicode string"
<mclasen>
but what constitutes a valid unicode string is a good deal less clear than what utf-8 is
<mclasen>
and really, it's up to the compositor to handle the data safely, and use its best judgement when I send it something that may be valid utf8, but maybe not valid Unicode
kts has joined #wayland
<kennylevinsen>
I would just extend that and say that the client is required to send a utf-8 encoded unicode string as now, and not doing so is broken - but the compositor can decide whether it wants to show the garbage it got or fail the client
<soreau>
Is this about fixing a client sending '\U000xyz00' as a char in the title through the wayland socket currently?
<jadahl>
isn't "garbage" a fluid thing? what was "garbage" when the compositor was written became valid when the client was written
<jadahl>
with new code points being added all the time
<kennylevinsen>
(garbage referring to whatever the result of rendering the borked text is, mojibake/tofu/whatever inclusive)
Company has joined #wayland
<mclasen>
jadahl: that was my point. Unicode is fluid
<davidre>
Valid utf8 zalgo window titles :)
<mclasen>
not saying it is garbage...
<kennylevinsen>
I do sometimes find these kinds of discussions where you realize your quite firm mental model of something differs a lot from everyone else quite intriguing. Teaches you a lot about how people see things differently.
* mclasen
hates food metaphors in software, so not going with 'tofu'
<kennylevinsen>
hah
<kennylevinsen>
do we have another word for the unicode codepoint square?
<mclasen>
hexbox is the more descriptive name
<kennylevinsen>
that kind of sounds like something you'd use to curse someone, but fair
<kennylevinsen>
mojibake is pretty standard for garbage displayed when encoding and decoding charset isn't matched
<davidre>
Looks like our task manager needs a bit of clipping :D
<jadahl>
davidre: blurring the line of what's garbage and not :P
<DemiMarie>
As the one who made the MR: My main concern is that a compositor might pass a non-UTF-8 string to an API that requires UTF-8 and create a security hole. For example, there are many glib APIs that require valid UTF-8. Also, earlier validation is better for security.
<kennylevinsen>
DemiMarie: I'm actually slightly concerned that this would be thought to be sufficient for security
<kennylevinsen>
e.g., skipping normalization and such
<kennylevinsen>
would be good enough for strictly utf-8-parsing related bugs though
<kennylevinsen>
I do personally prefer explicitly treating foreign data as untrusted and let checks belong at time of use depending on need, as I feel it reduces risk of misjudging guarantees
bnason2 has quit [Ping timeout: 480 seconds]
funderscore is now known as f_
<DemiMarie>
kennylevinsen: It's not sufficient for security on its own.
<DemiMarie>
There are, however, quite a few APIs that require valid UTF-8 or else undefined behavior results.
<vyivel>
^
<DemiMarie>
Also, if a protocol requires something, implementations of that protocol should check that something.
<kennylevinsen>
glib has its own built-in utf-8 validation function, and in the case of the Rust example from the MR you even have to explicitly go out of your way to call unsafe conversions to make it blindly assume utf-8...
<DemiMarie>
Rust is nice and encodes "is this known to be valid UTF-8?" into the type system. Most languages are not so nice.
<kennylevinsen>
Our protocols always require much more than we can validate, so I'm not really caught on that aspect. It's just bugs, and some get caught by validation, some get caught by user eyeballs, etc.
<kennylevinsen>
DemiMarie: I know, it's just that Rust's str was the specific example in the MR :)
<emersion>
undefined behavior if input isn't valid utf8? never heard of such a cursed api
<DemiMarie>
emersion: glib has quite a few of them
<DemiMarie>
kennylevinsen: Not my example.
<kennylevinsen>
I am aware
gryffus has joined #wayland
<DemiMarie>
Validating this in libwayland prevents an entire class of vulnerabilities.
<DemiMarie>
It also ensures that sending invalid UTF-8 will be detected by every compositor, making "it works on my machine" issues less likely.
<kennylevinsen>
I don't believe in duct tape security
<kennylevinsen>
detection is true, my only problem there is that it will probably mainly affect and be reproduced by minority users
<DemiMarie>
Is that because of incorrectly encoded data?
<DemiMarie>
Also, what do you mean by "duct tape security"?
<kennylevinsen>
I think the most likely bug here would be "client includes a string from a document that is foreign encoding in its title" - to trip it you need a user to use that client with non-english, non-utf8 content, which is an increasingly rare occurrence.
<DemiMarie>
I see.
<DemiMarie>
One option would be to replace invalid bytes with U+FFFD REPLACEMENT CHARACTER.
<kennylevinsen>
(because utf-8 is widely adopted, not because of non-english of course - e.g., my language requires either UTF-8 or ISO-8859-1 to encode the last 3 letters of its alphabet, so you need not only an ISO-8859-1 document, but one whose title also used those three letters)
<DemiMarie>
Such conversion belongs on the client side, though.
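A minimal sketch of the bug class and the U+FFFD option discussed here, in Python (not something posted in the channel; the title string and the helper is_valid_utf8 are hypothetical): an ISO-8859-1 title passed along byte for byte fails strict UTF-8 validation, while lossy decoding substitutes U+FFFD for each invalid byte.

```python
# Hypothetical document title containing three non-ASCII ISO-8859-1 letters
# (written with escapes: U+00E5, U+00E6, U+00F8).
title = "bl\u00e5b\u00e6rgr\u00f8d.txt"

# What a buggy client might send: the title's original bytes, untouched.
raw = title.encode("iso-8859-1")

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data passes strict UTF-8 decoding."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Strict validation rejects the ISO-8859-1 bytes but accepts a proper encoding.
rejected = not is_valid_utf8(raw)
accepted = is_valid_utf8(title.encode("utf-8"))

# Lossy alternative: each invalid byte becomes U+FFFD REPLACEMENT CHARACTER.
lossy = raw.decode("utf-8", errors="replace")
```

Whether such a substitution should happen in the client, in the compositor, or not at all is exactly what is being debated; the sketch only shows what each choice does to the bytes.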
<kennylevinsen>
yes but that's the kind of bug you're trying to catch
<kennylevinsen>
a non-malicious bug would be that it accidentally took that title and used it, byte for byte
<kennylevinsen>
but the likelihood of this check catching such a flaw is tiny
<DemiMarie>
I see
<kennylevinsen>
what I mean by duct tape security is trying to cover up broken software by doing a few checks in front of it and hoping that does the trick, vs. protecting the actual usage properly
<DemiMarie>
I see
<kennylevinsen>
(in my line of work I'm constantly dealing with security products placed in front of broken stuff, which gives a false sense of security by covering a few bugs but not fixing underlying issues)
<kennylevinsen>
(the pain of devsecops with corp IT and separate security departments, but that's off-topic)
<kennylevinsen>
(Not to say that early checks cannot be good, but a check for the check’s sake is not)
leon-anavi has quit [Quit: Leaving]
<dottedmag>
IMHO "here's a bag of bytes, they are valid utf-8 sequences" is no worse than "here's a bag of bytes, the spec says it's utf-8, knock yourself out validating it", and having multiple checkers in every implementation, all slightly different.
melonai56 has joined #wayland
<mclasen>
DemiMarie: it will be detected by every compositor using libwayland-server. Wayland is defined by the protocol, not the library...
<Company>
you guys need a testsuite that fails if the compositor doesn't reject invalid utf8
<Company>
and then you do a webpage that validates all the compositors against it
<Company>
and then you let social media make sure that all compositors are valid
bnason2 has joined #wayland
kts has quit [Quit: Konversation terminated!]
gryffus has quit [Ping timeout: 480 seconds]
garnacho has joined #wayland
garnacho has quit [Ping timeout: 480 seconds]
<bl4ckb0ne>
sounds like a lot of work
Narrat has joined #wayland
<kennylevinsen>
Compliance by social media shaming is certainly a very modern approach
fmuellner has quit [Ping timeout: 480 seconds]
coldfeet has joined #wayland
<DemiMarie>
Company: Or libwayland can do the validation for every compositor.
<Company>
that still needs testing
<mclasen>
for every compositor using libwayland
<Company>
you also need to test the other thing
vyivel has left #wayland [#wayland]
<Company>
that the compositors correctly accept utf-8
<Company>
and do the right thing with codepoints that they have no glyphs for and fun stuff like that
<DemiMarie>
mclasen: true, but that is still much less work than checking in each and every request & event handler.
<DemiMarie>
Company: that is still needed, yes.
<zamundaaa[m]>
Demi: fwiw I don't understand the pushback in here, having this check be done by the library seems purely like a good thing to me
<DemiMarie>
zamundaaa: thank you.
<DemiMarie>
Should libwayland validate enums as well?
gryffus has joined #wayland
<zamundaaa[m]>
If they're also validated on the client side, probably
<zamundaaa[m]>
Otherwise it's likely better to have the compositor emit the relevant protocol error instead, so the client knows what's going on
feaneron has joined #wayland
vyivel has joined #wayland
<DemiMarie>
Does that mean that checks should be symmetric between marshalling and unmarshalling?
vyivel has left #wayland [#wayland]
fmuellner has joined #wayland
garnacho has joined #wayland
bnason25 has joined #wayland
bnason2 has quit [Read error: Connection reset by peer]
chamlis_ has quit [Remote host closed the connection]
chamlis_ has joined #wayland
chamlis_ has quit [Remote host closed the connection]
chamlis_ has joined #wayland
agomez has joined #wayland
whot1 has joined #wayland
lbia_ has joined #wayland
paulk-bis has joined #wayland
Tokoyami has quit [reticulum.oftc.net helix.oftc.net]
Company has quit [reticulum.oftc.net helix.oftc.net]
mripard has quit [reticulum.oftc.net helix.oftc.net]
MrCooper has quit [reticulum.oftc.net helix.oftc.net]
coldfeet has quit [reticulum.oftc.net helix.oftc.net]
mvlad has quit [reticulum.oftc.net helix.oftc.net]
glennk has quit [reticulum.oftc.net helix.oftc.net]
mxz has quit [reticulum.oftc.net helix.oftc.net]
sima has quit [reticulum.oftc.net helix.oftc.net]
pounce has quit [reticulum.oftc.net helix.oftc.net]
chamlis has quit [reticulum.oftc.net helix.oftc.net]
pochu has quit [reticulum.oftc.net helix.oftc.net]
KDDLB has quit [reticulum.oftc.net helix.oftc.net]
avu has quit [reticulum.oftc.net helix.oftc.net]
flokli has quit [reticulum.oftc.net helix.oftc.net]
FreeFull has quit [reticulum.oftc.net helix.oftc.net]
azerov has quit [reticulum.oftc.net helix.oftc.net]
DodoGTA has quit [reticulum.oftc.net helix.oftc.net]
OrkooffSep2[m] has quit [reticulum.oftc.net helix.oftc.net]
andyrtr has quit [reticulum.oftc.net helix.oftc.net]
paulk has quit [reticulum.oftc.net helix.oftc.net]
lbia has quit [reticulum.oftc.net helix.oftc.net]
vova has quit [reticulum.oftc.net helix.oftc.net]
sergi has quit [reticulum.oftc.net helix.oftc.net]
joantolo[m] has quit [reticulum.oftc.net helix.oftc.net]
occivink has quit [reticulum.oftc.net helix.oftc.net]
calcul0n has quit [reticulum.oftc.net helix.oftc.net]
tarTLSeau has quit [reticulum.oftc.net helix.oftc.net]
DemiMarie has quit [reticulum.oftc.net helix.oftc.net]
whot has quit [reticulum.oftc.net helix.oftc.net]
tanty has quit [reticulum.oftc.net helix.oftc.net]
fmuellner has quit [reticulum.oftc.net helix.oftc.net]
pochu_ has joined #wayland
andyrtr has joined #wayland
mxz_ is now known as mxz
vova_ is now known as vova
DodoGTA has joined #wayland
pounce_ is now known as pounce
calcul0n has joined #wayland
calcul0n has quit [Killed (reticulum.oftc.net (Nick collision (new)))]
coldfeet has joined #wayland
FreeFull_ has joined #wayland
sergi has joined #wayland
tanty has joined #wayland
coldfeet_ has joined #wayland
paulk has joined #wayland
Company has joined #wayland
glennk has joined #wayland
mvlad has joined #wayland
sima has joined #wayland
avu has joined #wayland
flokli has joined #wayland
FreeFull has joined #wayland
OrkooffSep2[m] has joined #wayland
KDDLB has joined #wayland
pochu has joined #wayland
lbia has joined #wayland
joantolo[m] has joined #wayland
occivink has joined #wayland
calcul0n has joined #wayland
tarTLSeau has joined #wayland
whot has joined #wayland
DemiMarie has joined #wayland
mripard has joined #wayland
joantolo[m] has quit [Ping timeout: 480 seconds]
occivink has quit [Ping timeout: 480 seconds]
calcul0n has quit [Ping timeout: 480 seconds]
tarTLSeau has quit [Ping timeout: 480 seconds]
calcul0n has joined #wayland
mvlad has quit [Remote host closed the connection]
FreeFull has quit [Ping timeout: 480 seconds]
whot has quit [Ping timeout: 480 seconds]
coldfeet has quit [Ping timeout: 480 seconds]
lbia has quit [Ping timeout: 480 seconds]
paulk has quit [Read error: Network is unreachable]
tarTLSeau has joined #wayland
sima has quit [Ping timeout: 480 seconds]
pochu has quit [Ping timeout: 480 seconds]
tanty has quit [Ping timeout: 480 seconds]
KDDLB has quit [Ping timeout: 480 seconds]
glennk has quit [Ping timeout: 480 seconds]
mripard has quit [Ping timeout: 480 seconds]
glennk has joined #wayland
<DemiMarie>
zamundaaa: What if the unmarshalling code could emit a helpful protocol error, such as "Bad enum value X for version Y"?
occivink has joined #wayland
sima has joined #wayland
Tokoyami` is now known as Tokoyami
<zamundaaa[m]>
Yeah, that should work
joantolo[m] has joined #wayland
chamlis_ has quit [Remote host closed the connection]
chamlis has joined #wayland
<jadahl>
there were at times ideas to allow extending enums in other protocol extensions. this idea in particular (early xdg-shell) was about storing them in an array, but it doesn't seem unthinkable that there could be protocols that do this to plain enums too. erroring out clients doing this on an IPC level would regress that
remanifest has joined #wayland
chamlis_ has joined #wayland
remanifest has quit [Remote host closed the connection]
remanifest has joined #wayland
pochu_ has quit []
pochu has joined #wayland
chamlis has quit [Ping timeout: 480 seconds]
chamlis_ has quit [Remote host closed the connection]
chamlis has joined #wayland
coldfeet_ has quit []
rv1sr has quit []
dos1 has quit [Quit: Kabum!]
bnason2 has quit [Read error: Connection reset by peer]
bnason2 has joined #wayland
<Company>
jadahl: I would assume you'd use an int in that case
<Company>
xx-color kinda is in that situation where it has named vs manually specified primaries
Company has quit [Remote host closed the connection]
sima has quit [Ping timeout: 480 seconds]
Company has joined #wayland
Company has quit [Remote host closed the connection]
Company has joined #wayland
feaneron has quit [Remote host closed the connection]
<emersion>
wl_shm allows values outside of enum
Company has quit [Remote host closed the connection]
Company has joined #wayland
mvlad_ has quit [Remote host closed the connection]