#dev | Logs for 2019-10-15
[08:12:14] <FatPhil> TheMightyBuzzard: check what I just mentioned in #staff please, there seems to be a bug
[09:16:16] <Bytram> FatPhil: see my reply in #staff, specifically that this was already found and discussed, and a workaround identified, in the discussion about the missing journal entry story. In short, do not use "&amp;" in URLs; either use a bare "&" or percent-encode it as "%26" (I think that is the right value)
[09:45:42] <FatPhil> %26 is a scary non-solution IMHO. If that doesn't break things, then something is way more broken than I can possibly comprehend.
[09:46:38] <FatPhil> Because that %26 must be sendable to the remote server by the user's browser, and that remote server must not interpret it as the special-meaning '&'.
[09:49:32] <Bytram> it's called percent-encoding; one needs to be able to use just ASCII chars to send any URL, including Unicode; there are something like 6-10 RFCs that address this whole thing. If you want, I can track down the ones I referred to when testing our UTF-8 (Unicode) implementation.
[09:49:54] <FatPhil> I know what it's called, and I know what it's for.
[09:52:04] <FatPhil> this is nothing to do with non-ascii, it's to do with reserved characters: https://www.ietf.org
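FatPhil's objection hinges on reserved characters in the query component (RFC 3986): a raw '&' separates parameters, while '%26' is a literal ampersand inside a single value, so the two are not interchangeable spellings of the same request. A minimal illustration with Python's standard library (the example query strings are made up):

```python
from urllib.parse import parse_qs

# '&' as a separator: two parameters
print(parse_qs("q=cats&page=2"))    # {'q': ['cats'], 'page': ['2']}

# '%26' is a percent-encoded literal '&': one parameter whose value contains
# an ampersand after decoding -- a different request, not an equivalent one
print(parse_qs("q=cats%26page=2"))  # {'q': ['cats&page=2']}
```

That is what the "%26" workaround trades away: it keeps the character from being eaten, but it changes what the target server is asked for.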
[09:59:01] <Bytram> Ugh. Muscle memory got the better of me and instead of closing a program, I slipped and completely shut down my computer.
[10:01:46] <Bytram> Do keep in mind, and I am not advocating for this viewpoint -- only mentioning that this seems to be the way things were implemented historically -- that things that are user-facing got much more attention than things that were staff-facing. So, there's a bunch of stuff that one just learns along the way that that is how stuff just works. I have nothing against documenting the issues found and submitting code fixes, but realize it may be a
[10:01:48] <Bytram> while before changes make it out to the live code running the system.
[10:02:41] <Bytram> anyway, I woke up way too early today, have the late shift tonight and the early shift tomorrow, so if I do not want to be death-warmed-over by the end of the day tomorrow, I needs to be getting myself back to bed.
[10:06:06] <Bytram> Given that it's still over an hour before sunrise, I may just have a chance of falling back to sleep.
[10:06:09] <Bytram> laters
[11:02:14] <FatPhil> today I learnt that <a href="http://foo.bar/x&y"> is invalid HTML (Well, validly encoded HTML containing an invalid
[11:02:17] <FatPhil> URL)
[11:06:39] <TheMightyBuzzard> yurp. the URL spec is fucking annoying.
[11:08:16] <TheMightyBuzzard> specially since you can't count on anyone else following it correctly on the off chance that you do.
[11:08:40] <chromas> it's a string literal though. no need to encode the entities
[11:09:20] <chromas> plus urls aren't html
[11:09:32] <TheMightyBuzzard> it's illegal in a url string literal, which is something we try to avoid.
[11:10:56] <chromas> yeah. there was a sub from the verge where they had &amp; in their urls because they're retards and it looks like rehash did a s/&amp;//g on them
[11:12:36] <TheMightyBuzzard> damn, we had to miss a sub from the verge? however shall we go on?!
[11:12:46] <chromas> no, it still got posted
[11:12:53] <TheMightyBuzzard> well shit
[11:12:55] <chromas> but the urls had to be fixed
[11:13:09] <FatPhil> in *HTML*, an & in a URL must be encoded as &amp;
[11:13:29] <chromas> I've never seen that within an href
[11:13:41] <TheMightyBuzzard> wait, you mean the editors had to do extra work? that's working as designed then.
[11:13:42] <FatPhil> you've never looked at a valid HREF
[11:13:44] <chromas> (aside from the retards at the verge, of course)
[11:13:46] <chromas> only if you're putting it in as text
[11:14:08] <FatPhil> nope
[11:14:33] <TheMightyBuzzard> https://soylentnews.org
[11:15:12] <TheMightyBuzzard> the shitbirds actually MANDATED it in the spec
[11:15:21] <chromas> the spec is wrong then
[11:15:27] <TheMightyBuzzard> s'what i said
[11:16:01] <FatPhil> Buzz: that is not a sensible URL, however, it's what you would have in an HTML document
[11:16:40] <FatPhil> they didn't mandate use of '&' though - you can use ';' instead, and that's not reserved in HTML so doesn't need encoding
[11:16:43] <TheMightyBuzzard> sensible is contraindicated before 7am
[11:17:03] <chromas> ;amp; ?
[11:17:32] <FatPhil> https://soylentnews.org
[11:17:49] * TheMightyBuzzard is not rewriting apache to change it
[11:18:36] <FatPhil> https://soylentnews.org
[11:18:53] <TheMightyBuzzard> ya, apache's fine with it how it was
[11:18:58] <FatPhil> Those 2 URLs are both valid and identical in *meaning to an HTTP server*
[11:19:13] <FatPhil> The former does not need encoding in HTML documents.
[11:19:22] <FatPhil> The latter does need encoding in HTML documents.
[11:19:38] <FatPhil> (to what Buzz posted)
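For concreteness, a pair of URLs like the following (illustrative, not the ones pasted above) shows the contrast under the pre-HTML5 rules FatPhil describes. Whether a particular server also accepts ';' as a parameter separator is a server-side convention (the old CGI recommendation) and a separate question from what the HTML layer requires:

```python
import html

semi_url = "https://example.com/page?a=1;b=2"  # ';' separator: nothing here is reserved by HTML
amp_url  = "https://example.com/page?a=1&b=2"  # '&' separator: '&' is reserved by HTML

# Escaping each for use inside an href attribute:
print(html.escape(semi_url))  # https://example.com/page?a=1;b=2      (unchanged)
print(html.escape(amp_url))   # https://example.com/page?a=1&amp;b=2  (must be written with &amp;)
```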
[11:19:53] <TheMightyBuzzard> technically not but to follow the spec it does
[11:20:43] <TheMightyBuzzard> i have yet to meet a browser that can't follow an unencoded url like the latter correctly
[11:20:52] <TheMightyBuzzard> or even curl/wget
[11:21:01] <FatPhil> Technically not in HTML5 only.
[11:21:13] <FatPhil> Technically yes, needed, in all other HTMLs
[11:21:46] <FatPhil> That's why HTML5 got more lenient, as it was the typical behaviour.
[11:22:09] <TheMightyBuzzard> spec vs reality, yup.
[11:22:10] <FatPhil> However, it requires predicting every future HTML entity in order to be guaranteed to work for ever.
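The concrete failure mode behind that worry: a legacy parser matches named character references even without a trailing ';', so an unescaped '&' followed by something that is (or later becomes) an entity name silently rewrites the URL. Python's html.unescape implements those legacy rules, which makes for a quick demonstration; a strict HTML5 attribute parser is more forgiving in exactly this case, which is the leniency mentioned above:

```python
import html

# '&copy' matches the legacy (no-semicolon) reference for the copyright sign
print(html.unescape("page?chapter=1&copy=2"))      # page?chapter=1©=2  -- silently corrupted
print(html.unescape("page?chapter=1&amp;copy=2"))  # page?chapter=1&copy=2  -- survives intact
```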
[11:22:39] <chromas> Or just scanning for the closing quote
[11:22:50] * TheMightyBuzzard predicts &socialjustice;
[11:22:51] <FatPhil> closing semi
[11:23:04] <TheMightyBuzzard> chromas, you mean the one someone left out?
[11:23:16] <FatPhil> I was gonna say something like &feminist_waving_dildo;
[11:23:53] <TheMightyBuzzard> well, as long as she washed it first.
[11:24:02] <chromas> I don't recall seeing any with a missing quotation mark
[11:24:10] <FatPhil> any whats?
[11:24:19] <chromas> ones
[11:24:23] <chromas> the thing tmb asked about
[11:24:26] <chromas> quotes I guess
[11:24:42] <FatPhil> nothing we've talked about has involved quotation marks.
[11:24:46] <TheMightyBuzzard> never ever ever code assuming users will only give proper input.
[11:25:00] <chromas> urls in html are wrapped in quotation marks
[11:25:03] <FatPhil> attributes require quotation marks, but nobody's talked about attributes.
[11:25:05] <TheMightyBuzzard> you oughta know that better than anybody
[11:25:19] <FatPhil> URLs are *values of* attributes.
[11:25:24] <TheMightyBuzzard> #roll 55555d10
[11:25:24] <MrPlow> chromas, is that you? stop being a wiseass.
[11:25:32] <FatPhil> URLs are found inside quotes.
[11:25:37] <chromas> yes
[11:26:00] <chromas> so when it finds an opening quotation mark, it should look for the closing one and not bother looking inside
[11:26:26] <chromas> beyond looking for escaped quotes I guess, if html can even take a \" in an attribute
[11:26:28] <FatPhil> the thing parsing attributes, yes, that's an option.
[11:26:40] <TheMightyBuzzard> we're a bit more complex than that in url processing in rehash
[11:26:51] <TheMightyBuzzard> very fucking complex and annoying in fact
[11:26:57] <chromas> yeah, it turns &amp; in urls into nothing
[11:27:18] <FatPhil> that would be broken, I hope it's not true.
[11:27:31] <chromas> wasn't that what started the whole conversation?
[11:27:36] <FatPhil> yes
[11:27:41] <chromas> so it is true
[11:27:50] <chromas> your hopes are dashed
[11:28:08] <FatPhil> looks like upstart submits &amp;, and slashcode rips them out
[11:28:18] <TheMightyBuzzard> my hopes don't dash. they don't even mosey a bit faster.
[11:28:19] <chromas> upstart left it the way it found it
[11:28:52] <TheMightyBuzzard> upstart use the html form or the api?
[11:28:53] <FatPhil> upstart looks like it's doing the right thing
[11:28:56] * chromas blames cmdrnacho
[11:29:03] <chromas> upstart uses api
[11:29:07] <FatPhil> irrelevant
[11:29:25] <chromas> well the api could do stuff to it differently
[11:29:26] <TheMightyBuzzard> is relevant. the api may be me hand rolling incorrectly
[11:29:31] <chromas> like it could be doing the eating
[11:29:38] <chromas> but I think you already tried it in a comment too
[11:30:04] <FatPhil> it *should* be irrelevant, if it's not irrelevant, things are more broken.
[11:30:30] <TheMightyBuzzard> FatPhil, never underestimate my ability to write a bug
[11:30:36] <chromas> btw tmb, do you know what kind of checks are done on a submission? I have a script that stuffs a submission form with bot output but it only works if the ip address of the browser matches that of the bot
[11:30:37] <FatPhil> How you receive an octet string should not change how you sanitise it.
[11:30:56] <TheMightyBuzzard> chromas, not a fucking clue offhand
[11:31:29] <TheMightyBuzzard> ya, reskeys are tied to your ip address
[11:31:39] <chromas> it borks even without a reskey
[11:31:42] <FatPhil> TheMightyBuzzard: when I used the web interface to put some &amp;s into a URL to "fix" it, slash stripped it, so it's not the api side that's specifically the problem
[11:31:47] * TheMightyBuzzard shrugs
[11:31:55] <chromas> but tests show a form preview without a reskey causes rehash to generate one and fill it in
[11:31:59] <chromas> but not with my form
[11:32:10] <TheMightyBuzzard> good good, i might not be to blame then
[11:32:20] <chromas> rehash strips a lot of stuff though
[11:32:45] <TheMightyBuzzard> not on input, generally. most of the stripping is done on output.
[11:32:46] <chromas> and puts in lots of s when using the plain text input option
[11:33:11] <chromas> so the & should be stored in the db?
[11:33:36] <TheMightyBuzzard> possibly. i want a cigarette far more than i wanna check though.
[11:33:46] <TheMightyBuzzard> and i'm out of coffee
[11:33:49] <chromas> you're supposed to just know these things
[11:40:19] <TheMightyBuzzard> shit on that. there's at least three ways to input urls in a sub/comment for regular users.
[11:40:43] <chromas> you put in at least one of them
[11:40:52] <chromas> or one of you guys did
[11:40:53] <TheMightyBuzzard> did not. all three predate me.
[11:41:01] <TheMightyBuzzard> we've monkeyed with em though
[11:41:14] <chromas> doesn't it auto-wrap text urls with <a>?
[11:41:22] <TheMightyBuzzard> [url] plaintext, and <a>
[11:41:26] <chromas> I don't remember that happening before soybois
[11:42:27] <TheMightyBuzzard> i go for remembering program flow and where to find what i need to work on. remembering details is pointless when grep exists and you can just read the code.
[11:43:55] <TheMightyBuzzard> if i were to guess though, it's probably an incorrect attempt to prevent shat like &&rtl;; or some such
[11:52:33] <TheMightyBuzzard> okay, issue created. ima fuck off and play some vidya until it's light enough to go do construction work.
[12:41:47] <FatPhil> Is the "type" of the content (plain old text, html, ecode, etc.) stored in the db alongside the text?
[12:42:52] <janrinok> FP, I think that you might have missed him
[12:43:55] <FatPhil> One of the following makes sense: (1) on submission, the text gets canonicalised into HTML, and the original type gets discarded, no further processing is done; (2) no processing is done, the type is stored, and the content is processed in a type-specific way every time it's needed.
[12:44:31] <FatPhil> Anything that's part one, part the other, is asking for trouble. Fix once, and in a consistent stage in the pipeline.
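A sketch of what option (2) could look like, with a made-up schema and renderer set (rehash's actual tables and filter chain will differ): the submitted text is stored verbatim alongside its declared type, and every type-specific transformation, escaping included, happens once, at render time.

```python
import html

# Illustrative only: not rehash's schema or filters.
RENDERERS = {
    # plain old text: escape everything, then turn newlines into <br>
    "plaintext": lambda s: html.escape(s).replace("\n", "<br>\n"),
    # submitter-supplied HTML: passed through here; a real system would run
    # its tag whitelist/sanitizer at this same stage
    "html": lambda s: s,
}

def render(stored_text, stored_type):
    """All type-specific processing lives here, on output."""
    return RENDERERS[stored_type](stored_text)

row = {"text": "see https://example.com/?a=1&b=2", "type": "plaintext"}
print(render(row["text"], row["type"]))
# see https://example.com/?a=1&amp;b=2  -- the '&' survives storage and is escaped exactly once
```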
[12:45:26] <FatPhil> I wrote my own templating engine, and *all* escaping is in the templating engine itself!
[12:47:42] <FatPhil> so ``<a href="<% url | attribute %>">Go to <% url | text %></a>'' always produces valid HTML (on the assumption the 'url' variable was a valid URL).
[12:51:09] <FatPhil> so if url was ``http://foo.bar/baz?x=yes&y=no'' , you'd get ``<a href="http://foo.bar/baz?x=yes&amp;y=no">Go to http://foo.bar''
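A minimal sketch of that filter idea (not FatPhil's engine): the template layer owns all escaping, with one filter per output context, so a raw value can never reach the page unescaped.

```python
import html

def attribute(value):
    # escaping for an HTML attribute value: &, <, > and both quote characters
    return html.escape(value, quote=True)

def text(value):
    # escaping for HTML text content
    return html.escape(value, quote=False)

url = "http://foo.bar/baz?x=yes&y=no"
print('<a href="{}">Go to {}</a>'.format(attribute(url), text(url)))
# <a href="http://foo.bar/baz?x=yes&amp;y=no">Go to http://foo.bar/baz?x=yes&amp;y=no</a>
```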
[13:57:16] <TheMightyBuzzard> FatPhil, that was the idea when i moved a whole bunch of filtering to output instead of input a while back but i doubt i got it all.
[14:29:41] <TheMightyBuzzard> TMB's Law #12: It's always fucking something...