#dev | Logs for 2018-07-03
[09:04:49] <SoyGuest52256> Buzz: what do you think of this:
[09:04:55] <SoyGuest52256> --- a/Slash/Utility/Data/Data.pm
[09:04:55] <SoyGuest52256> +++ b/Slash/Utility/Data/Data.pm
[09:04:55] <SoyGuest52256> @@ -2835,7 +2835,7 @@ sub url2html {
[09:04:55] <SoyGuest52256>
[09:04:55] <SoyGuest52256> my $scheme_regex = _get_scheme_regex();
[09:04:57] <SoyGuest52256> # Had to be neutered as $URI::uric is not remotely utf8-safe and ther is no re
[09:05:00] <SoyGuest52256> - $text =~ s#(?<!\S)((?:$scheme_regex):/{0,2}\S+)#
[09:05:03] <SoyGuest52256> + $text =~ s#(?<!\S)((?:$scheme_regex):/{0,2}\S+[^.;,])#
[09:05:05] <SoyGuest52256> qq[<a href="$1" rel="url2html-$$">$1</a>];
[09:05:08] <SoyGuest52256> #ogie;
[09:05:36] <SoyGuest52256> as a fix for this: https://soylentnews.org
[10:25:26] <Bytram> I see no problems with the links appearing in our story as compared to the cited story -- all look to match up exactly? Which link seems to be in error? (i.e. what do we have and what should it be?)
[10:26:53] <SoyGuest52256> You have <a href="https://en.wikipedia.org/wiki/Nigerian_Dwarf_goat." title="wikipedia.org">https://en.wikipedia.org/wiki/Nigerian_Dwarf_goat.</a>
[10:27:00] <SoyGuest52256> You should have <a href="https://en.wikipedia.org/wiki/Nigerian_Dwarf_goat" title="wikipedia.org">https://en.wikipedia.org/wiki/Nigerian_Dwarf_goat</a>.
[10:27:07] <SoyGuest52256> Trailing '.' breaks it.
[10:28:04] <Bytram> Ummm, are we looking at the same story? I see no wiki links in: https://soylentnews.org
[10:32:50] <Bytram> SoyGuest52256: Oh, you mean the link in this *comment* ?? https://soylentnews.org
[10:35:49] <Bytram> It looks like the URL *could* be inserted correctly, per: https://soylentnews.org
[10:37:39] <Bytram> So, I take that to mean that the comment submitter may have fat-fingered the submission? Need more details as to what was entered into the comment form and what came out to know what really happened. Did they use the "[url " meta-construct (never tried it, going from memory there) or did they try to enter the URL directly (which is what I always do) ??
[11:03:22] <SoyGuest52256> no, they simply typed something like ``look at http://foo.bar.'', and SN turned that into ``look at <a href="http://foo.bar.">http://foo.bar.</a>'' rather than ``look at <a href="http://foo.bar">http://foo.bar</a>.''
[11:04:32] <SoyGuest52256> Yeah, the link is in https://soylentnews.org , which was the post that Arik was responding to.
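A minimal Perl sketch of the behaviours in play, assuming a simplified stand-in for Slash's `_get_scheme_regex()` (the hypothetical `qr/https?|ftp/`) and a plain substitution instead of the original `/e` replacement with `rel="url2html-$$"`. It shows the unpatched `\S+` keeping the trailing dot, and a catch in the posted `\S+[^.;,]`: the class `[^.;,]` also matches whitespace, so mid-sentence the space after "foo.bar." gets pulled into the link. Requiring the final character to match `[^\s.;,]` avoids that.

```perl
use strict;
use warnings;

# Assumption: simplified stand-in for Slash's _get_scheme_regex().
my $scheme_regex = qr/https?|ftp/;

# Unpatched pattern: \S+ greedily keeps any trailing punctuation.
sub linkify_original {
    my ($text) = @_;
    $text =~ s#(?<!\S)((?:$scheme_regex):/{0,2}\S+)#<a href="$1">$1</a>#g;
    return $text;
}

# Patch as posted: [^.;,] can also match the whitespace after the URL.
sub linkify_patched {
    my ($text) = @_;
    $text =~ s#(?<!\S)((?:$scheme_regex):/{0,2}\S+[^.;,])#<a href="$1">$1</a>#g;
    return $text;
}

# Variant: the final character may be neither whitespace nor . ; ,
sub linkify_fixed {
    my ($text) = @_;
    $text =~ s#(?<!\S)((?:$scheme_regex):/{0,2}\S*[^\s.;,])#<a href="$1">$1</a>#g;
    return $text;
}

my $comment = 'look at http://foo.bar. Nice.';
print linkify_original($comment), "\n";
# look at <a href="http://foo.bar.">http://foo.bar.</a> Nice.
print linkify_patched($comment), "\n";
# look at <a href="http://foo.bar. ">http://foo.bar. </a>Nice.
print linkify_fixed($comment), "\n";
# look at <a href="http://foo.bar">http://foo.bar</a>. Nice.
```

Note that the posted patch only looks right when the URL is the very last thing in the text, because then there is no whitespace for `[^.;,]` to grab.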
[11:05:12] <Bytram> yeah, now that I read *all* the comments in for the story, I figured it out. Until then, I thought it was in reference to a link that had appeared in the *story*
[11:05:13] <SoyGuest52256> My URL entered in PLAINTEXT mode
[11:05:14] <Bytram> and...
[11:05:25] <SoyGuest52256> my URL *wasn't* ....
[11:05:30] <Bytram> a trailing period in a URL is valid AFAIK, GIGO
[11:05:39] <SoyGuest52256> I typed the <a> myself.
[11:05:57] <Bytram> then, in that case, don't do *that*!
[11:06:03] <SoyGuest52256> Not GIGO - this is *automatic* behaviour.
[11:06:21] <Bytram> huh?
[11:06:27] <SoyGuest52256> Automatic things are supposed to help, not make things worse
[11:06:45] <Bytram> what, ***exactly*** did you type in?
[11:07:05] <SoyGuest52256> i typed in the explicit html, in html mode
[11:07:21] <Bytram> please put the **exact** text **you** entered here.
[11:07:25] <SoyGuest52256> gpp typed in the url in some other mode
[11:07:40] <SoyGuest52256> What I entered is *utterly irrelevant*.
[11:07:57] <SoyGuest52256> My URL worked as I went to the effort of hand-crafting the HTML.
[11:08:04] <Bytram> I thought you were the one who entered the 'bad' url... no?
[11:08:34] <SoyGuest52256> No, Arik reported the bug evidenced in the post he's responding to.
[11:08:38] <SoyGuest52256> I responded to Arik.
[11:09:01] <Bytram> I couldn't tell from your nick over here
[11:09:18] <Bytram> so let's start over.
[11:09:21] <Bytram> I am lost.
[11:09:32] <Bytram> I see there is some problem with a URL
[11:09:46] <Bytram> there is one that appeared with a trailing period
[11:09:49] <Bytram> right so far?
[11:10:45] <SoyGuest52256> richtopia's #701573 demonstrates the bug in SN's over-aggressive automatic linkifying.
[11:12:00] <Bytram> I need details to understand this, not references or paraphrases... it's way early and I'm still half asleep and need to be at work in under 2 hours.
[11:12:31] <Bytram> i am trying to help, but I am confused, so *I* need help
[11:13:07] <Bytram> like, an exact example of text entered into a comment with a valid URL that gets munged into an invalid one?
[11:13:29] <SoyGuest52256> The bug is that in trying to be friendly and useful, SN automatically creates URLs which are not what the poster wants, expects, or intends, and therefore SN has done A Bad Thing(TM).
[11:14:08] <Bytram> okay, I understand that *concept*.... I'm not seeing it yet. Need a specific instance.
[11:14:13] <Bytram> I typed:
[11:14:32] <Bytram> <a href="http://example.com/">foo</a> and got...
[11:14:36] <Bytram> ^^^ something like that
[11:21:48] <Bytram> If they entered something with the "[url" thingy, that's one thing.
[11:22:26] <Bytram> if they entered: <a href="http://example.com/foo.bar.">stuff</a>
[11:22:38] <Bytram> then as far as I know we try to give 'em what they asked for.
[11:23:03] <SoyGuest52256> But you *are* seeing it. #701573 is it.
[11:23:08] <Bytram> Now, I have seen some issues when there are "%" symbols in a URL, but have not seen enough cases or made the time to track down what is going on
[11:23:19] <SoyGuest52256> I just created a more explicit explanation at #701842
[11:23:26] <Bytram> linky please?
[11:23:48] <SoyGuest52256> https://soylentnews.org
[11:23:54] <Bytram> thank you!
[11:24:20] <SoyGuest52256> there is no "[url" stuff, this ain't BB.
[11:24:38] <Bytram> Okay, that example helps very much.
[11:24:49] <Bytram> and I disagree that it is a bug.
[11:25:08] <Bytram> they put in a period we give 'em a period. GIGO
[11:25:20] <SoyGuest52256> Fuck the user. Great attitude.
[11:25:23] <Bytram> a trailing period is valid URL
[11:25:29] <Bytram> no, not that at all...
[11:25:37] <SoyGuest52256> Yes, completely that.
[11:25:41] <Bytram> law of least astonishment...
[11:25:51] <SoyGuest52256> EXACTLY
[11:25:58] <Bytram> What I put in, is what I get out.
[11:26:06] <SoyGuest52256> NOOOOO!!!!!!!!!!!11
[11:26:07] <Bytram> preview, check link (
[11:26:12] <Bytram> huh?
[11:26:21] <Bytram> oh! Doh! I put in an extra period I did not mean to
[11:26:24] <Bytram> my bad.
[11:26:32] <SoyGuest52256> People don't preview.
[11:26:55] <Bytram> I find no software more frustrating than one that keeps trying to fix what I put in, and does it *wrong*
[11:27:05] <Bytram> autocorrect on my cellphone, for example
[11:27:24] <SoyGuest52256> We, by doing the automatic linkifying, are explicitly *not* returning what they enter. We're mangling it.
[11:28:34] <Bytram> It may *seem* to be a simple thing in this one case, but I have seen what happens when multiple simple things get built on top of each other and things start going sideways in a very confusing way...
[11:28:44] <Bytram> wait
[11:29:18] <SoyGuest52256> Yes, but the *sole* reason for this feature is to make things easier for the human by turning what they type (a URL) into what they want
[11:29:20] <Bytram> "We, by doing the automatic linkifying, are explicitly *not* returning what they enter." ?? They entered it, we give it to them? We *are* giving them *exactly* what they gave us.
[11:29:27] <SoyGuest52256> (an anchor)
[11:29:42] <SoyGuest52256> Bollocks we are - we're mangling it.
[11:30:03] <SoyGuest52256> We're adding the <a href=... before it, and the </a> after it.
[11:30:06] <Bytram> I may want a trailing period. It's valid HTML, and a valid URL.
[11:30:21] <Bytram> If we keep stripping it off, how are they supposed to put it in?
[11:30:33] <Bytram> Now someone else is getting pissed off because we are being "helpful"
[11:31:04] <SoyGuest52256> So your attitude is "fuck the 99.9999% of the users who don't want a trailing period, we've got to correctly handle the one-in-a-million case"?
[11:31:12] <SoyGuest52256> That's no better an attitude.
[11:31:32] <Bytram> no, and I do NOT want to start an argument.
[11:31:53] <SoyGuest52256> If you want the unimaginably-rare case, then enter the <a href explicitly yourself, it's easy.
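Why an explicit anchor works as the escape hatch: the `(?<!\S)` lookbehind in the url2html pattern only matches a URL at the start of the text or right after whitespace. In a hand-written `<a href="...">` the URL follows `"` and the link text follows `>`, so neither is touched. A minimal sketch, assuming url2html sees the raw markup and using the same hypothetical `qr/https?|ftp/` stand-in for `_get_scheme_regex()`:

```perl
use strict;
use warnings;

# Assumption: simplified stand-in for Slash's _get_scheme_regex().
my $scheme_regex = qr/https?|ftp/;

# Unpatched url2html pattern, with a plain replacement instead of /e.
sub linkify {
    my ($text) = @_;
    $text =~ s#(?<!\S)((?:$scheme_regex):/{0,2}\S+)#<a href="$1">$1</a>#g;
    return $text;
}

# Hand-written anchor: the URL follows '"' and the link text follows '>',
# both non-whitespace, so (?<!\S) rejects every candidate position.
my $explicit = '<a href="http://example.com/foo.bar.">stuff</a>';
print linkify($explicit), "\n";    # printed unchanged

# A bare URL preceded by a space does get wrapped.
print linkify('see http://example.com/foo'), "\n";
# see <a href="http://example.com/foo">http://example.com/foo</a>
```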
[11:32:06] <SoyGuest52256> But you have started an argument. You took a contrary stance.
[11:32:38] <Bytram> I am not feeling well, my manager is in the hospital with a very risky pregnancy, we are short staffed at work, with a new person on board (1st week) who needs a lot of hand-holding to get up to speed, and I have not slept well for a week. So please give this discussion that context and I apologize if I came off argumentative.
[11:33:36] <SoyGuest52256> Ack
[11:33:54] <Bytram> And, to be quite frank, you took a stance and asserted it to be true. I can see how you may see it that way. You have semantics. I
[11:34:02] <Bytram> sorry, hit enter instead of delete
[11:34:14] <Bytram> and I need to be at work in 90 minutes
[11:35:11] <Bytram> umm, I'm not saying don't do it, I'm just pointing out that it does not seem to me to be as clear-cut as what I am reading.
[11:35:22] <Bytram> I'll definitely give it some more thought!
[11:36:47] <SoyGuest52256> I admit that I am asserting with little evidence that to the majority of users, my stance represents The Right Thing(tm).
[11:36:56] <Bytram> But, my instinctive response, based on over 3 decades of doing software test/QA, is that there are gotchas in edge cases, never mind multi-orthogonal corner cases. Those are *nasty*, so I tend towards doing the least possible that still makes it clear what is happening. I should not have to reverse engineer an input parser to be able to get out what I want.
[11:37:06] <Bytram> thank you.
[11:37:23] <Bytram> we're both trying to Make Things Better(tm)
[11:37:27] <Bytram> =)
[11:37:29] <SoyGuest52256> However, richtopia would I'm sure claim it is The Right Thing(tm), and that Arik would agree with him.
[11:37:45] <Bytram> atm, we see different parts of the same elephant.
[11:37:50] <SoyGuest52256> And I agree enough to be willing to dive into the code and provide a fix.
[11:38:49] <Bytram> question? is there a particular reason why you don't use your SN nick here? It's *very* confusing for me to know who I am talking to... only just *now* figured it out.
[11:38:51] <SoyGuest52256> I would also assert that "wanting a URL ending in a dot" is the edge case.
[11:39:29] <SoyGuest52256> Occasionally I disconnect, and I get pseudonymised again, so I gave up authenticating.
[11:39:47] <Bytram> I'll grant it is not common to want a period at the end, yes.
[11:41:41] <Bytram> I dunno why, but given vectors of values, my instinctive reaction is to permute across all possible values and find out where the corners are.
[11:41:55] <Bytram> and, thank you very much for digging through the code and coming up with a fix.
[11:42:22] <Bytram> Is much better than saying bad code, go fix. Very much appreciate your willingness to MakeItBetter
[11:42:37] <Bytram> In the end, it is not up to me...
[11:43:00] <Bytram> I'm as much playing devil's advocate, and doing a spec review in my mind as to what could go wrong...
[11:43:35] <Bytram> if I can be satisfied that the benefits outweigh the risks/problems then I'll be for it, but I need to understand what is going on before I can get there.
[11:44:31] <Bytram> And, IME, URLs are snarky nasty bolgoraths... especially when one tosses in IDN with UTF-8
[11:45:05] <Bytram> so, I really appreciate your dedication and making the time to clarify the issue!
[11:45:17] <Bytram> sadly, i really really need to start to get ready for work.
[11:45:32] <Bytram> thanks for the chat!
[12:15:55] <SoyGuest52256> Working out what a human meant is very very very hard for a computer to get right, so you should always expect a few things to go wrong.
[15:31:44] <TheMightyBuzzard> SoyGuest52256, about the patch: it might work. would need a good hard look and then some bytraming though.
[15:32:01] <TheMightyBuzzard> the rest of that wall o text, tl;dr. summarize?
[15:38:00] <SoyGuest52256> tricky...
[15:41:11] <SoyGuest52256> How about "Assertion: Trailing punctuation on an automatically linkified URL is almost never part of the URL the poster meant, and therefore automatic linkification should cater for the majority case by stripping that trailing punctuation character."
[15:41:20] <TheMightyBuzzard> as for the legit urls ending in a dot, i have no problem with just pasted urls ending in dots getting fucked up. if they want it done exactly right they can bloody well use an <a> tag.
[15:42:08] <SoyGuest52256> TheMightyBuzzard: total absolute agreement about the "just do it manually if you've got a corner case not handled correctly by the automatic mangler".
[15:43:38] <SoyGuest52256> The only question is what punctuation should be included in the "not likely to be a final character in a url" class. '.' and ',' I feel would be quite common. I included ';' only because I use semicolons in prose quite often.
[15:43:49] <TheMightyBuzzard> besides which, who the hell wants to look at a url? it's wasted space when you could use an <a> tag even in plain old text mode.
[15:44:15] <SoyGuest52256> Yeah, but this is automatic linkification, it's for the lazy.
[15:44:22] <TheMightyBuzzard> nod nod
[15:45:02] <TheMightyBuzzard> all punctuation likely to be in a sentence, i'd say. ,.;:'"
[15:45:08] <TheMightyBuzzard> oh and !
[15:45:16] <TheMightyBuzzard> possibly ? too
[16:05:01] <SoyGuest52256> ? could begin a list of zero CGI parameters, server might behave differently because of that.
[16:06:45] <SoyGuest52256> But it's easy to fine-tune the regexp now we know where it is in the code if it's not quite right.
[16:12:22] <TheMightyBuzzard> any server that doesn't default to a list of zero cgi parameters for a get request is a piece of shit i'm not inclined to cater to.
[16:23:15] <SoyGuest52256> I wouldn't be surprised if it was quite common, an empty list and a lack of a list are different things in most languages.
[16:26:52] <SoyGuest52256> OK, the framework I use does indeed just give you an empty hash of parameters when none are specified.
[16:28:09] <SoyGuest52256> I guess it's user-friendlier to not distinguish the two cases.
[16:29:20] <TheMightyBuzzard> so does every server software that i've ever touched.
[16:29:25] <SoyGuest52256> But in the write-it-all-from-scratch bad old days, I would often generate static content containing the form when there was no ? and only call the CGI handler if there were apparent parameters.
[16:30:10] <SoyGuest52256> So I agree with your inclusion, and retract my objection.
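Extending the idea to the agreed list (, . ; : ' " ! ?) only widens the class excluded from the URL's final character. A sketch under the same assumptions (hypothetical `qr/https?|ftp/` scheme stand-in, plain replacement instead of `/e`): a `?` or `.` inside the URL is kept, and only a trailing one is left outside the link.

```perl
use strict;
use warnings;

# Assumption: simplified stand-in for Slash's _get_scheme_regex().
my $scheme_regex = qr/https?|ftp/;

# The last character of a linkified URL may not be whitespace or any of
# the sentence punctuation discussed: , . ; : ' " ! ?
sub linkify {
    my ($text) = @_;
    $text =~ s#(?<!\S)((?:$scheme_regex):/{0,2}\S*[^\s.,;:'"!?])#<a href="$1">$1</a>#g;
    return $text;
}

for my $in ('See http://foo.bar/baz!', 'Try http://foo.bar/q?x=1.', 'http://foo.bar/?') {
    print linkify($in), "\n";
}
# See <a href="http://foo.bar/baz">http://foo.bar/baz</a>!
# Try <a href="http://foo.bar/q?x=1">http://foo.bar/q?x=1</a>.
# <a href="http://foo.bar/">http://foo.bar/</a>?
```

The trade-off is the one agreed above: a URL that genuinely ends in one of these characters (such as an empty `?` query) needs a hand-written `<a>` tag.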