#editorial | Logs for 2019-11-19
[10:02:34] <chromas> Trying to steal the auto citation feature from StoryBot but it seems like most sites don't put DOIs in their pages
[10:04:58] <janrinok> there is no auto citation - I have to add that manually at present. I am looking at ways to do it but, as you have seen, there are a lot of variations to cover and I don't have a working version yet.
[10:05:10] <chromas> ah
[10:06:39] <janrinok> I have also found a bug in the code that we added the other day. The title field can be one of 2 types, a byte string if it only contains pure ASCII, or a unicode string if it contains unusual characters and punctuation. I'll send you the modifications that I am currently testing once I am happy with them.
[10:07:57] <chromas> Is that what the decode() does?
[10:08:04] <chromas> I guess in python it's just decode
[10:08:21] <janrinok> I am adding a web browser editor - similar to the facility that you already have - that will help in identifying DOI and citation fields
[10:09:29] <janrinok> Yes, but you can only use it on one of the possible strings - so I have to test the type and then select one of 2 paths for printing it out correctly. I'm just running it for a day or two so that I don't keep having to revisit the same problem
[10:10:20] <chromas> Is there a way to just make it always a unicode string?
[10:10:35] <chromas> especially since pure ascii is considered a subset of unicode
[10:10:44] <janrinok> my storybot runs every 4 hours in auto-collection mode - I'm just checking the processing logs to make sure it isn't bombing out elsewhere
[10:12:35] <janrinok> as you know, all web pages are encoded, but the data downloaded is simply a series of bytes. If I change the processing of the title field it could have adverse effects on the code elsewhere. I'll look at that possible problem when I have a bit more time
[10:14:35] <janrinok> python 3 uses unicode all the time, but input and output have to be in bytes. As there are various different paths and formats required for logs, disk files, submissions, and raw source data I have to make sure that any I/O is correctly handled.
[10:16:36] <janrinok> added to that, various C libraries that are used (e.g. libxml) only understand bytes so I have processing that returns formats that I would rather not have had.
[10:18:03] <janrinok> 'print' can print either without problem providing it knows which way you want it to be treated - hence the decode and encode functions on I/O
[10:20:05] <janrinok> It sounds way more complicated than it actually is but until the whole world accepts unicode as the default (and some insist on using ASCII because they don't understand the rest of the world has different alphabets!) then the problem will remain.
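One way to approach chromas's question about always ending up with a text string is to normalise the value at the point where it enters the program. The following is only a minimal sketch, assuming Python 3 and UTF-8 input; `to_text` is a hypothetical helper for illustration and is not taken from StoryBot.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a str, decoding it first if it arrived as bytes."""
    if isinstance(value, (bytes, bytearray)):
        # errors="replace" keeps one bad byte from aborting an unattended run
        return bytes(value).decode(encoding, errors="replace")
    return str(value)


ascii_title = b"Huge tsunami hit Oman 1,000 years ago"   # arrived as bytes
accented_title = "Caf\u00e9 society"                      # arrived as text

print(to_text(ascii_title))
print(to_text(accented_title))
```

With a helper like this, the type test janrinok describes happens once at the boundary, and the rest of the code can assume `str`. Whether that is safe for the other code paths he mentions is exactly the open question in the conversation.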
[10:46:23] -!- ClownHatLinux has quit [Remote host closed the connection]
[10:51:58] <FatPhil> chromas: there's no such (uniquely defined) thing as a "unicode string". And the set of all ASCII strings certainly isn't a subset of the set of all unicode strings.
[10:52:24] <chromas> utf8 string then
[10:52:51] <FatPhil> if you mean "utf-8" specifically, then please don't call it "unicode".
[10:53:30] <chromas> utf-8 is the only valid encoding. everything else is wrong :)
[10:53:48] <FatPhil> WinNT is wrong? no argument there.
[10:54:08] <chromas> for example, Windows uses utf-16. You're not implying Windows is right, are you? Why won't you deny it
[10:54:44] <FatPhil> Windows was based on UCS2, not utf-16
[10:55:12] <FatPhil> utf-16 didn't exist at the time
[10:56:45] <FatPhil> UCS2 strings are a subset of valid utf-16 strings, though, they attempted to have backward compatibility with themselves and thei
[10:57:36] <chromas> your message got truncated. must've been using utf-16
[10:57:58] <janrinok> unless you look at the 'encoding' for web pages that claim they are Windows 1252 or ISO 8859-1
[10:59:36] * janrinok has noticed web pages that claim to be one specific encoding but are actually something completely different. He's had great fun trying to sort those out...
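A common way to cope with pages whose declared encoding is wrong is to try the declared label first and fall back to other candidates. This is a rough sketch under that assumption; `decode_page` and the order of the fallbacks are illustrative choices, not janrinok's actual approach.

```python
def decode_page(raw, declared=None):
    """Decode raw bytes, trying the declared encoding first, then fallbacks.

    Returns a (text, encoding_used) tuple.
    """
    for name in (declared, "utf-8", "cp1252"):
        if not name:
            continue
        try:
            return raw.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # wrong label, or a codec name Python does not know: try the next
            continue
    # latin-1 maps every byte to a code point, so this last step cannot fail
    return raw.decode("latin-1"), "latin-1"


page = "na\u00efve caf\u00e9".encode("utf-8")
text, used = decode_page(page, declared="ascii")
print(used, text)
```

This only catches labels that actually fail to decode. A UTF-8 page mislabelled as ISO 8859-1 decodes without error into mojibake, so a real tool would need extra heuristics for that case.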
[11:00:00] <chromas> Sounds like hack/frauds to me
[11:00:01] <janrinok> chromas made me chuckle...
[11:00:06] <chromas> String 'em up with the others
[11:02:05] <chromas> I think I broke him
[11:02:28] * chromas pokes FatPhil with a stick
[11:02:47] * chromas sets a Coors Light next to his head
[11:10:47] <chromas> doi--
[11:10:47] <Bender> karma - doi: -1
[11:13:09] <FatPhil> was having lunch in a cellar cafe - lost all coverage!
[11:15:00] <FatPhil> The funny thing about jumping from UCS2 to utf-16 is that you lose the only property that was useful in UCS2 - direct character indexing.
[11:15:47] <FatPhil> well done MS - drop ASCII compatibility in order to maintain trivial indexing, and then throw away trivial indexing!
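FatPhil's point about indexing can be checked by counting 16-bit code units. The sketch below assumes Python 3; `utf16_units` is a hypothetical helper name.

```python
import struct


def utf16_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


bmp = "caf\u00e9"          # every character fits in one 16-bit unit
astral = "a\U0001F600b"    # U+1F600 needs a surrogate pair in UTF-16

print(len(bmp), len(utf16_units(bmp)))
print(len(astral), len(utf16_units(astral)))
print([hex(u) for u in utf16_units(astral)])
```

For `astral`, the third code unit is the second half of a surrogate pair rather than the third character, so unit index and character index no longer line up once text leaves the range UCS2 could represent.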
[11:18:18] -!- ClownHatLinux [ClownHatLinux!~systemd@0::1] has joined #editorial
[11:18:38] <chromas> =doi 10.1126/science.aax5798
[11:18:40] <ClownHatLinux> <p><b>Intermediate bosonic metallic state in the superconductor-insulator transition</b> [$], <cite>Science</cite> (DOI: <a href="10.1126/science.aax5798">10.1126/science.aax5798</a></p>
[11:19:20] <chromas> new toy
[11:23:16] <chromas> looks like I dun goofed on the url though
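In the bot output above, the `href` holds the bare DOI, which a browser would treat as a relative path; presumably that is the URL slip chromas means. A minimal sketch of building the link through the doi.org resolver follows. `doi_citation` is a hypothetical function, not ClownHatLinux's code, and it leaves out the `[$]` marker.

```python
from html import escape

DOI_RESOLVER = "https://doi.org/"


def doi_citation(title, journal, doi):
    """Build an HTML citation whose link goes through the DOI resolver."""
    url = DOI_RESOLVER + doi
    return (
        f"<p><b>{escape(title)}</b>, <cite>{escape(journal)}</cite> "
        f'(DOI: <a href="{escape(url)}">{escape(doi)}</a>)</p>'
    )


print(doi_citation(
    "Intermediate bosonic metallic state in the superconductor-insulator transition",
    "Science",
    "10.1126/science.aax5798",
))
```

`html.escape` is used because titles and older DOIs can contain characters such as `<` or `&`. The sketch does not percent-encode the DOI, which a production version would need to consider.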
[11:24:25] -!- ClownHatLinux has quit [Remote host closed the connection]
[11:24:38] -!- ClownHatLinux [ClownHatLinux!~systemd@0::1] has joined #editorial
[13:39:11] <Bytram> Hi guys!
[13:39:21] <Bytram> Clarifications, if I may?
[13:39:49] <Bytram> UTF-8 is short for "Unicode Transformation Format - 8 bit"
[13:40:01] <Bytram> UTF-16 is short for "Unicode Transformation Format - 16 bit"
[13:41:03] <Bytram> Those are protocols for transferring (encoding/decoding) Unicode code points ('characters')
[13:41:52] <Bytram> And, AFAICR, all of ASCII is encoded in the first code page of Unicode.
[13:43:35] <Bytram> well, the first 128 code points (0-127), at least.
[13:50:26] <Bytram> https://en.wikipedia.org
[13:50:28] <ClownHatLinux> ^ ASCII - Wikipedia ( https://en.wikipedia.org )
[13:50:28] <exec> └─ 400 Bad Request
[13:50:30] <exec> └─ ASCII - Wikipedia
[13:50:49] <Bytram> Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats,
[13:50:49] <Bytram> called UTF-8, UTF-16 and UTF-32).
[13:50:49] <Bytram> ASCII was incorporated into the Unicode (1991) character set as the first 128 symbols, so the 7-bit ASCII characters have the same numeric codes in both sets. This allows UTF-8 to be backward compatible with 7-bit ASCII, as a UTF-8 file containing only ASCII characters is identical to an ASCII file containing the same sequence of characters. Even more importantly, forward compatibility is ensured as software that recognizes only 7-bit
[13:50:52] <Bytram> ASCII characters as special and does not alter bytes with the highest bit set (as is often done to support 8-bit ASCII extensions such as ISO-8859-1) will preserve UTF-8 data unchanged.
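The compatibility described in the quoted passage is easy to check directly. This is a small sketch assuming Python 3; `utf8_bytes` is just an illustrative helper.

```python
def utf8_bytes(text):
    """Return the UTF-8 encoding of text as a list of byte values."""
    return list(text.encode("utf-8"))


plain = "ASCII only"
# Pure ASCII text produces identical bytes under both encodings.
print(plain.encode("ascii") == plain.encode("utf-8"))

# A non-ASCII character becomes several bytes, each with the high bit set,
# so none of them can be mistaken for a 7-bit ASCII byte.
print([hex(b) for b in utf8_bytes("\u00e9")])

# UTF-16, by contrast, is not byte-compatible with ASCII.
print("A".encode("utf-16-le"))
```

The second check is the property behind the forward compatibility in the quote: software that passes high-bit bytes through untouched cannot corrupt a multi-byte UTF-8 sequence.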
[13:51:01] <Bytram> .
[13:51:27] <Bytram> That said, it is helpful to gain some context by starting slightly earlier on that page, here: https://en.wikipedia.org
[13:51:28] <ClownHatLinux> ^ ASCII - Wikipedia ( https://en.wikipedia.org )
[13:51:28] <exec> └─ 400 Bad Request
[13:51:30] <exec> └─ ASCII - Wikipedia
[13:57:41] <Bytram> =submit from the jokes-write-themselves dept. (Be sure to replace ScienceDaily link with link under "Materials") https://www.sciencedaily.com
[13:57:42] <ClownHatLinux> Submitting "New, slippery toilet coating provides cleaner flushing, saves water: Innovative coating could reduce toilet water consumption by half, increase water sustainability"...
[13:57:43] <exec> └─ New, slippery toilet coating provides cleaner flushing, saves water: Innovative coating could reduce toilet water consumption by half, increase water sustainability -- ScienceDaily
[13:58:03] <ClownHatLinux> ✓ Sub-ccess! "New, Slippery Toilet Coating Provides Cleaner Flushing, Saves Water: Innovative Coating Could Reduce" (21 paragraphs) -> https://soylentnews.org
[13:58:05] <exec> └─ New, Slippery Toilet Coating Provides Cleaner Flushing, Saves Water: Innovative Coating Could Reduce: SoylentNews Submission
[15:36:50] <Bytram> whereto? https://crt.sh
[15:36:52] <ClownHatLinux> ^ crt.sh | %%soylentnews.org
[15:36:53] <exec> └─ crt.sh | %%soylentnews.org
[15:38:25] <Bytram> grrr...
[15:39:03] <Bytram> seems the comment processing code refuses to let me create an actual link using: <a href="https://crt.sh/?q=%%25soylentnews.org">https://crt.sh/?q=%%25soylentnews.org</a>
[15:39:04] <ClownHatLinux> ^ crt.sh | %%soylentnews.org"https://crt.sh/?q=%%soylentnews.org/a
[15:39:06] <exec> ├─ crt.sh | %%soylentnews.org">
[15:39:06] <exec> └─ crt.sh | %%soylentnews.org</a>
[15:39:17] <Bytram> it just appears as plain text on the preview page. :/
[15:39:27] <Bytram> ditto for: <a href="https://crt.sh/?q=%%25sylnt.us">https://crt.sh/?q=%%25sylnt.us</a>
[15:39:28] <ClownHatLinux> ^ crt.sh | %%sylnt.us"https://crt.sh/?q=%%sylnt.us/a
[15:39:29] <exec> ├─ crt.sh | %%sylnt.us">
[15:39:29] <exec> └─ crt.sh | %%sylnt.us</a>
[15:39:29] <exec> └─ crt.sh | ERROR!
[15:39:38] <exec> └─ crt.sh | ERROR!
[15:40:50] * Bytram tries to auto-link it
[15:40:51] <Bytram> <URL:https://crt.sh/?q=%%25soylentnews.org>
[15:40:53] <exec> └─ crt.sh | %%soylentnews.org>
[15:41:36] <Bytram> is not displayed, either. =(
[15:47:52] <Bytram> .
[15:48:00] <Bytram> whereto? https://go.theregister.co.uk
[15:48:01] <ClownHatLinux> ^ Second time lucky: Sweden drops Julian Assange rape investigation ( https://www.theregister.co.uk )
[15:48:02] <exec> └─ Second time lucky: Sweden drops Julian Assange rape investigation • The Register
[15:48:02] <exec> └─ Second time lucky: Sweden drops Julian Assange rape investigation • The Register
[16:07:30] <Bytram> whereto? https://phys.org
[16:07:31] <ClownHatLinux> ^ Huge tsunami hit Oman 1,000 years ago
[16:07:32] <exec> └─ Huge tsunami hit Oman 1,000 years ago
[17:33:21] <Bytram> ~arthur https://www.theregister.co.uk
[17:33:24] <exec> └─ Intel end-of-lifing BIOS and driver downloads for dusty hardware • The Register
[17:33:24] <exec> 561 stories loaded
[17:33:24] <exec> attempting to submit story: "Intel end-of-lifing BIOS and driver downloads for dusty hardware"
[17:33:54] <exec> submission successful - https://soylentnews.org
[17:34:00] <exec> 560 stories loaded
[17:58:15] <Bytram> ~arthur https://www.bbc.co.uk
[17:58:17] <exec> 560 stories loaded
[17:58:18] <exec> attempting to submit story: "TSB lacked common sense before IT meltdown, says report"
[17:58:20] <exec> └─ TSB lacked common sense before IT meltdown, says report - BBC News
[17:58:48] <exec> submission successful - https://soylentnews.org
[17:58:53] <exec> 559 stories loaded
[19:01:36] <chromas> When do we get utf-64? Or will we go straight to 128 like IPv6?
[19:05:45] <Bytram> after they come up with too many more emojis?
[19:05:59] <janrinok> ~gday Bytram
[19:06:00] * exec sneakily injects a lode of init scripts into Bytram
[19:06:07] <Bytram> chromas: and... I think you may be confused.
[19:06:11] <janrinok> ~gday chromas again
[19:06:12] * exec seriously deallocates a floccinaucinihilipilificator of monkeys from chromas
[19:06:41] <chromas> It's never been more confused
[19:06:51] <chromas> Or less, maybe. I dunno
[19:06:54] * janrinok has had a busy day - but has just pushed 5 stories to the queue
[19:07:11] <Bytram> the -8, -16, -32 indicate how many bits are in a 'unit', so UTF-8 is actually optimal as it allows for more 'pages' of glyphs / code points to be easily added
[19:07:23] <Bytram> ~gday janrinok
[19:07:24] * exec definitely synthesizes a blerg of gasoline from janrinok
[19:07:36] <Bytram> janrinok++ muchos gracias!
[19:07:36] <Bender> karma - janrinok: 76
[19:07:50] <chromas> ~g'day janrinok
[19:07:51] * exec problematically redirects a hashtable of deeplearn into janrinok
[19:07:52] <janrinok> is that a full blerg or a partially filled blerg? By the way, what the f*** is a blerg?
[19:08:00] <Bytram> in case you are now getting hungry... buenos nachos!
[19:08:12] <Bytram> =g blerg
[19:08:13] <ClownHatLinux> https://www.urbandictionary.com - blerg - Urban Dictionary
[19:08:15] <exec> └─ Urban Dictionary: blerg
[19:08:23] <chromas> ~g-day Bytram
[19:08:24] <janrinok> just had homemade spicy ribs for dinner, but thanks for the offer
[19:08:29] <Bytram> ~gday chromas
[19:08:30] * exec transphobically pipes a rusty trombone of steaks into chromas
[19:09:00] <Bytram> where do you get the seeds to make home-made ribs?
[19:09:46] <chromas> I think you just pull one out of some guy while he's sleeping and plant it
[19:10:00] * Bytram has been catching up on SpaceX's Starship prototype construction activity: https://forum.nasaspaceflight.com
[19:10:01] <janrinok> from pigs
[19:10:02] <exec> └─ SpaceX Starship : Texas Prototype(s) Thread 2 : Photos and Updates
[19:10:05] <chromas> I read about it in a documentary
[19:10:15] <janrinok> but it is a rather complicated process .....
[19:10:23] <Bytram> there's a genesis of a story there or something
[19:10:41] <chromas> That's right; starring Phil Collins
[19:10:43] <janrinok> not for general tv audience viewing
[19:10:57] <janrinok> ba-dum tish
[19:11:02] <Bytram> trying to drum up support for an alternate view point, eh?
[19:11:33] <janrinok> and talking of alternative viewpoints - that API story went down well .....
[19:11:50] * chromas waits with gated breath
[19:11:54] <janrinok> 1 man versus the community
[19:11:59] * Bytram hasn't followed it
[19:12:25] <janrinok> I hope his Mum still loves him, he didn't get a warm reception from the gang
[19:12:40] <Bytram> while you are both here, thanks so much for picking up the slack while I'm on the mend!
[19:12:47] <Bytram> oops, gtg... laundry
[19:12:48] <Bytram> brb
[19:12:55] <chromas> Did a wild Oracle rep appear?
[19:13:47] <janrinok> you see what he did there chromas ? He threw us aside to concentrate on his laundry! I'm shocked, shocked I tell you...
[19:14:40] <chromas> #MeToo
[19:14:52] <chromas> Hopefully he'll come back feeling detergent about it
[19:15:02] <janrinok> or at least a bit washed out
[19:15:09] <chromas> oh I see, in the api story, the author was wrong
[19:15:16] <janrinok> he'd better clean up his act
[19:15:45] <janrinok> he fought valiantly - but I don't think he won...
[19:16:32] <chromas> oh wait, after current events, he might come back feeling wishy washy; perhaps even agitated
[19:17:17] <janrinok> or all in a spin
[19:19:15] <Bytram> trying to drum up support for alternate views
[19:19:41] <janrinok> you had better backread before you continue
[19:19:43] <chromas> What a load!
[19:19:46] <Bytram> ofc the laundry repair person chose to show up today and had to restart my wash
[19:20:12] <Bytram> waiting for godot^W laundry
[19:20:18] <janrinok> I can restart my own machine thank you - I don't need a specialist to do it. Or did you need both hands?
[19:20:22] <chromas> oh, move my last pun down a line
[19:21:02] <chromas> Maybe it's an IBM washer so it needs a thousand-dollar contractor just to put the quarters in
[19:21:22] * Bytram tried to make a clean escape
[19:21:39] * janrinok thinks Bytram is feeling much better
[19:21:42] <Bytram> s/escape/^\[/
[19:21:44] <exec> <Bytram> tried to make a clean ^[
[19:22:03] <janrinok> er, is that supposed to be better?
[19:22:14] <chromas> Your humor is dry and static-free
[19:22:38] <Bytram> is how one entered an ESC char to set certain control modes on a genuine VT-100
[19:23:00] <janrinok> you're showing your age again
[19:23:04] <Bytram> also for ANSI color codes for terminals
[19:23:55] <janrinok> how's Mr Left Pinky?
[19:24:12] <Bytram> still attached and no signs of eloping
[19:24:30] <janrinok> as long as it is still there that is half the battle won
[19:24:51] <Bytram> yep, doesn't seem to be going anywhere
[19:25:34] <janrinok> has he still declared independence or is he slowly getting along with his neighbours?
[19:25:46] <Bytram> gonna have to ask the missus
[19:25:51] <janrinok> lol
[19:26:08] * Bytram decides to see if the laundry is now ready to move to the dryer
[19:26:32] <Bytram> though my forearms are still sore, my typing is getting a bit better (thank goodness)
[19:26:39] <Bytram> afk biab
[19:26:42] <janrinok> I think we covered dryer puns along with the washing
[19:31:38] <chromas> I think you've made even dryer puns in the past
[19:32:08] <janrinok> v good!
[19:34:23] <janrinok> gtg - my busy day is not yet over. See you guys tomorrow maybe!
[19:35:38] <chromas> he's trying to \[ before Bytram gets back
[19:36:10] * janrinok thinks his plan has been foiled
[19:38:19] <Bytram> lol
[19:39:15] <Bytram> well, they were in the midst of debugging the electronic control panel, so the price had not been set correctly... saved $1.25
[19:50:42] <chromas> that api story is the perfect place for comments to support the <img> tag
[19:51:36] <chromas> for one of these memes https://i.kym-cdn.com
[19:52:02] <chromas> some of the text is missing though
[20:32:53] <Bytram> http://feedproxy.google.com
[20:32:54] <ClownHatLinux> ^ For T-Mobile's John Legere, all those F-bombs had a purpose ( https://www.cnet.com )
[20:32:56] <exec> └─ John Legere brought T-Mobile back from the dead. Now he's moving on - CNET
[20:32:58] <exec> └─ John Legere brought T-Mobile back from the dead. Now he's moving on - CNET
[20:36:23] <chromas> F-Mobile
[20:43:15] <Bytram> sprintobile? Sprobile? T-Int?
[20:43:40] <Bytram> Sprintmobile
[20:43:47] <Bytram> break time
[22:29:27] <chromas> =doi https://iopscience.iop.org
[22:29:29] <ClownHatLinux> <p><b>Micrometeoroid Events in LISA Pathfinder - IOPscience</b>, <cite>The Astrophysical Journal</cite> (DOI: <a href="https://doi.org/"></a></p>
[22:29:30] <exec> └─ Micrometeoroid Events in LISA Pathfinder - IOPscience
[22:29:31] <exec> └─ Page Not Found
[22:29:46] <chromas> hm, broken again. fpos
[22:37:33] -!- ClownHatLinux has quit [Remote host closed the connection]
[22:37:54] -!- ClownHatLinux [ClownHatLinux!~systemd@0::1] has joined #editorial
[23:00:48] -!- ClownHatLinux has quit [Remote host closed the connection]
[23:00:59] -!- ClownHatLinux [ClownHatLinux!~systemd@0::1] has joined #editorial
[23:50:04] <Bytram> chromas: I'm playing with your soysub web page and noticed it no longer seems to provide a story-link-with-title, nor does it bracket the collected story in blockquotes...
[23:50:10] <Bytram> (1) Launch: https://chromas.0x.no
[23:50:38] <Bytram> (2) Pasted this in the "Links & Text" field: https://arstechnica.com
[23:50:40] <exec> └─ Bonkers pricing of “free” flu shots shows what’s wrong with US healthcare | Ars Technica
[23:50:41] <ClownHatLinux> ^ Bonkers pricing of “free” flu shots shows what’s wrong with US healthcare ( https://arstechnica.com )
[23:50:42] <exec> └─ Bonkers pricing of “free” flu shots shows what’s wrong with US healthcare | Ars Technica
[23:51:01] <Bytram> (3) Clicked on the "Start" button.
[23:51:42] <Bytram> (4) Scrolled down to the "Article text:" box.
[23:52:09] <Bytram> (5) Saw only story text w/o link, title, or blockquote tags
[23:52:26] <chromas> I see it too. Thought I fixed that
[23:52:34] * chromas updates the extractor tool
[23:52:49] <Bytram> was wondering if it was just me... thanks a bunch for looking into it!
[23:53:02] <Bytram> Is a *really* useful tool!!
[23:55:06] <Bytram> https://www.nirsoft.net
[23:55:07] <ClownHatLinux> ^ Your external IP address is 71.85.26.178
[23:55:08] <exec> └─ Your external IP address is 23.24.97.65
[23:55:12] <Bytram> LOL!
[23:57:20] <Bytram> chromas: pls let me know when you think you have it fixed so I can test it out for you.
[23:58:49] <chromas> Looks like I changed the way the extractor takes input for some reason that I don't remember
[23:59:23] <chromas> or I forgot what I was doing
[23:59:27] * chromas shrugs