#editorial | Logs for 2019-06-09

[00:40:03] <Bytram> chromas: Those were *explicitly* put there for a reason.
[00:40:05] <Bytram> #smake chromas
[00:40:05] * MrPlow smakes chromas upside the head with testicles of evil
[00:48:28] -!- Fnord666_ [Fnord666_!~Fnord666@925-300-434-342.ubr6.dyn.lebanon-oh.fuse.net] has joined #editorial
[00:48:53] <Fnord666_> Evening Bytram
[00:50:01] <Bytram> good evening!
[01:07:50] <Bytram> http://feedproxy.google.com
[01:07:52] <upstart> ^ Little-known meteor shower this month could have dangerous stowaways ( https://www.cnet.com )
[02:09:10] <Bytram> had a full day today and a busy one ahead for tomorrow
[02:09:34] <Bytram> think I'm going to try and turn in at a reasonable hour tonight.
[02:11:28] <Bytram> =submit https://www.washingtonpost.com
[02:11:29] <upstart> Submitting "A TV meteorologist objected to management’s ‘code red’ orders in on-air apology. He might be out of a job."...
[02:11:51] <upstart> ✓ Sub-ccess! "A TV Meteorologist Objected to Management’s ‘code Red’ Orders in On-air Apology. He Might be Out of a Job." -> https://soylentnews.org
[02:13:08] <Bytram> =submit I would think that the "old growth" forests would be in the less-accessible, higher elevations (think in the mountains) so that might have something to do with it, too. Right? https://phys.org
[02:13:11] <upstart> Submitting "Older forests resist change"...
[02:13:33] <upstart> ✓ Sub-ccess! "Older Forests Resist Change" -> https://soylentnews.org
[02:15:47] <Bytram> =submit NB: LHCb is the Large Hadron Collider beauty experiment (at http: // lhcb-public.web.cern.ch /) https://www.sciencedaily.com
[02:15:48] <upstart> Submitting "CERN’s LHCb experiment reports observation of exotic pentaquark particles"...
[02:16:10] <upstart> ✓ Sub-ccess! "CERN’s LHCb Experiment Reports Observation of Exotic Pentaquark Particles" -> https://soylentnews.org
[02:18:06] <Bytram> =submit https://www.atlasobscura.com
[02:18:08] <upstart> Submitting "To Evade Pre-Prohibition Drinking Laws, New Yorkers Created the World's Worst Sandwich"...
[02:18:30] <upstart> ✓ Sub-ccess! "To Evade Pre-Prohibition Drinking Laws, New Yorkers Created the World's Worst Sandwich" -> https://soylentnews.org
[02:21:55] <Bytram> =submit https://www.cnet.com
[02:21:57] <upstart> Submitting "Little-known meteor shower this month could have dangerous stowaways"...
[02:22:18] <upstart> ✓ Sub-ccess! "Little-known Meteor Shower This Month Could Have Dangerous Stowaways" -> https://soylentnews.org
[02:22:57] <Bytram> okay, that was the last of the tabs that I had open in consideration for submitting here.
[02:23:13] <Bytram> and now I really will bid you adieu!
[06:21:18] -!- Fnord666_ has quit [Quit: This computer has gone to sleep]
[07:00:20] -!- CoolHand has quit [Remote host closed the connection]
[08:43:52] <janrinok> <chromas> I found another site where the bot repeats stuff for no reason. This time, it's the entire article multiple times
[08:45:32] <janrinok> chromas, I've been having this problem with my bot doubling parts of a story for over 18 months now. I still cannot find a reliable way of stopping it. Why some sites insist on repeating part of their article is beyond me.
[08:46:41] <janrinok> It is not php that has a problem, I use Python 3 and it behaves the same way.
[08:49:32] <janrinok> ...but if you solve it, please let me know how!
[08:55:33] <chromas> In my case, the php's a web front-end to my scraper, which is written in D. I'm thinking that the article dupes are coming from a bug in my code because it's all written from scratch :)
[08:55:46] <chromas> Well, I didn't write the stdlib or the compilter
[08:55:52] <chromas> s/t//4
[08:57:23] <janrinok> Well as I get exactly the same thing from a program written by myself in Python 3, I suspect it isn't a coincidence. 'Something' in the html makes my bot throw a wobbly and duplicate text.
[08:57:27] <chromas> The aticles where my bot was duplicating stuff, it wasn't that way in the actual source given by the server
[08:57:42] <chromas> Weird. I wonder if it does it to the same aricles
[08:58:19] <chromas> The "t" for "articles" was already used up in "compiler" I guess
[08:58:35] <janrinok> I don't think I've kept a record of which sites it happens on, I had always assumed that it was my code that was at fault. Your experience suggests that it might be otherwise.
[09:01:20] <janrinok> As I am not currently running my bot very often - I only tend to use it when the queue starts looking very low - I haven't had a case of duplication recently. And I'm currently too busy looking after my SO to spend time looking at it.
[09:02:01] <janrinok> I'll let you know if/when I see it again with the relevant links etc
[09:02:07] <chromas> If you do get time, you can try out Scientific American. I seem to get dupes on all the links
[09:02:24] <janrinok> Is that in SN's RSS feeds or not?
[09:02:26] <chromas> For example, chromas.0x.no/feedctl/xtractor.php?feedctlid=30l2xcal05og4swco80ggcgkc8sc40kgc
[09:02:45] <chromas> Original link https://blogs.scientificamerican.com
[09:02:46] <upstart> ^ Deep-Space Shielding
[09:02:55] <chromas> (first link is bot output)
[09:03:14] <janrinok> Oh, I've never seen 4 repeats before. You are beating my hands down..... :)
[09:03:25] <janrinok> *me
[09:03:58] * chromas has the best bugs
[09:05:30] <janrinok> that page makes my bot throw a fit. I'll have to look at it another time
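Neither bot's source appears in this log and the cause of the duplication is not identified, so the following is only a workaround sketch in Python 3: after extraction, drop any paragraph whose whitespace-normalised text has already been seen. The function name `drop_repeated_paragraphs` is hypothetical, and this hides the symptom rather than fixing whatever in the HTML triggers it.

```python
def drop_repeated_paragraphs(paragraphs):
    """Return paragraphs in order, keeping only the first copy of each.

    Two paragraphs count as the same if they match after collapsing
    runs of whitespace. Empty paragraphs are dropped.
    """
    seen = set()
    unique = []
    for para in paragraphs:
        key = " ".join(para.split())  # collapse whitespace for comparison
        if key and key not in seen:
            seen.add(key)
            unique.append(para)
    return unique
```

A legitimately repeated paragraph (a refrain, say) would also be dropped, so this is a trade-off rather than a fix.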
[09:06:02] -!- CoolHand [CoolHand!~CoolHand@Soylent/Staff/Editor/CoolHand] has joined #editorial
[09:06:02] -!- mode/#editorial [+v CoolHand] by Hephaestus
[09:07:34] <janrinok> My bot simply pulls out the title and encoding, then strips everything except for <p>...</p>. I dump any tags not included inside paragraphs, and those that are inside are processed to strip out tracking and other crap from the urls
[09:08:37] <janrinok> Finally, certain sites (phys.org etc) are given their own processing to remove internal links, unusual formatting tags and other similar junk.
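The pipeline janrinok describes could look roughly like the sketch below, written against the Python 3 standard library only. It is not Arthur's actual code: the class name `ParagraphExtractor`, the helper `strip_tracking`, and the list of tracking parameters are all assumptions, encoding detection is left out, and the site-specific clean-up (phys.org etc.) is not attempted. Links inside paragraphs are kept with cleaned URLs; every other tag is dropped.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking parameters; the real bot's list is not in the log.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url):
    """Remove query parameters that look like tracking tokens."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.startswith(TRACKING_PREFIXES) and k not in TRACKING_NAMES
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


class ParagraphExtractor(HTMLParser):
    """Collect the <title> text and the contents of <p> elements only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []
        self._links = []  # one flag per open <a>: True if we emitted it

    def _flush(self):
        # Close any links left open so the emitted fragment stays balanced.
        self._buf.extend("</a>" for kept in self._links if kept)
        self._links = []
        text = "".join(self._buf).strip()
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._flush()  # an unclosed <p> ends at the next <p>
            self._in_p = True
        elif tag == "a" and self._in_p:
            href = dict(attrs).get("href")
            if href:
                cleaned = escape(strip_tracking(href), quote=True)
                self._buf.append('<a href="%s">' % cleaned)
            self._links.append(bool(href))
        # Every other tag is dropped; its text is kept if inside a <p>.

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._flush()
        elif tag == "a" and self._in_p and self._links:
            if self._links.pop():
                self._buf.append("</a>")

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(escape(data, quote=False))

    def close(self):
        super().close()
        self._flush()  # keep a trailing paragraph that was never closed
```

Typical use would be `p = ParagraphExtractor(); p.feed(html_text); p.close()`, after which `p.title` and `p.paragraphs` hold the results.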
[09:12:17] <chromas> You might've fixed it, but a long, long time ago there was a version of arthur (maybe exec's copy; not sure) that doesn't spruce up relative links
[09:12:25] <chromas> I think CNET was a site that used them a lot
[09:12:43] <chromas> so they'd end up pointing to soylentnews.org/tags/whatevertopic
[09:13:08] <janrinok> Yeah, I had a regression about 8 months or so ago. I've fixed it on my machine but there are other users of Arthur that might not have updated yet.
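For the relative-link problem, one common approach in Python 3 is to resolve each `href` against the URL of the page it came from before the story is submitted. Whether Arthur's fix does this is not stated in the log; the helper name `absolutize` and the example URLs are assumptions.

```python
from urllib.parse import urljoin


def absolutize(href, page_url):
    """Resolve a possibly relative href against the page it was scraped from.

    Absolute URLs pass through unchanged; root-relative, path-relative and
    protocol-relative links are completed from page_url.
    """
    return urljoin(page_url, href)
```

With this, a link such as `/tags/whatevertopic` scraped from a CNET page keeps pointing at cnet.com instead of being interpreted relative to soylentnews.org.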
[09:13:36] <chromas> Let Microsoft show you the way: Force automatic updates
[09:14:03] <janrinok> And I'm not even sure that I have updated all 8 of my desktops either. Been a little bit busy and preoccupied
[09:14:11] <janrinok> lol
[09:43:15] <janrinok> gtg
[10:43:39] <chromas> Heh, I just noticed the two maccosex zsh subs by two different bots are by the same soycow
[11:02:55] <Bytram> coffee++
[11:02:55] <Bender> karma - coffee: 96
[12:02:56] * Bytram deletes the shorter submission from the sub queue
[12:03:11] * Bytram gets ready to head out for the day
[12:03:13] <Bytram> afk
[12:03:16] <Bytram> laters!
[15:45:56] -!- Fnord666_ [Fnord666_!~Fnord666@925-300-434-342.ubr6.dyn.lebanon-oh.fuse.net] has joined #editorial
[16:22:12] -!- Fnord666__ [Fnord666__!~Fnord666@925-300-434-342.ubr6.dyn.lebanon-oh.fuse.net] has joined #editorial
[16:24:35] -!- Fnord666_ has quit [Ping timeout: 248 seconds]
[17:30:28] -!- Sirfinkus has quit [Quit: Textual IRC Client: www.textualapp.com]
[20:00:23] -!- Sirfinkus [Sirfinkus!~SirFinkus@u-32-30-676-744.hsd5.wa.comcast.net] has joined #editorial
[20:34:07] -!- Fnord666__ has quit [Quit: Leaving]