[12:31:22] <janrinok> hi guys
[12:43:51] <takyon> hi
[12:44:36] <janrinok> hi takyon
[12:44:50] <janrinok> how's things?
[12:45:11] <janrinok> changing machines - back in 2 minutes
[12:52:41] <takyon> not too bad
[12:52:53] <takyon> maybe I should submit something
[12:53:08] <janrinok> I'm just posting the last few to the story queue.
[12:53:25] <janrinok> ...and working on my storybot
[12:53:57] <takyon> yeah does that harvest from #soylent or something
[12:54:58] <janrinok> it harvests both rss-bot for the latest stories, and logs for any that may have been missed since the last time it was run.
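For illustration, the harvesting step described here (the latest stories from rss-bot, plus a pass over the logs for anything missed since the last run) could look roughly like the Python sketch below. It assumes each rss-bot log line has the form `[HH:MM:SS] <rss-bot> headline - URL`. That format, and the hypothetical `harvest` function and `seen` set, are assumptions, not storybot's actual code.

```python
import re

# Assumed (hypothetical) log format: "[HH:MM:SS] <rss-bot> headline text - URL"
RSS_LINE = re.compile(
    r"^\[\d{2}:\d{2}:\d{2}\] <rss-bot> (?P<title>.*?)\s+-\s+(?P<url>https?://\S+)\s*$"
)

def harvest(log_lines, seen):
    """Return (title, url) pairs for rss-bot lines whose URL is not already in `seen`.

    `seen` is updated in place, so a later run (live feed or log back-fill)
    skips stories that were collected earlier.
    """
    new_stories = []
    for line in log_lines:
        match = RSS_LINE.match(line)
        if match is None:
            continue  # ordinary chat, or a bot line in a format we don't recognise
        url = match.group("url")
        if url in seen:
            continue  # already harvested on this or a previous run
        seen.add(url)
        new_stories.append((match.group("title"), url))
    return new_stories
```

Running it a second time over the same lines with the same `seen` set returns an empty list, because every URL has already been recorded.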
[12:55:36] <janrinok> shall I send a sample to your SN email address?
[12:55:41] <takyon> sure
[13:01:01] <janrinok> on their way now
[13:01:53] <janrinok> they are html - so ready for editing or submitting - best viewed in a browser :0
[13:04:01] <janrinok> From Friday's log I have managed to auto-scrape over 200 stories. I have to produce a template for each site, but that only takes about 3 minutes each at present
[13:05:15] <janrinok> I've made no attempt at this point to select stories - simply solving the problems of scraping them. It is relatively trivial to do a word search on each one to decide what nexus clues to give e.g. gaming, security, mobile etc
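Here is a rough sketch of the per-site template idea. It assumes each template is simply a path expression that locates the story paragraphs on one site. It uses the standard library's `xml.etree.ElementTree`, whose `findall` understands a small XPath subset. The hostnames, paths and the `extract_story` name are made up for illustration, and real-world pages would usually need a more forgiving HTML parser, because ElementTree only accepts well-formed markup.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical per-site templates: hostname -> ElementTree path to the story paragraphs.
SITE_TEMPLATES = {
    "news.example.com": ".//div[@class='article-body']/p",
    "www.example.org": ".//article/p",
}

def extract_story(url, page_markup):
    """Return the story text for `url`, or None if no template exists for its site."""
    path = SITE_TEMPLATES.get(urlparse(url).hostname)
    if path is None:
        return None  # a new template has to be written for this site first
    root = ET.fromstring(page_markup)  # requires well-formed (XHTML-like) markup
    # itertext() joins text from nested tags such as <b> or <a> inside a paragraph
    paragraphs = ["".join(p.itertext()).strip() for p in root.findall(path)]
    return "\n\n".join(p for p in paragraphs if p)
```

With this design, adding a site means adding a single dictionary entry, which fits the "about 3 minutes each" estimate for writing a template.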
[13:12:09] <janrinok> hi cmn32480
[13:12:22] <cmn32480> happy Father's Day janrinok
[13:12:47] <janrinok> and to you too! Did your kids remember - of course they did!
[13:13:04] <cmn32480> of course they did. they were all set to let me sleep in.
[13:13:26] <cmn32480> then the VP of Sales called because he can't remember his password. at 9am on a Sunday.
[13:13:50] <janrinok> One of my 'kids' gets married in a few months' time, so I will forgive her being a bit too busy at present :)
[13:14:29] <janrinok> Did you charge the VP your usual call-out fee?
[13:14:40] <cmn32480> pardon my ignorance, does Father's Day get celebrated there as well? or is it strictly an American thing?
[13:15:01] <janrinok> It's certainly celebrated in Europe too
[13:15:32] <janrinok> perhaps not quite as much as Mother's day - but we celebrate that on a different day from your side of the pond
[13:15:33] <cmn32480> there we go. I learned something new.
[13:15:44] <cmn32480> does that mean I get to go back to bed
[13:15:55] <janrinok> don't know about the antipodes
[13:16:07] <janrinok> sure, tonight
[13:16:39] <cmn32480> you are so good to me
[13:16:57] <janrinok> I'm just a big softy at heart
[13:17:05] <janrinok> brb - tea beckons!
[13:17:07] <cmn32480> obviously!
[13:17:15] <cmn32480> good idea. I'll make coffee
[13:17:22] <cmn32480> to balance the universe
[13:24:16] <janrinok> tea++
[13:24:16] <Bender> karma - tea: 7
[13:24:46] <cmn32480> coffee is brewing.
[13:24:58] <cmn32480> until it is complete the universe is out of balance
[13:25:07] <janrinok> I only made a cup - I bet you are making a pot
[13:25:18] <cmn32480> yes
[13:25:52] <janrinok> I can feel the earth shifting on its axis as we speak - even the days are going to get shorter...
[13:27:49] <cmn32480> lol
[13:29:43] <janrinok> you can mock - but mark my words, I'm correct :)
[13:30:39] <janrinok> my wife and I have just been sitting in the garden enjoying the longest day. Woken by birdsong at about 04:00.
[13:31:18] <cmn32480> sounds like a very nice way to spend a day
[13:32:16] <janrinok> well, it is. Unfortunately, she has a rather large dressing covering her nose after the surgery a few days ago. She is feeling a bit self-conscious about it.
[13:32:16] <cmn32480> somehow the phone call from the VP of Sales needing his password changed doesn't sound nearly as pleasant.
[13:32:47] <cmn32480> everything healing ok?
[13:34:02] <janrinok> too early to say - it is still very raw looking (but a healthy raw, if you know what I mean). When we get the results of the biopsy then we will know if anything more drastic is required.
[13:34:26] <cmn32480> gothca
[13:34:32] <cmn32480> *gotcha
[13:34:50] <janrinok> I didn't even see that the first one was wrong!
[13:35:09] <cmn32480> it took me a sec to see it too.
[13:35:18] <cmn32480> some editors we are! :-)
[13:37:08] <janrinok> so, is it steak and a beer for you tonight?
[13:37:15] <cmn32480> no idea
[13:37:35] <cmn32480> we are going to my parents house a little later
[13:37:36] <janrinok> or are the kids making a 'surprise'?
[13:37:51] <cmn32480> so it'll be whatever my mother overcooks
[13:38:34] <janrinok> don't let her hear you say that...
[13:38:58] <cmn32480> i typed it quietly
[13:42:53] <janrinok> I've just run a test of my storybot - from 24 hours of rss-bot logs, it has scraped 345 stories and still counting...
[13:43:26] <janrinok> If only 10% are any good then we are on a winner!
[13:43:44] <cmn32480> excellent
[13:44:04] <cmn32480> and at what point do we put this into use? 2 in the sub queue?
[13:45:10] <janrinok> I'm just working on the posting mechanism at present, then I will try it on dev. But I don't want to use it on prod because it will just a) flood the system and b) deter any other (better) submissions
[13:45:56] <janrinok> I would like to put them somewhere where we have access to them but they are not seen by the community. For use when the queue is empty so to speak.
[13:46:57] <janrinok> 455 stories and still going up in one 24 hour period
[13:48:27] <cmn32480> zoinks
[13:48:41] <cmn32480> and the universe is back in balance
[13:48:44] <cmn32480> coffee++
[13:48:44] <Bender> karma - coffee: 10
[13:49:01] <janrinok> cmn32480: check your SN email account for a sample of the output
[13:49:19] <cmn32480> I have to think about that for a second
[13:50:01] <janrinok> cmn32480 (at) soylentnews (dot) org
[13:50:15] <janrinok> webmail.soylentnews.org
[13:50:16] <cmn32480> I know THAT part.
[13:50:22] <cmn32480> that was the part I couldn't remember
[13:51:16] <janrinok> can I stop laughing yet?
[13:51:20] <cmn32480> now I have to go look up my password
[13:51:31] <janrinok> just started again...!
[13:52:46] <cmn32480> is the login the first part of the nick or the whole thing?
[13:53:26] <janrinok> nick + soylentnews (dot) org
[13:54:09] <cmn32480> and I recorded the pwd correctly this time
[13:54:19] <janrinok> you're a champ!
[13:54:30] <paulej72> 12345
[13:54:39] <cmn32480> damnit
[13:54:43] <cmn32480> now I have to change it again
[13:54:48] <cmn32480> and on my luggage too!
[13:54:48] <janrinok> you and me both
[13:56:09] <cmn32480> shit... I think the Arkansas one is looking for SpallsHurgenson!
[13:56:28] <janrinok> lol
[13:56:43] <janrinok> I haven't filtered them - they are straight from the feed
[13:57:52] <cmn32480> and you have hundreds more like this?
[13:58:07] <janrinok> about 500 sat in my directory
[13:59:06] <cmn32480> the third one screams rehashvertisement
[13:59:35] <cmn32480> that is excellent
[13:59:36] <janrinok> yep, but as I say, I am only testing the program at present
[14:00:22] <cmn32480> saves a lot of the BS for us creating the summary.
[14:00:50] <janrinok> well, it certainly eases the editing, especially of links etc.
[14:01:22] <janrinok> paulej72: have you got a minute for some advice
[14:01:34] * cmn32480 wonders who we will credit the subs to.... jrbot might be a good name
[14:01:42] <janrinok> storybot
[14:01:50] <janrinok> is the name of the program
[14:02:02] <cmn32480> no good... makes too much sense
[14:02:05] <janrinok> I'll port it to perl once I have it up and running
[14:02:19] <cmn32480> you ARE a glutton for punishment
[14:02:20] <janrinok> but I think better in Python currently
[14:03:03] <janrinok> It is only a small program - it just uses a few very clever libraries such as XML, XPath etc
[14:03:25] * cmn32480 hears a WHOOOOOOSHing noise
[14:03:51] <janrinok> simple version - they do all the clever stuff, I'm just gluing them together
[14:04:02] <cmn32480> see... there you go
[14:04:16] <cmn32480> remember, as programming goes, I can't find my ass with both hands and a flashlight
[14:04:35] <janrinok> you and n1 both then, apparently
[14:04:53] <cmn32480> yes, I believe that is correct.
[14:06:51] <janrinok> I hope to be able to also include a keyword search, so that it can suggest a likely nexus such as gaming, mobile, security or whatever.
[14:07:05] <cmn32480> excellent
[14:07:22] <cmn32480> I'm just impressed with the output.
[14:07:22] <janrinok> it will still need editing on our part - it can't tell good stories from bad
[14:07:45] <cmn32480> is it configurable to run against the last 1/6/12 hours?
[14:08:06] <cmn32480> or a specific rss feed?
[14:08:18] <janrinok> I reckon we can get at least 300 stories a day so there will be a large filtering task to choose the good ones from all the dross
[14:08:45] <cmn32480> hence my thought about running it for a smaller time frame
[14:08:49] <janrinok> but we will have a source of stories not more than 24hours old should the sub queue get a bit low
[14:09:18] <janrinok> The problem with the shorter time frame is that the good stories can come out at any time
[14:10:17] <janrinok> It can also read rss-bot live, so can give us a constant feed of current stories. I am open to suggestions on how to automate the filtering process, so any bright ideas will be welcome.
[14:10:56] <cmn32480> hmmmm.... perhaps not take ALL the feed from rss-bot
[14:11:35] <janrinok> well, in time we might find that some sources are better than others, but at the moment I haven't got enough data to make that call.
[14:11:40] <cmn32480> specific feeds, for the more technical sites
[14:12:01] <janrinok> they are all from tech sites - even the first story that I sent!
[14:12:50] <cmn32480> agreed, but certain places tend to be less technical (CNET), and some tend to be more technical (ScienceDaily, phys.org)
[14:13:34] <cmn32480> given the propensity of the community to send in politics and current events, perhaps using the bot to pull in from the more technical sites might be a good idea down the road
[14:14:12] <janrinok> It would be easy to select or deselect specific feeds so that is easily do-able
[14:14:49] <janrinok> and I ignore feeds that currently rely on js - because I cannot easily parse the page that the url sends me to
[14:15:06] <janrinok> Forbes is currently one such site
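Selecting and deselecting feeds, and skipping JavaScript-dependent sites such as Forbes, could be handled by a simple domain filter applied before scraping. This sketch assumes feeds are identified by the hostname of the story link. `example.net` stands in for any feed an editor chooses to switch off, and the `wanted` helper is hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical configuration. "example.net" is a placeholder for a feed an editor
# has deselected; Forbes is listed because its pages need JavaScript to render.
DISABLED_FEEDS = {"example.net"}
JS_ONLY_SITES = {"forbes.com"}

def _domain_matches(host, domains):
    """True if `host` is one of `domains` or a subdomain of one of them."""
    return any(host == d or host.endswith("." + d) for d in domains)

def wanted(url):
    """Decide whether a story link from rss-bot should be scraped at all."""
    host = urlparse(url).hostname or ""
    return not (_domain_matches(host, DISABLED_FEEDS) or _domain_matches(host, JS_ONLY_SITES))
```

Matching on `"." + domain` rather than a bare suffix keeps an unrelated host such as `notforbes.com` from being caught by the `forbes.com` rule.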
[14:15:07] <cmn32480> what about a gui front end?
[14:15:22] <cmn32480> and will it make me coffee?
[14:15:48] <janrinok> could be done, but I'm hoping that eventually we will have a second sub page - viewable by editors only - so the editing task will be exactly the same as now
[14:15:54] <cmn32480> what about bank transfers 1 penny at a time into my account? nobody will notice.
[14:16:00] <takyon> happy father's day pops
[14:16:05] <janrinok> just looking for the python-coffee module...
[14:16:19] <cmn32480> same to you takyon
[14:17:40] <janrinok> takyon: can you back-read this page for the last 30 mins or so pse, I would welcome any comments and suggestions
[14:18:02] * cmn32480 wonders if the 2nd sub page would automatically delete things over 48 hours old.. to prevent overload
[14:18:27] <janrinok> exactly - my thoughts too
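Here is a minimal sketch of the 48-hour expiry suggested here. It assumes each entry in the hidden backup list records when it was scraped. The `prune_backup` name and the dict layout are hypothetical; the real storage would be whatever the editors-only sub page ends up using.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=48)  # the cut-off suggested in the discussion

def prune_backup(backup, now=None):
    """Return only the stories scraped within the last 48 hours.

    `backup` is a hypothetical list of dicts, each with a timezone-aware
    "scraped_at" datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [story for story in backup if now - story["scraped_at"] <= MAX_AGE]
```

Passing `now` explicitly makes the function easy to test. In normal use it would be called with no second argument on each run of the bot.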
[14:18:34] <takyon> ok i'll check the output
[14:19:02] <cmn32480> side question: have we thought about what hundreds of subs a day will do to the growth of the DB?
[14:19:19] <cmn32480> ^this is more a question for pj, NC, or TMB
[14:20:05] <cmn32480> sorry janrinok, I don't mean to be poking holes, it is in my nature
[14:20:12] <janrinok> that is not a problem to worry about at present. I don't think that disk space is an issue. But if we want to have nexuses, we need to keep them going at a reasonable rate - not the same as the front page but updating rather than growing stale
[14:21:41] <cmn32480> what is the planned rate for the story output in the nexuses (nexii?)
[14:22:11] <cmn32480> or is it more of a "let's get them working first, then see what we get"?
[14:22:33] <cmn32480> dangit... must be a hole in the bottom of my coffee cup
[14:22:38] <cmn32480> brb
[14:22:52] <janrinok> well, there is no fixed rate. Stories get allocated when they arrive. But we should be able to add a story every day if possible. I would guess, that gaming should have at least 1 story a day from somewhere?
[14:24:07] <cmn32480> fair enough
[14:24:18] <cmn32480> with the constant stream for the front page
[14:25:26] <janrinok> the purpose behind each nexus is that people can filter those stories that they want from those that they don't. I don't think that there is any requirement for specific throughput. But I (personally) feel that we should be able to provide something on a reasonably regular basis.
[14:25:37] <janrinok> Evert
[14:25:52] <janrinok> everything hits the front page, but also can be in a nexus too
[14:26:34] <cmn32480> and depending on how many nexii there are, that may or may not be possible with our current rate of release.
[14:26:35] <janrinok> a bit like 'Breaking News' does now
[14:27:25] <janrinok> well, the plan is to have people - not necessarily existing eds - manage their favourite topics. They would have to be trained as eds but they only have access to their own nexus
[14:28:05] * janrinok understands that's how it should work - but there are still gaps in his knowledge....
[14:28:40] <cmn32480> that would mean separate story queues for each nexus, with separate sub ID's
[14:28:52] <janrinok> That is why we have the nexus story queue problem at present
[14:29:08] <cmn32480> right
[14:29:21] <janrinok> if you edit a meta story you end up only seeing the meta story queue - it moves you to that nexus
[14:29:26] <cmn32480> yes
[14:29:42] <cmn32480> and to get out, you have to go back to the main site and start over
[14:29:47] <janrinok> while that is the desired eventual outcome it doesn't work for us yet
[14:30:22] <janrinok> paulej72 is introducing another entry in the admin menu so that we can select all stories or a specific nexus
[14:31:31] <takyon> ok so
[14:31:54] <takyon> can we get some summly in this mix
[14:31:57] <takyon> time to find out
[14:32:42] <takyon> looks like a huge no
[14:33:24] <janrinok> takyon: ? didn't understand that?
[14:33:48] <takyon> what is it supposed to do?
[14:34:00] <janrinok> what is what supposed to do?
[14:34:06] <takyon> rss bot
[14:34:06] <cmn32480> the storybot, I think
[14:34:07] <takyon> the output
[14:34:32] <cmn32480> rss-bot just dumps the rss feeds from a bunch of places into an IRC channel
[14:34:35] <janrinok> we have story feeds that are being collected by rss-bot and are there for anyone to use
[14:34:59] <janrinok> the program scrapes the links that it provides and extracts the 'story'
[14:35:30] <takyon> is it supposed to make a submission directly from this?
[14:35:35] <janrinok> it is intended to help supplement the submissions when the queue is running low. However, it can provide up to 500 stories a day before filtering.
[14:36:02] <janrinok> the filtering task is still manual - unless you know of a piece of software that 'understands' text
[14:36:22] <takyon> what programming language is this in
[14:37:06] <janrinok> it will make submissions automatically but, if we use it like that it will simply flood the queue and deter others from submitting. So I am hoping to have a 'backup submissions list' for when (as now) things are running low
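One possible shape for the "backup submissions list" described here: stories stay in a hidden backup list and move into the live queue only while that queue is below a low-water mark. The threshold of 5 and the `top_up` name are illustrative assumptions, not a settled policy.

```python
def top_up(live_queue, backup, low_water=5):
    """Move stories from the front of `backup` into `live_queue` only while it is short.

    Both arguments are lists and are modified in place. `low_water` is an
    arbitrary example threshold. Because nothing moves once the queue reaches
    it, the bot cannot flood the queue and crowd out human submissions.
    Returns the stories that were moved.
    """
    moved = []
    while len(live_queue) < low_water and backup:
        story = backup.pop(0)  # front of the list; it could be kept sorted by a score
        live_queue.append(story)
        moved.append(story)
    return moved
```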
[14:37:29] <janrinok> it is in python, but I can port it to perl once I have it up and running.
[14:38:02] <takyon> ok well I have some code that basically looks at the text of an article and scores it with a "tag" based on key words
[14:38:23] <janrinok> that is exactly what I am looking for - language?
[14:38:26] <takyon> this could be adapted to automatically pick "Techonomics" "Digital Liberty" "Career and Education"
[14:38:30] <takyon> javascript
[14:38:49] <janrinok> ah, ok. I'd have to rewrite that but it can be done
[14:38:59] <takyon> now I'm thinking about filtering it down
[14:39:23] <takyon> we could have inclusion and exclusion rules......
[14:39:38] <janrinok> I'm using python because that is what I work in at present. And will port to perl so that it can become part of the site's software.
[14:39:40] <takyon> if it scores high enough on some keyword search metric related to the tag thing, it gets included
[14:40:11] <takyon> and if it hits certain very specific keywords about stories we run all the time, it gets included
[14:40:57] <janrinok> I'm working on the same idea, but I'm only at an early stage. My next task is the auto submission to dev.
[14:41:02] <takyon> I will look at providing you with a big block of python/pseudocode
[14:41:12] <janrinok> that would be great - thanks
[14:41:31] <takyon> which will define some keyword arrays, use a variable holding the article text, and return a boolean true/false and a tag
[14:42:27] <takyon> adding keywords to the arrays is very arbitrary, but with some testing and tweaking it can probably be made to pick the "correct" tag 90% of the time
[14:43:25] <takyon> some keywords might be duplicated on purpose
[14:43:33] <takyon> for example "RIAA"
[14:43:45] <takyon> might give +1 point to both Techonomics and Digital Liberty
[14:43:50] <janrinok> yep, that makes sense
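Here is a Python rendering of the approach takyon outlines: keyword arrays per tag, an article's text in, and a true/false decision plus a tag out. Only the tag names and the double-counted "RIAA" come from the discussion. Every other keyword, the always-include and exclude lists, and the `MIN_SCORE` threshold are placeholders that would need the testing and tweaking mentioned above.

```python
import re

# Hypothetical keyword arrays. "riaa" deliberately appears under two tags,
# so it adds a point to both Techonomics and Digital Liberty.
TAG_KEYWORDS = {
    "Techonomics": ["riaa", "antitrust", "merger", "earnings", "patent"],
    "Digital Liberty": ["riaa", "surveillance", "encryption", "censorship", "nsa"],
    "Career and Education": ["hiring", "university", "salary", "h-1b", "bootcamp"],
}
ALWAYS_INCLUDE = ["net neutrality"]  # hypothetical "stories we run all the time"
ALWAYS_EXCLUDE = ["celebrity"]       # hypothetical exclusion rule
MIN_SCORE = 2                        # arbitrary threshold, to be tuned by testing

def _hits(text, keyword):
    """Count whole-word, case-insensitive occurrences of `keyword` in `text`."""
    return len(re.findall(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE))

def classify(article_text):
    """Return (include, tag) for one article; tag is None if no keywords matched."""
    if any(_hits(article_text, k) for k in ALWAYS_EXCLUDE):
        return False, None
    scores = {tag: sum(_hits(article_text, k) for k in words)
              for tag, words in TAG_KEYWORDS.items()}
    best_tag = max(scores, key=scores.get)
    forced = any(_hits(article_text, k) for k in ALWAYS_INCLUDE)
    if scores[best_tag] == 0:
        return forced, None
    return scores[best_tag] >= MIN_SCORE or forced, best_tag
```

For example, an article that mentions the RIAA twice and encryption once scores 2 for Techonomics and 3 for Digital Liberty, so it is included and tagged Digital Liberty.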
[14:44:55] <takyon> I'll also think about a Nexus suggestion at the same time as tagging. since that might be even more important
[14:45:05] <takyon> It would be easy enough to detect a "gaming" story
[14:45:56] <janrinok> yes, but I don't know the keywords for gaming - I have never been involved in it. All the software I have written that dropped bombs or fired weapons did the real thing!
[14:56:50] <takyon> these are literally the only keywords I have for gaming
[14:56:56] <takyon> "console","game","game dev","gaming","handheld","pc gam","playstation","video game","wii","xbox"
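Here is a sketch of nexus suggestion that uses takyon's gaming list exactly as given. Because entries such as "pc gam" are stems, each keyword is matched at the start of a word and allowed to run on, so "pc gam" catches "PC gaming" and "PC games". The trade-off is over-matching: "game" also fires on "gamete". For that reason this hypothetical `suggest_nexuses` requires at least two hits. The threshold and the single-entry nexus table are assumptions.

```python
import re

# takyon's gaming keywords, verbatim from the log.
GAMING_KEYWORDS = ["console", "game", "game dev", "gaming", "handheld",
                   "pc gam", "playstation", "video game", "wii", "xbox"]

# Hypothetical table: other nexuses (mobile, security, ...) would get their own lists.
NEXUS_KEYWORDS = {"gaming": GAMING_KEYWORDS}

def suggest_nexuses(article_text, min_hits=2):
    """Return the nexuses whose keywords occur at least `min_hits` times.

    Each keyword must start at a word boundary but may continue into a longer
    word (prefix match), so stems like "pc gam" work as intended.
    """
    suggestions = []
    for nexus, keywords in NEXUS_KEYWORDS.items():
        hits = sum(len(re.findall(r"\b" + re.escape(k), article_text, re.IGNORECASE))
                   for k in keywords)
        if hits >= min_hits:
            suggestions.append(nexus)
    return suggestions
```

With these rules, "New PC gaming handheld announced" gets three hits and is suggested for gaming. A biology story about gamete formation gets one hit and no suggestion.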
[14:57:37] <takyon> later
[15:25:13] <janrinok> takyon: that'll do to start with.
[15:26:08] <janrinok> *** a 2nd ed on the next stories would be appreciated pse ***
[16:30:38] <cmn32480> done janrinok|afk
