#soylent | Logs for 2025-07-18
[01:00:12] -!- bender has quit [Remote host closed the connection]
[01:00:21] -!- bender [bender!bot@Soylent/Bot/Bender] has joined #soylent
[01:01:45] -!- Loggie [Loggie!Loggie@Soylent/BotArmy] has joined #soylent
[04:06:07] <chromas> https://www.youtube.com
[04:06:10] <systemd> ^ WKUK - GUN DEBATE
[05:00:49] -!- c0lo [c0lo!~c0lo@120.158.tz.gjv] has joined #soylent
[05:01:53] <c0lo> Top talent https://www.youtube.com
[05:01:56] <systemd> ^ She is scamming Bill collector
[06:22:10] <janrinok> good morning world!
[06:40:57] <c0lo> A good day from some new 500s, right?
[06:41:42] <janrinok> none so far - but it is still early :)
[06:43:42] <janrinok> I am recording them. kolie and I had a discussion about it yesterday. kolie is not seeing them at the times that he is logged in. so it is difficult for him to track down the cause. I can tell him when they occurred but there are no clues as to why they are happening.
[06:43:46] <c0lo> About 48h for some TZ in this world to still be in 2025/07/18, so I reckon there's still plenty of time.
[06:44:56] <c0lo> > but there are no clues as to why they are happening.
[06:44:56] <c0lo> Insufficient logging.
[06:46:21] <janrinok> Basically, it appears that the Perl code might be the problem. As time passes we are having to build the system with up-to-date versions of Apache etc. It is possible that what was once a good design is now not really meeting the needs of a modern server. So we are looking at rewriting, but in a structured manner.
[06:48:06] <janrinok> The first area to look at is the templating, as it is being used with almost every query/request to the site. The data has to be formatted. But there are now more efficient template packages (which don't care what language feeds into them) that we might be able to use.
[06:50:26] <janrinok> Although the user base is relatively small, we are having to cope with scrapers, spiders and assorted bots. They are all hitting the site and many of them are requesting data in the form that we release it - i.e. templated.
[06:52:25] <janrinok> We are frequently seeing a hundred or more requests each minute. Blocking the IPs also takes processing time and if not done carefully will cause collateral damage in the form of preventing our members from having a reliable connection too.
[07:04:34] <janrinok> What should we be logging? If we do not know the cause where should we be looking? What information should be recorded? What should we filter the logs on to find the one bit that we actually need?
[07:06:09] <janrinok> If the cause is occurring in the Perl code how do we insert the logging functionality? That seems to require a rewrite of the Perl code, a lot of testing, and possibly just moving the problem somewhere else.
[07:18:37] <chromas> Log everything
[07:19:11] <chromas> Also I thought the entire purpose of Varnish was to cache the dynamic pages so they don't have to be generated all the time. That's like the one and only thing it's for
[07:28:01] <janrinok> Yeah, but the scrapers want the whole site and they are often guessing at what URLs they want. The result is that Varnish would have to hold the entire site, and URL lookups become less useful.
[07:33:02] <janrinok> How do we log what Perl is doing without putting hooks into the Perl, which requires all the actions associated with changing the code?
[07:41:36] <chromas> put in a bunch of print statements ;)
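The "print statements" joke points at a real middle ground: instead of editing every handler, wrap the one dispatch point so each request logs its path, status and duration. A minimal sketch of the idea in Python (the Perl site would do the equivalent around its mod_perl handler; `serve` and its behaviour here are hypothetical):

```python
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("requests")

def log_requests(handler):
    """Wrap a request handler so every call records its duration and outcome."""
    @wraps(handler)
    def wrapper(path, *args, **kwargs):
        start = time.monotonic()
        try:
            status = handler(path, *args, **kwargs)
            return status
        except Exception:
            status = 500  # an unhandled error is what the 500 hunt cares about
            raise
        finally:
            ms = (time.monotonic() - start) * 1000
            log.info("%s -> %s in %.1f ms", path, status, ms)
    return wrapper

@log_requests
def serve(path):
    # Hypothetical handler standing in for the real templating/query work.
    return 200
```

Because only the wrapper changes, the handlers themselves stay untouched, and the log line carries enough (path, status, timing) to filter for the 500s later.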
[07:41:43] <chromas> https://www.youtube.com
[07:41:45] <systemd> ^ Back to the Future - UNHINGED VERSION
[09:41:51] -!- inz [inz!~inz@wbi.fi] has joined #soylent
[13:28:05] <janrinok> Has anyone seen any 500s today?
[13:42:21] <c0lo> no
[13:45:21] <janrinok> the server also seems to be under much less of a load, but we are still subjected to scrapers, spiders and bots. So perhaps they are not the cause of the problem. I wonder if kolie has changed something overnight?
[13:47:43] <janrinok> Ari/harkenon on #staff implied that he had seen them before 0900z today, but I have not seen anything since.
[13:49:14] <inz> ClaudeBot has made 3823 requests to my website / git repo today
[13:49:34] <Fnord666> set up a tarpit if you can.
[13:50:34] <janrinok> Thanks inz. I'm not sure that is the way to go. It would require more CPU work to feed the dummy data than to just ignore them
[13:50:35] <Fnord666> https://www.reddit.com
[13:50:37] <systemd> ^ Reddit - The heart of the internet
[13:51:00] <janrinok> Sorry, the second part of my last comment was for Fnord666.
[13:51:34] <janrinok> We have had thousands of requests, but the server has just handled them.
[13:51:52] <inz> Fnord, everything is just static files, so it doesn't really bite me
[13:52:06] <inz> If my bad code makes claude at least a little worse, all the better
[13:54:52] <janrinok> I agree with the sentiment, but I would like to piss them off without expending too much computer power to achieve it. If the system is showing 500s because it is overloaded (and this is still not certain) then using more CPU time feeding them crap isn't going to help. Static files would be fine, but I thought that a tarpit had to be dynamic to be credible? I could be wrong - I often am!!
[13:55:41] <janrinok> Just returning a 429 and nothing else would send them a similar message that they are not welcome here.
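The "just return a 429" idea really is the cheap option: the whole response is a status line plus a Retry-After header, with no templating at all. A hedged sketch using Python's stdlib `http.server` (the user-agent list and port are placeholders, not the site's real rules):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical UA fragments -- a real list would come from the access logs.
BOT_UA_HINTS = ("ClaudeBot", "GPTBot", "Bytespider")

def is_scraper(user_agent: str) -> bool:
    """Crude match: does the UA string contain any known bot fragment?"""
    return any(hint in user_agent for hint in BOT_UA_HINTS)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if is_scraper(self.headers.get("User-Agent", "")):
            # The entire refusal: one status line, one header, no body to generate.
            self.send_response(429)
            self.send_header("Retry-After", "86400")
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

# HTTPServer(("", 8080), Handler).serve_forever()  # left commented: sketch only
```

In practice this check would live in the front-end proxy (Varnish/Apache) rather than the app, so the Perl code never even sees the request.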
[14:13:48] <Fnord666> Good point. Yes, a tarpit designed to feed AI scrapers crap in order to help poison their model does take some compute.
[14:15:27] <Fnord666> I think they use a small local model to generate plausible but garbage web pages.
[14:16:44] <fab23> or use Anubis, as I mentioned a few days ago -> https://soylentnews.org
[14:16:45] <systemd> ^ SoylentNews Comments | AI is Scraping the Web, but the Web is Fighting Back ( https://soylentnews.org )
[14:18:33] <Fnord666> Yep. That was the one I was trying to remember. There's also Nepenthes https://zadzmo.org
[14:18:38] <systemd> ^ ZADZMO code
[14:19:31] <Fnord666> That one does not use AI to generate the pages though.
[14:19:35] <fab23> Anubis is also used on bugs.freebsd.org and cgit.freebsd.org
[14:20:31] <Fnord666> Nepenthes, I meant. "YOU ARE LIKELY TO EXPERIENCE SIGNIFICANT CONTINUOUS CPU LOAD, ESPECIALLY WITH THE MARKOV MODULE ENABLED."
[14:20:44] <fab23> Anubis does give the client a challenge to solve, and the implementation in e.g. Apache looks very nice, see https://anubis.techaro.lol
[14:20:45] <Fnord666> Cool re Anubis
[14:20:50] <systemd> ^ Apache | Anubis
[14:25:46] <janrinok> Fnord666, I've just been reading about iocaine and it looks useful. I would need to see the User-Agents in addition to the info displayed elsewhere on a staff only channel.
[14:27:54] <janrinok> I would be happy to provide a server locally if the site wanted to redirect stuff to an iocaine installation. (I've got fibre now - ADSL is gone! I'm living in a new world as far as the internet is concerned)
[14:28:40] <janrinok> The mayor of the local village complained to Orange that they had run a cable almost through the village but not connected us. That has now been rectified.
[14:30:00] <janrinok> My recently installed photovoltaic panels are providing enough power for my computers over any 24 hour period.
[14:41:32] <fab23> we should put some clouds close by :)
[14:43:21] <janrinok> er, no thank you!
[14:43:40] <fab23> but people want to store their data in the clouds :)
[14:44:04] <janrinok> We've got loads of clouds - in fact rain is promised for the weekend....
[14:44:41] <janrinok> Why people need nuclear power to support clouds is beyond me. We get them for free!
[15:09:48] <Fnord666> Lol!
[15:09:58] <Fnord666> Welcome to the 21st century janrinok
[15:13:40] <kolie> lol get off of each other's lawns
[15:13:59] <Fnord666> Exactly!
[15:14:24] <kolie> I'm probably the youngest regular here by a good margin
[15:14:48] <kolie> I started my net adventure very very early though.
[15:43:38] <chromas> tarpitting itself doesn't really cause a load on the server. you just need to slow down the rate you return bytes
[15:57:26] <kolie> Slow them down, hot-cache the contents in RAM so there's no processing, and yeah it's basically no load.
[15:57:34] <kolie> depends how it's done ofc.
[15:58:19] <chromas> send a few random bytes then stream a pre-built junk file; one byte, one second of sleep(), one byte...
[17:31:51] -!- ted-ious has quit [Ping timeout: 252 seconds]
[19:31:03] -!- ted-ious [ted-ious!tedious@ted.ious] has joined #soylent
[20:15:40] -!- Runaway1956 [Runaway1956!~OldGuy@the.abyss.stares.back] has joined #soylent