#dev | Logs for 2018-12-05

[11:05:39] -!- SoyGuest55531 has quit [Ping timeout: 248 seconds]
[11:11:31] -!- cosurgi [cosurgi!~cosurgi@pskvpf01.bl.pg.gda.pl] has joined #dev
[11:12:30] cosurgi is now known as SoyGuest19317
[15:10:11] <TheMightyBuzzard> fyngyrz, whether you have a handy dandy way in the language to check a word against 400 others or not, you still have to check it against each. less lines of code does not mean less work for the processor. the machine code is the same.
[15:11:08] <TheMightyBuzzard> i'm only slightly worried about that though. testing will show whether it causes an issue or not.
[15:12:36] <TheMightyBuzzard> i don't think i'll have it go through searching for acronyms though. i think i'll have it go through searching for <abbr> tags without a title. that way we can still use our own custom abbr tags and have them not get monkeyed with.
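A minimal sketch of the approach TMB describes: look only for `<abbr>` tags that have no title and fill the title in from a lookup table. The `ACRONYMS` table and `fill_bare_abbrs` name are hypothetical, and the sketch assumes "without a title" means an `<abbr>` with no attributes at all, so any tag that already carries attributes (custom ones included) is left alone.

```python
import html
import re

# Hypothetical acronym table; the real list (about 400 entries) would live in the db.
ACRONYMS = {
    "MHZ": "megahertz",
    "IRS": "Internal Revenue Service",
}

# Matches <abbr>TEXT</abbr> with no attributes, i.e. no title.
BARE_ABBR = re.compile(r"<abbr>([^<]+)</abbr>", re.IGNORECASE)


def fill_bare_abbrs(text: str) -> str:
    """Add a title to bare <abbr> tags whose text is a known acronym."""
    def repl(match):
        word = match.group(1)
        expansion = ACRONYMS.get(word.upper())
        if expansion is None:
            return match.group(0)  # unknown acronym: leave the tag untouched
        title = html.escape(expansion, quote=True)
        return f'<abbr title="{title}">{word}</abbr>'
    return BARE_ABBR.sub(repl, text)
```

Tags that already have a title never match the pattern, which is what keeps hand-written `<abbr>` tags from getting monkeyed with.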
[15:14:12] <TheMightyBuzzard> and it'll need to go in the db. we don't do flat files because we have multiple web front ends. the only flat files are the ones that tell it what db to connect to.
[15:14:32] <TheMightyBuzzard> well and apache conf files and the like
[15:15:14] <TheMightyBuzzard> none of them get read at runtime though except the db connection file and it only gets read once ever per apache thread
[16:20:02] <fyngyrz> > whether you have a handy dandy way in the language to check a word against 400 others or not, you still have to check it against each. less lines of code does not mean less work for the processor. the machine code is the same.
[16:20:44] <fyngyrz> Nope. You know what a hash table is, right? Dictionaries go pretty much directly to the target without searching.
[16:21:06] <fyngyrz> they are way more efficient than "checking against all targets"
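To make the contrast concrete, here is a hedged Python sketch of the two lookups being argued about. The 400 keys (`A000` to `A399`) are made-up stand-ins for the acronym list. The linear version compares the word against entries one at a time; the dictionary version hashes the word once and goes straight to where that key would be stored.

```python
# Hypothetical acronym table: 400 made-up keys standing in for real acronyms.
PAIRS = [(f"A{i:03d}", f"expansion {i}") for i in range(400)]
TABLE = dict(PAIRS)


def linear_lookup(word):
    """Compare word against each entry in turn: O(n) comparisons."""
    for key, expansion in PAIRS:
        if key == word:
            return expansion
    return None


def hash_lookup(word):
    """Hash the word once and jump to its slot: O(1) comparisons on average."""
    return TABLE.get(word)
```

Both return the same answers; only the amount of work per lookup differs.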
[16:22:22] <fyngyrz> well, if you use a db, an index will still do a faster lookup than just searching all the targets, though less efficiently than a hash.
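A small sketch of the db variant, using Python's built-in `sqlite3` with an in-memory database as a stand-in for the site's real database. The table name `acronyms` and index name `idx_acronym` are hypothetical. `EXPLAIN QUERY PLAN` is used to check that the lookup goes through the index instead of scanning every row.

```python
import sqlite3

# In-memory stand-in for the site database; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acronyms (acronym TEXT NOT NULL, expansion TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX idx_acronym ON acronyms (acronym)")
conn.executemany(
    "INSERT INTO acronyms VALUES (?, ?)",
    [("MHZ", "megahertz"), ("IRS", "Internal Revenue Service")],
)

QUERY = "SELECT expansion FROM acronyms WHERE acronym = ?"


def expand(word):
    """Return the expansion for word, or None if it is not in the table."""
    row = conn.execute(QUERY, (word,)).fetchone()
    return row[0] if row else None


# The last column of each plan row is the human-readable detail,
# e.g. a SEARCH ... USING INDEX idx_acronym line rather than a full SCAN.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("IRS",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
```

The index is a B-tree, so each lookup touches a logarithmic number of entries rather than all of them.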
[16:23:00] <fyngyrz> Mine doesn't monkey with existing <abbr> tags either
[16:23:05] <fyngyrz> still does it in one pass, too.
[16:23:16] <fyngyrz> state machines FTW. :)
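fyngyrz's actual scanner isn't shown in the log, so the following is only a hypothetical illustration of the one-pass, state-machine idea: walk the text once, copy tags through untouched, track whether we are inside an existing `<abbr>` element, and wrap known acronyms (found by dictionary lookup) everywhere else. The `ACRONYMS` table and `mark_acronyms` name are assumptions. As a sketch, it ignores `>` inside attribute values, HTML comments, and `<script>` blocks.

```python
# Hypothetical acronym table; the real list would be much longer.
ACRONYMS = {"MHZ": "megahertz", "IRS": "Internal Revenue Service"}

TEXT, TAG = "text", "tag"


def mark_acronyms(src: str) -> str:
    """One left-to-right pass; leaves existing <abbr> elements and tag markup alone."""
    out, word, tag = [], [], []
    state, in_abbr = TEXT, False

    def flush_word():
        w = "".join(word)
        word.clear()
        if w in ACRONYMS and not in_abbr:   # dictionary lookup, not a scan
            out.append(f'<abbr title="{ACRONYMS[w]}">{w}</abbr>')
        else:
            out.append(w)

    for ch in src:
        if state == TEXT:
            if ch.isalnum():
                word.append(ch)
            elif ch == "<":
                flush_word()
                state = TAG
                tag.append(ch)
            else:
                flush_word()
                out.append(ch)
        else:  # TAG: copy the tag through, then note whether it opens or closes <abbr>
            tag.append(ch)
            if ch == ">":
                name = "".join(tag)[1:-1].strip().lower()
                if name == "abbr" or name.startswith("abbr "):
                    in_abbr = True
                elif name.startswith("/abbr"):
                    in_abbr = False
                out.append("".join(tag))
                tag.clear()
                state = TEXT
    flush_word()
    out.append("".join(tag))  # an unterminated tag at the end is emitted as-is
    return "".join(out)
```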
[16:28:12] <fyngyrz> You'd be better off with a replicated-across-sites flat file. Your DB performance is already excruciating.
[16:28:31] <fyngyrz> Your call, of course. Just an IMHO.
[16:31:31] <fyngyrz> BTW, both Perl and Python support dictionaries. Perl calls it a "hasH", though.
[16:31:38] <fyngyrz> "hash"
[16:32:54] * TheMightyBuzzard head desks
[16:33:37] <TheMightyBuzzard> there is no hash table instruction on a processor. you're comparing memory values. there's not a cheaper way to compare two values.
[16:37:32] <fyngyrz> yes, there is. It's called "an algorithm", my friend
[16:39:46] <fyngyrz> there's a world of difference between working through a list one by one until you have a match, which tends to be linear in terms of cost, and a smarter lookup, which trades a hash for an index (and for which there IS a machine code equivalent, of course)
[16:41:40] <fyngyrz> What you're saying is the same as saying that finding a number in an ordered list of numbers is just as slow by searching as it is with a binary search. It's not true, and for the same reason: the algorithm is way more efficient than the naive search.
[16:43:03] <chromas> It's more efficient (for large lists) but there's still a search
[16:43:29] <chromas> You just get to skip most of the list items
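Both points can be seen by counting. This Python sketch, using an assumed sorted list of 400 integers, counts the comparisons a front-to-back scan makes and the probes a plain binary search makes. There is still a search, as chromas says, but it skips most of the list.

```python
def linear_comparisons(items, target):
    """Equality tests a front-to-back scan makes before finding target."""
    for count, item in enumerate(items, start=1):
        if item == target:
            return count
    return len(items)


def binary_comparisons(items, target):
    """Probes a hand-written binary search makes on a sorted list."""
    lo, hi, count = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if items[mid] == target:
            return count
        if items[mid] < target:
            lo = mid + 1      # target is in the upper half
        else:
            hi = mid - 1      # target is in the lower half
    return count


items = list(range(400))  # assumed sorted list, roughly the size of the acronym list
```

For the middle element (200), the scan makes 201 comparisons and the binary search makes 8 probes; no element in a 400-item list needs more than 9.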
[16:43:34] <fyngyrz> just benchmark a comparison between a linear lookup and a hashed array lookup. You'll see the latter is much, much faster.
[16:44:25] <fyngyrz> pick random elements in the array, call and time the linear search, call and time the hash lookup. You'll find the hash is the high performer, hands down
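The log's benchmark was in Perl and its code isn't reproduced here. The following is a hedged Python version of the same procedure (pick random elements, time a linear search, time a hash lookup) using the standard `timeit` module. The 400 made-up keys, 1000 probes, and 20 repetitions are arbitrary choices, and no particular timings are claimed; they depend on the machine.

```python
import random
import timeit

# Hypothetical stand-in for the acronym list: 400 made-up keys.
keys = [f"A{i:03d}" for i in range(400)]
as_list = keys                      # searched front to back
as_dict = {k: True for k in keys}   # searched by hash

random.seed(1)
probes = [random.choice(keys) for _ in range(1000)]


def linear():
    for p in probes:
        for k in as_list:   # explicit loop: one comparison per element until a match
            if k == p:
                break


def hashed():
    for p in probes:
        as_dict.get(p)      # one hash plus, usually, one comparison


t_linear = timeit.timeit(linear, number=20)
t_hashed = timeit.timeit(hashed, number=20)
print(f"linear: {t_linear:.4f}s  dict: {t_hashed:.4f}s  ratio: {t_linear / t_hashed:.0f}x")
```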
[16:44:37] <fyngyrz> If you like, I'll write the benchmark for you.
[16:44:49] <fyngyrz> even in perl. It's easy enough.
[17:38:59] <fyngyrz> Here ya go: http://ourtimelines.com
[17:39:35] <fyngyrz> TL;DR: linear: 5 seconds. hash (dictionary): .03 seconds.
[17:39:53] <fyngyrz> I use MHZ because it's more or less in the middle of the acronym list
[17:41:33] <fyngyrz> please excuse my crappy perl. :)
[17:43:44] <fyngyrz> 5.55 seconds, actually
[17:44:20] <fyngyrz> let's see... 5.55 / .03 = ~185x faster
[17:44:53] <fyngyrz> so chromas, TMB... using a hash/dictionary is one hell of a lot faster
[17:45:38] <fyngyrz> that's in perl. I haven't actually benchmarked it in Python, but I would expect similar gains
[17:50:11] <fyngyrz> Of course, if I use something at the END of the list... then the benchmark goes even MORE in favor of the dict. But I figured the middleish was fair, because it's going to be the average lookup time, probably
[17:50:31] <fyngyrz> and likewise, if the linear lookup finds it very early, it'll be fast
[17:50:46] <fyngyrz> the middle tells the overall story, though
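Assuming every acronym in a list of n entries is equally likely to be looked up, the average cost of a successful linear search is exactly the cost of finding the middle element, which backs up the choice of MHZ. A miss, presumably the common case for ordinary words in a story, costs a full pass. For comparison, a binary search on a sorted list of n items needs at most about log2 n probes, and a hash lookup needs O(1) comparisons on average.

$$
\bar{C}_{\text{hit}} = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2}, \qquad C_{\text{miss}} = n, \qquad C_{\text{binary}} \le \lfloor \log_2 n \rfloor + 1
$$

For n = 400 that gives 200.5 comparisons per linear hit, 400 per linear miss, and at most 9 probes for binary search.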
[17:54:40] <fyngyrz> Turns out IRS is in the middle of the list. Didn't make an appreciable difference in the benchmark, though
[17:55:54] <fyngyrz> 5.60 vs. .03
[17:56:43] <fyngyrz> timethis 100000: 6 wallclock secs ( 5.60 usr + 0.00 sys = 5.60 CPU) @ 17857.14/s (n=100000)
[17:56:43] <fyngyrz> Benchmark=ARRAY(0x8de1c60)timethis 100000: 0 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU) @ 3333333.33/s (n=100000)
[17:56:43] <fyngyrz> (warning: too few iterations for a reliable count)
[18:07:12] <chromas> Where's the story scanner?
[18:07:25] <fyngyrz> eh?
[18:07:41] <fyngyrz> you mean the webapp to do all this?
[18:08:59] <fyngyrz> If that's what you mean, it's here: https://github.com
[22:13:16] SoyGuest19317 is now known as cosurgi
[22:13:20] -!- cosurgi has quit [Changing host]
[22:13:20] -!- cosurgi [cosurgi!~cosurgi@Soylent/Staff/Misc/cosurgi] has joined #dev
[22:38:43] <TheMightyBuzzard> fyngyrz, algorithm means "a way of doing something in software". it has nothing to do with what instructions a cpu can execute. the one you're particularly interested in for this purpose is CMP. it compares the contents of two registers on the cpu and sets flags in a third register depending on the results. that is how every single comparison ever made on a PC is made. no matter how many lines of typing your software language saves you, it is done exactly the same way on the hardware level.
[22:40:04] <TheMightyBuzzard> so, no. you can not write a method of checking one string against four hundred that is any more efficient than doing it in a while loop. it is not physically possible.
[22:40:22] <TheMightyBuzzard> you can save yourself a lot of typing though.
[22:42:34] <chromas> A hash table can speed up the search by skipping over irrelevant items in the list but it still costs a lot of cycles
[22:42:48] <TheMightyBuzzard> no it can't
[22:42:54] <TheMightyBuzzard> how does it know if it is relevant?
[22:43:01] <TheMightyBuzzard> CMP
[22:43:15] <TheMightyBuzzard> on every item until it finds one that matches or runs out of items.
[22:43:41] <TheMightyBuzzard> there are sneaky ways to use more memory and speed that up a tad but that's what it boils down to.
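The "use more memory" trick is essentially what a hash table is. This minimal separate-chaining table in Python (class name and bucket count are hypothetical) counts key comparisons on lookup. The comparisons TMB describes do still happen, but only against keys that landed in the same bucket as the word being looked up. Computing the hash costs time proportional to the key's length, not to the number of entries. CPython's built-in `dict` uses open addressing rather than chains, but the comparison-count argument is the same.

```python
class ChainedHashTable:
    """Minimal separate-chaining hash table that counts key comparisons on lookup."""

    def __init__(self, buckets=512):
        self.buckets = [[] for _ in range(buckets)]  # the extra memory being traded
        self.comparisons = 0

    def _bucket(self, key):
        # Hashing reads the key once; it does not compare it against other keys.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # replace an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        # Only keys in the same bucket are compared, not every key in the table.
        for k, v in self._bucket(key):
            self.comparisons += 1
            if k == key:
                return v
        return None
```

With 400 keys spread over 512 buckets, a typical lookup makes one or two comparisons instead of up to 400.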
[22:44:43] <TheMightyBuzzard> mind you, any given language that isn't very nearly bare metal is likely to introduce cocktastic inefficiencies. how those benchmark out is a crap shoot that you have to test to know.
[22:45:21] <chromas> If it wasn't faster then it wouldn't have a reason to exist
[22:45:29] <TheMightyBuzzard> java
[22:45:39] <chromas> Do you use java in your perl?
[22:45:47] <TheMightyBuzzard> no, in my stomach.
[22:46:11] <TheMightyBuzzard> java is not faster than anything except hiring people to look through filing cabinets and draw on a dry erase board.
[22:46:19] <chromas> Doesn't that make you sick? Or at least trigger diarrhea
[22:46:29] <TheMightyBuzzard> i like to call it regularity
[22:46:32] <chromas> javarrhea
[22:47:26] <TheMightyBuzzard> hashmaps and dictionaries and such are an abstraction of a commonly performed set of operations. ideally coded properly so you don't screw too many pooches trying to reinvent the wheel.
[22:49:14] <TheMightyBuzzard> in C, C++, rust, or assembly it's faster to use a loop than add the overhead. in higher level languages it entirely depends on how well they coded each bit of their language.
[22:51:22] * TheMightyBuzzard buggers off to get some dinner
[22:54:52] <TheMightyBuzzard> oooh, i know... ima go appropriate me a couple chimichangas worth of culture
[22:56:46] <chromas> Deep frying sounds like Murikan culture
[22:59:49] * chromas suspects tmb skipped algorithms class for fooshin'
[23:37:46] * TheMightyBuzzard has no need to suspect chromas skipped computer architecture class