
  • no, they steal everything.

    why do we keep letting them steal

    ‘free speech’ has always been about the freedom of the oppressed to fight upwards against their oppressor with language - but now they’ve stolen it & are trying to make it mean their freedom to oppress minorities.

    same for ‘woke’ - it used to mean basic human decency, once again they stole it & warped its meaning by pretending they’re the victims and it’s preventing their freedom (ie. their freedom to be a bigot).

    same for ‘political correctness’, which was originally a criticism of using fake concern over moral issues for a political agenda (sounds familiar), now warped beyond use.

    swastika - used for THOUSANDS of years before the fucking nazis came along & stole it. now the cultures it actually belongs to get hate for practicing their ancient beliefs.

    pepe and many others belong to a long list of things they steal and ruin.

    why do we keep letting them steal?








  • good points on the training order!

    i was mostly thinking of intentionally introduced stochastic processes during training, eg. quantisation noise, which is pretty broadband when uncorrelated - and even the correlated noise from real-world datasets will inevitably contain non-determinism, though some constraints re. language “rules” could possibly shape that in interesting ways for LLMs.

    and especially the use of stochastic functions for convergence & stochastic rounding in quantisation etc., not to mention intentionally introduced randomisation in training set augmentation. so i think for most purposes, and with few exceptions, they are mathematically definable as stochastic processes.
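
    eg. a toy numpy sketch of the stochastic rounding part (the `stochastic_round` name & the 0.25 quantisation step are just illustrative, not from any real scheme):

    ```python
    import numpy as np

    rng = np.random.default_rng()  # unseeded: pulls entropy from the OS

    def stochastic_round(x, step=0.25):
        """round to the nearest multiple of `step`, probabilistically:
        a value 30% of the way to the next level rounds up ~30% of the time."""
        scaled = x / step
        floor = np.floor(scaled)
        frac = scaled - floor                    # distance above lower level, in [0, 1)
        round_up = rng.random(x.shape) < frac    # round up with probability = frac
        return (floor + round_up) * step

    weights = np.array([0.11, 0.37, 0.62, 0.88])
    print(stochastic_round(weights))  # differs run to run,
    print(stochastic_round(weights))  # but unbiased in expectation
    ```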

    where that overlaps with true theoretical determinism certainly becomes fuzzy without an exact context. afaict most kernel-backed random seeds on x86 since 2015, via the RDSEED instruction, will have an asynchronous thermal-noise-based NIST 800-90B approved entropy source within the silicon, and a NIST 800-90C Non-deterministic Random Bit Generator (NRBG).
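
    (if you want to check whether your own machine exposes it, a quick sketch - parsing /proc/cpuinfo is a linux-on-x86 assumption:)

    ```python
    # linux-only: check whether the CPU advertises the RDSEED / RDRAND flags
    with open("/proc/cpuinfo") as f:
        flags = set()
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
                break

    print("rdseed:", "rdseed" in flags)  # conditioned 800-90B entropy source
    print("rdrand:", "rdrand" in flags)  # DRBG reseeded from the same source
    ```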

    on other more probable architectures (GPU/TPU) I think that is going to be a lot rarer, and from a cryptographic perspective, hardware implementations of even stochastic rounding are going to be a deterministic circuit under the hood for a while yet.
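
    (to illustrate “deterministic circuit under the hood”: drive the same rounding from a fixed-seed prng - roughly what such hardware amounts to - and the “randomness” reproduces bit-for-bit:)

    ```python
    import numpy as np

    def seeded_stochastic_round(x, step, seed):
        rng = np.random.default_rng(seed)   # PRNG: fully determined by the seed
        scaled = x / step
        floor = np.floor(scaled)
        round_up = rng.random(x.shape) < (scaled - floor)
        return (floor + round_up) * step

    w = np.linspace(0.0, 1.0, 8)
    a = seeded_stochastic_round(w, 0.25, seed=42)
    b = seeded_stochastic_round(w, 0.25, seed=42)
    assert np.array_equal(a, b)  # "stochastic" rounding, yet perfectly reproducible
    ```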

    but given the combination of overwhelming complexity, trade secrets and classical high-entropy sources, I think most serious attempts at formal proofs would have to resign themselves to stochastic terms in their formulation for some time yet.

    there may be some very specific and non-general exceptions, and i do believe this is going to change in the future as both extremes (highly formal AI models, and non-deterministic hardware-backed instructions) are further developed. and ofc overcoming the computational resource hurdles for training could lead to relaxing some of the current practical requirements for stochastic processes during training.

    this is ofc only afaict - i don’t work in the LLM field.



  • ganymede@lemmy.ml to Asklemmy@lemmy.ml · What is Lemmy’s problem with AI?

    ignoring the hate-brigade, lemmy users are probably a bit more tech savvy on average.

    and i think many people who know how “AI” works under the hood are frustrated because, unlike most of its loud proponents, they have a real-world understanding of what it actually is.

    and they’re tired of being told they “don’t get it”, by people who actually don’t get it. but instead they’re the ones being drowned out by the hype train.

    and the thing fueling the hype train is dishonest, greedy people, eager to over-extend the grift at the expense of responsible and well-engineered “AI”.

    but, and this is the real crux of it, they’re keeping the amazing true potential of “AI” technology in the hands of the rich & powerful, rather than using it to liberate society.



  • > LLMs could be made deterministic

    Good reminder that LLM output could be made deterministic!
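
    Eg. the usual determinism knob is the decoding step - a toy sketch (the logits are made up and `pick_token` is just an illustrative name, not any real model’s API):

    ```python
    import numpy as np

    def pick_token(logits, temperature=None, rng=None):
        """temperature=None -> greedy (deterministic); otherwise sample."""
        if temperature is None:
            return int(np.argmax(logits))        # same logits -> same token, every time
        z = np.asarray(logits) / temperature
        p = np.exp(z - z.max())
        p /= p.sum()                             # softmax over the (toy) vocabulary
        return int(rng.choice(len(p), p=p))      # varies run to run

    logits = [1.2, 3.4, 0.7, 2.9]
    print(pick_token(logits))                    # always token 1
    rng = np.random.default_rng()
    print(pick_token(logits, temperature=0.8, rng=rng))  # usually 1, not always
    ```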

    Though correct me if I’m wrong, their training is, with few exceptions, very much going to be stochastic? Ofc it’s not a hard requirement, but under real-world efficiency & resource constraints it very often will be?

    Personally, I’m not sure I’d argue automation can’t be stochastic. But either way, OP asks a good question for us to ponder! The short answer imo: it depends on what you mean by “automation” :)





  • yep, there’s this weird trend to demonise cute animals.

    you can’t even fucking mention koalas on reddit without some arsehole telling us they all have chlamydia every 53 seconds.

    according to them, all dolphins suck, all ducks are shit, and all cute little marsupials who never harmed a fly are secretly evil incarnate.

    what if all humans were judged by the actions of some humans? that’s a frying pan i’d rather not be in…



  • (ok i see, you’re using the term CPU colloquially to refer to the processor. i know you obviously know the difference & that’s what you meant - i just mention the distinction for others who may not be aware.)

    ultimately op may not require exact monitoring, since they compared it to standard system monitors etc, which are ofc approximate as well. so the tools listed by Eager Eagle in this comment may be sufficient for the general use described by op?

    eg. these - the screenshots look pretty close to what i imagined op meant

    now onto your very cool idea of substantially improving the temporal resolution of measuring memory bandwidth… you’ve got me very interested :)

    my initial sense is counting completed L3/L4 cache misses sourced from DRAM and similar events might be a lot easier - though as you point out, that will inevitably accumulate event counts within a given time interval rather than capture an individual event.
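
    (eg. a rough sketch of that event-count approach via `perf stat` interval counts - linux-only, and both the LLC-load-misses event name & the 64-byte line size are per-cpu assumptions:)

    ```python
    import subprocess

    # rough demand-read DRAM traffic estimate from LLC miss counts
    # (linux + perf assumed; event name & 64B line size vary per CPU)
    INTERVAL_MS = 1000
    CACHE_LINE = 64

    cmd = ["perf", "stat", "-I", str(INTERVAL_MS), "-x", ",",
           "-e", "LLC-load-misses", "sleep", "5"]
    out = subprocess.run(cmd, capture_output=True, text=True)

    for row in out.stderr.splitlines():      # perf -I emits csv rows on stderr
        fields = row.split(",")
        if len(fields) >= 3 and fields[1].strip().isdigit():
            misses = int(fields[1])
            mb_s = misses * CACHE_LINE / (INTERVAL_MS / 1000) / 1e6
            print(f"t={fields[0].strip()}s  ~{mb_s:.1f} MB/s")
    ```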

    i understand the role of parity bits in ECC memory, but i didn’t quite understand which ECC fields you would access & how, and how/where you would store those results at improved temporal resolution compared to event counts?

    would love to hear what your setup would look like? :) which ECC-specific masks would you monitor? where/how would you store/process such high resolution results without impacting the measurement itself?