• 0 Posts
  • 47 Comments
Joined 2 years ago
Cake day: June 24th, 2023


  • Saying AI = LLMs is a severe oversimplification, though. LLMs and image generators are the subsets of AI that are currently most prominent and that people most commonly knowingly interact with, but pretty much every formal definition is wider than that. Recommendation algorithms, as used on YouTube or social media, and smart (photo) search are further examples of AI that people interact with. And fraud detection, learning spam filters, abnormality (failure) detection, and traffic estimation are even more examples. All of these things are formally defined as AI and are very much commonplace; I would not call them niche.

    The fact that LLMs and image generators are currently the most prominent examples does not necessarily exclude other examples from being part of the group too.

    Using AI as a catch-all phrase is simply a case of overgeneralization, in part due to the need for brevity. In some cases the difference does not matter, or the generalization is even beneficial. For example, ‘don’t train AI models on my art’ would only marginally affect applications other than image generation and image analysis, and covers any potential future applications that may pop up.

    However, statements like ‘ban AI’ can easily be misconstrued, and may be interpreted in a much wider manner than what the original author intended. People will have a variety of definitions of what does or does not constitute AI, which will lead to miscommunication unless it is clear from context.

    It probably wouldn’t hurt to clarify things specifically and talk about the impact of a specific application, rather than discussing what is (or is not) to be classified as AI.


  • Rate per distance is not that great a metric either, though. Increasing distance does not increase risk equally everywhere: a car that drives a long stretch on a highway is unlikely to hit a pedestrian, but inside a city, or on a shared country road, this becomes much more likely. The car’s distance travelled is inflated by those highway stretches, so its per-distance rate ends up much lower. Furthermore, because walking is generally done over short distances, any incident inflates the per-distance rate much more for pedestrians.

    You preferably want some measure of risk for a single trip: if a trip were made by another mode of transport, would the incident still have occurred? A proxy for this can be severity: how does the chance that an incident is fatal, given that one occurs, compare between two modes of transport? You may also wish to account for the likelihood of an interaction, which also points to a means of improvement: what infrastructure was involved? Disentangling two modes of transport makes them less likely to interact.
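
    To make the normalization point concrete, here is a toy calculation with made-up placeholder numbers (not real statistics), showing how the same incident counts can look very different depending on whether you divide by distance or by trips:

    ```python
    # Toy illustration only: every number below is a made-up placeholder.
    modes = {
        # mode: (fatal incidents, total km travelled, total trips)
        "car":        (10, 50_000_000, 2_000_000),
        "pedestrian": (10,  2_000_000, 2_000_000),
    }

    for mode, (fatal, km, trips) in modes.items():
        per_billion_km = fatal / km * 1e9
        per_million_trips = fatal / trips * 1e6
        print(f"{mode:>10}: {per_billion_km:7.1f} per 1e9 km, "
              f"{per_million_trips:5.2f} per 1e6 trips")

    # With these placeholders the car looks ~25x safer per kilometre (its
    # long highway stretches inflate the denominator), while per trip the
    # two modes come out identical.
    ```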

    Sorry for this long rant, but I really dislike rate / distance as a means of normalizing a metric that is meant to indicate the relative safety.


  • At least the EU is somewhat privacy-friendly here (excluding the Google tie-in) compared to the data-sharing and privacy mess the UK has obligated people to go through, sharing ID pictures or selfies.

    Proving you are 18+ through a zero-knowledge proof (i.e. the other party learns no more than the fact that you are 18+), where the proof is generated locally on your own device from a government-signed date of birth (the government only issues the ID and doesn’t see what you do with it), is probably the least privacy-intrusive way to do this, barring not checking anything at all.
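
    As a rough sketch of the moving parts (issuer signs a minimal claim, the device holds it, the verifier only checks the signature), here is a minimal example using a plain Ed25519 signature. This is not an actual zero-knowledge proof: a real deployment would use an anonymous-credential / ZK scheme (e.g. BBS+ selective disclosure) so presentations are also unlinkable. All names and values are illustrative.

    ```python
    # Sketch only: a plain signed "over 18" attestation, not a real ZK proof.
    import json
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # 1. The government issuer signs a minimal claim for the citizen's device.
    issuer_key = Ed25519PrivateKey.generate()
    claim = json.dumps({"over_18": True, "expires": "2026-01-01"}).encode()
    signature = issuer_key.sign(claim)

    # 2. The device presents (claim, signature) to the age-gated service.
    #    The service only needs the issuer's public key: it never sees the
    #    date of birth, and the issuer never learns which site was visited.
    try:
        issuer_key.public_key().verify(signature, claim)
        print("accepted:", json.loads(claim))
    except InvalidSignature:
        print("rejected")
    ```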


  • It is complicated. Technically it is not always the case, but in practice it may very well be. As this page (in Dutch) notes, unless the driver can show that ‘overmacht’ (force majeure) applies (they could not have performed any action that would have avoided or reduced bodily harm), they are (at least in part) liable for damages. For example, not engaging the brakes as soon as it is clear that you will hit the cyclist would still leave the driver (partially) liable for costs, even if the cyclist made an error themselves (crossing a red light).

    Because the burden of proof is on the driver, it may be hard to prove that this is the case, resulting in their insurance having to pay up even if they did not do anything wrong.




  • 8uurg@lemmy.world to AI@lemmy.ml · “@grok is this true” · 2 months ago

    Not quite, actually. It is more that training recursively on the output without any changes, i.e., Data -> Model A -> Data (generated by Model A) -> Model B -> Data (generated by Model B) -> …, leads to (complete) collapse. A single step like this can still worsen performance notably, though, especially when such data makes up the sheer majority of the training set. [source]
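
    The collapse dynamic is easy to reproduce in a toy setting: repeatedly fit a simple model to samples and then draw the next training set from the fitted model. A minimal numpy sketch (a Gaussian stands in for the “model”; purely illustrative):

    ```python
    # Toy illustration of recursive training without fresh data: each
    # generation is fit only on samples drawn from the previous generation's
    # fitted model. Information about the original distribution is gradually
    # lost; with small samples the fitted spread typically drifts toward zero.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=0.0, scale=1.0, size=20)   # the original "real" data

    for generation in range(201):
        mu, sigma = data.mean(), data.std()          # fit "Model N"
        if generation % 25 == 0:
            print(f"gen {generation:3d}: mean={mu:+.3f} std={sigma:.3f}")
        data = rng.normal(mu, sigma, size=20)        # data generated by "Model N"
    ```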

    And if they train using little data, you won’t get anywhere near the chatbots we have now. If they fine-tune an existing model to do as they wish, it would likely have side effects, like being more likely to introduce security bugs in generated code, generally giving incorrect answers to other common-sense questions, and so on. [source]



  • We rarely prove something correct. In mathematics, logical proofs are a thing, but in astronomy and physics it is more the case that we have a model that is accurate enough for our predictions, until we find evidence to the contrary, like here, and have an opportunity to learn and improve.

    You really can’t prove a lot of things to be correct: you would have to show that no cases exist that are not covered. But even despite the lack of proven correctness for all cases, these models are useful and provide correct predictions (most of the time), and science is constantly on the lookout for cases where the model is wrong.


  • Wouldn’t the algorithm that creates these models in the first place fit the bill? Given that it takes a bunch of text data, and manages to organize this in such a fashion that the resulting model can combine knowledge from pieces of text, I would argue so.

    What is understanding knowledge anyway? Wouldn’t humans fail to fit the bill too, given that for most of our knowledge we do not know why it is the way it is, and have even held rules that were, in hindsight, incorrect?

    If a model is more capable of solving a problem than an average human being, isn’t it, in its own way, intelligent to some degree? And, to take things to the utter extreme, wouldn’t evolution itself be intelligent, given that it causes intelligent behavior to emerge, for example, viruses adapting to external threats? What about an (iterative) optimization algorithm that finds solutions that no human would be able to find?

    “Intelligence has a very clear definition.”

    I would disagree; it is probably one of the hardest things to define out there, its definition has changed greatly over time, and it is core to the study of philosophy. Every time a being or thing fits a definition of intelligence, the definition is often altered to exclude it, as has happened many times.


  • The flute doesn’t make for a good example, as the end user can take it and modify it as they wish, including third party parts.

    If we force the analogy: it would be as if the manufacturer made it such that all (even third-party) parts for these flutes can only be distributed through their store, and used this restriction to force any third party to comply with additional requirements.

    The key problem isn’t including third-party parts; it is actively blocking the use of third-party parts and forcing additional rules (which affect existing markets, like payment processing) upon them, using control and market dominance to accomplish this.

    The Microsoft case was, in my view, weaker than this case against Apple, but their significant market dominance in the desktop OS market made it such that it was deemed anti-competitive anyways. It probably did not help that web standards suffered greatly when MS was at the helm, and making a competitive compatible browser was nigh impossible: most websites were designed for IE, using IE specific tech, effectively locking users into using IE. Because all users were using IE, developing a website using different tech was effectively useless, as users would, for other websites, end up using IE anyways. As IE was effectively the Windows browser (ignoring the brief period for IE for Mac…), this effectively ensured the Windows dominance too. Note that, without market dominance, websites would not pander specifically to IE, and this specific tie-in would be much less problematic.

    In the end, Google ended IE’s reign with Google Chrome, advertising it using the Google search engine’s reach. But if Microsoft had locked down the OS, like Apple does, and required everything to go through their ‘app store’, I don’t doubt we would have ended up with a browser-engine restriction similar to Apple’s, with all browsers effectively being a wrapper around the exact same underlying engine.



  • Yes, true, but that is assuming:

    1. Any potential future improvement solely comes from ingesting more useful data.
    2. That the amount of useful data being produced is not ever-increasing (even excluding AI slop).
    3. No (new) techniques that make it more efficient in terms of the data required for training are published or engineered.
    4. No (new) techniques that improve reliability are used, e.g. by specializing it for code auditing specifically.

    What the author of the blogpost has shown is that it can find useful issues even now. If you apply this to a codebase, have a human categorize the reported issues as real or fake, and train the model to make it more likely to generate real issues and less likely to generate false positives, it could still be improved specifically for this application. That does not require nearly as much data as general improvements do.
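
    A minimal sketch of the simpler variant of that idea: instead of fine-tuning the generator itself, train a filter on the human labels and use it to suppress likely false positives. This uses scikit-learn, and all example strings and labels below are placeholders, not data from the blogpost.

    ```python
    # Learn "real issue" vs "false positive" from human-labelled findings and
    # use the resulting score to rank or filter newly generated reports.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    findings = [
        "use-after-free: session freed in logoff handler but still referenced",
        "possible out-of-bounds read: length taken from packet without check",
        "claims buffer overflow, but the length is validated two lines earlier",
        "flags hardcoded credentials that are actually a documented test fixture",
    ]
    labels = [1, 1, 0, 0]  # 1 = real issue, 0 = false positive (human-labelled)

    triage = make_pipeline(TfidfVectorizer(), LogisticRegression())
    triage.fit(findings, labels)

    new_report = "session object is reused after being freed in the error path"
    print("p(real issue) =", triage.predict_proba([new_report])[0][1])
    ```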

    While I agree that improvements are not a given, I wouldn’t assume that they can never happen anymore. Despite these companies effectively having exhausted all of the text on the internet, improvements are currently still being made left, right, and center. If the many billions they are spending improve these models such that we have a fancy new tool for ensuring our software is more safe and secure: great! If it ends up being an endless money pit and nothing ever comes from it, oh well. I’ll just wait and see which of the two it will be.


  • Not quite, though. In the blogpost the pentester notes that it found a similar issue (that he overlooked) occurring elsewhere, in the logoff handler, which he noted and verified when sifting through a number of the reports it generated. Additionally, the pentester noted that the fix it supplied accounted for (and documented) an issue that his own suggested fix was (still) susceptible to. This shows that it could be(come) a new tool that allows us to identify issues that are not found with techniques like fuzzing and can even be overlooked by a pentester actively searching for them, never mind a kernel programmer.

    Now, these models generate a ton of false positives, which makes the signal-to-noise ratio much lower than what would be preferred. But the fact that a language model can locate and identify these issues at all, even if sporadically, is already orders of magnitude more than what I would have expected initially. I would have expected it to only hallucinate issues, not to find anything that is remotely like an actual security issue. Much like the spam the curl project is experiencing.



  • Polars has essentially replaced Pandas for me. It is MUCH faster (in part due to lazy queries) and uses much less RAM, especially if the query can be streamed. While the syntax takes a bit of getting used to at first, it allows me to specify a lot more without having to resort to apply with custom Python functions.
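
    For instance, a typical lazy query looks roughly like this (file and column names are hypothetical):

    ```python
    import polars as pl

    query = (
        pl.scan_csv("trips.csv")                     # lazy: nothing is read yet
        .filter(pl.col("distance_km") > 0)
        .with_columns(
            (pl.col("fare") / pl.col("distance_km")).alias("fare_per_km")
        )                                            # expressions instead of apply
        .group_by("city")
        .agg(pl.col("fare_per_km").mean())
    )

    # Only now is the plan optimised and executed; streaming mode keeps memory
    # usage low on inputs larger than RAM (the exact flag depends on the version).
    df = query.collect(streaming=True)
    print(df)
    ```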

    My biggest gripe is that the error messages are significantly less readable due to the high amount of noise: the stack trace into the query executor does not help with locating my logic error, and the stringified query does not tell me where in the query things went wrong…


  • The key point being made is that if you are committing de facto copyright infringement or plagiarism by creating a copy, it shouldn’t matter whether that copy was made through copy-paste, by re-compressing the same image, or by using an AI model. The product here is the copy-paste operation, the image editor, or the AI model, not the (copyrighted) image itself. You can still sell computers with copy-paste (despite some attempts from large copyright holders with DRM), and you can still sell image editors.

    However, unlike copy-paste and the image editor, the AI model could memorize and emit training data without the input calling for the copyrighted work (excluding the case where the image itself, or a highly detailed description of the work, was provided, as in that case it would clearly be the user who is at fault and intending this to happen).

    At the same time, it should be noted that exact replication of training data isn’t desirable in any case, and online services for image generation could include an image similarity check against training data; many probably do this already.
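
    One simple way such a similarity check could work is a perceptual “difference hash” compared by Hamming distance; real services likely use far more robust methods (e.g. learned embeddings). A minimal sketch with Pillow, using hypothetical file names:

    ```python
    from PIL import Image

    def dhash(path: str, hash_size: int = 8) -> int:
        # Grayscale, shrink to (hash_size + 1) x hash_size, compare neighbours.
        img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
        px = list(img.getdata())
        bits = [
            px[row * (hash_size + 1) + col] > px[row * (hash_size + 1) + col + 1]
            for row in range(hash_size)
            for col in range(hash_size)
        ]
        return sum(1 << i for i, bit in enumerate(bits) if bit)

    def hamming(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    # A generated image could be flagged when its hash lies within a small
    # Hamming distance of any hash precomputed over the training images.
    if hamming(dhash("generated.png"), dhash("training_example.png")) <= 10:
        print("suspiciously similar to a training image")
    ```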