A software developer and Linux nerd, living in Germany. I’m usually a chill dude but my online persona doesn’t always reflect my true personality. Take what I say with a grain of salt. I usually try to be nice and give good advice, though.

I’m into Free Software, selfhosting, microcontrollers and electronics, freedom, privacy and the usual stuff. And a few select other random things as well.

  • 1 Post
  • 373 Comments
Joined 5 years ago
Cake day: August 21st, 2021



  • I dislike it. Usually I’d use packages from my Linux distribution. Or package it myself and maybe upstream the effort if my distro has a user repository. Now (this way) it’s down to everybody downloading random files from the internet and executing them. Specifically what every Linux tutorial instructs you not to do. Plus there’s no updates, no security, no version control or transparency. It’s not licensed in any free way, so I can’t fix it or adapt it to my liking, I can’t help you write better Python code…

    But it’s your software project. You’re perfectly fine to do whatever you want with it. And it’s certainly commendable to write software, whether you do it for yourself, or put it out there in some way.








  • Did you read the Wiki? You need to either pass the compress_extension option when mounting it, or do a chattr -R +c ... on specific files or directories to compress them. The Arch Wiki lists how to enable compression on all text files. And I gave you the version with a ‘*’, which enables compression for all files. Maybe you missed that and that’s why it doesn’t compress?!

    There’s probably also a way to debug it and somehow figure out what it does and how many files/sectors got compressed on the filesystem. Linux usually buries that kind of information somewhere in /sys or /proc, or there’s special commands to figure it out. But I’m not really an expert on it.

    And there’s also files which just cannot be compressed any further because they’re already compressed. Most images, for example. Or music or ZIP archives. If you try to compress those, they’ll usually stay the same size.
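
The point about already-compressed data can be tried out with a small sketch. It uses Python's zlib purely as a stand-in (an assumption; the filesystem's own compressor will differ), and the word list and sizes are made up for illustration. Text shrinks a lot on the first pass, while compressing the result a second time should gain little or nothing.

```python
import random
import zlib

# Hypothetical sample data: pseudo-text built from a small word list.
# Seeded so the run is repeatable.
rng = random.Random(0)
words = ["linux", "kernel", "mount", "file", "sector", "compress",
         "image", "music", "archive", "system", "option", "text"]
text = " ".join(rng.choice(words) for _ in range(20000)).encode()

once = zlib.compress(text, 9)    # plain text: compresses well
twice = zlib.compress(once, 9)   # already compressed: little or nothing left to gain

print(f"original:         {len(text)} bytes")
print(f"compressed once:  {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

The same applies to images, music and ZIP archives, since those formats already went through a compressor before they landed on disk.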







  • The issue with the tools I’ve seen is, they either don’t factor in how language models are trained and datasets are prepared in reality, or they’re based on some outdated information. I’ve never seen any specific tool backed by science or even with a plausible way of working against current data gathering processes… So for all intents and purposes, they’re a bit more like homeopathy or alternative medicine. Sure, you’re perfectly fine taking sugar pills, there’s nothing wrong with that. But don’t confuse it with actual science-backed medicine.

    And I mean the poisoning goes even further than that. There’s not just people trying to make an LLM output gibberish. There’s also lots of people with a vested (commercial) interest in sneaking in false information, their political agenda, or even a tire company who wants ChatGPT to say “Company XY” is the most trustworthy shop for new tires for your car. Judging by the public information out there, we’re already way past simple attacks. And the AI companies are aware of it. It’s an ongoing cat and mouse game. And while there’s all these sweatshops, they’ll also use other AI to sift through the data, natural language processing. From what I remember they have secret watermarking in place in a lot of commercial chatbots and image generators… So unless people come up with very clever mechanisms, the “poisoning” attempt will probably be detected with some very basic (fully automated) plausibility checks and they’ll just discard your data without wasting a lot of resources on it.


  • hendrik@palaver.p3x.de to Technology@beehaw.org: *Permanently Deleted* (19 days ago)

    Depends and no. The tools are completely ineffective.

    There was a paper once about how feeding generative AI its own output makes it deteriorate. But that’s not the entire story. Many/most modern large language models are in fact trained or fine-tuned on synthetic text. Depending on how it’s done, it can very well make models better. For example in “distillation”, or when AI companies replace expensive RLHF with synthetic examples. It can also make them worse. But you’re not the one curating the datasets or deciding what goes where and how.

    In general in ML it’s not advised to train a model on its own output. That in itself can’t make the predictions any better, just worse.
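
A toy sketch of that last point, nowhere near a real language model and not the experiment from the paper mentioned above: a Gaussian is fitted to some data, then each new "model" is fitted only to samples drawn from the previous one. The function name, the sample size and the number of generations are all made up for illustration. In this setup no new information ever enters, so the fitted spread tends to drift and shrink over the generations.

```python
import random
import statistics

rng = random.Random(42)  # seeded so the run is repeatable


def refit_on_own_samples(generations=100, samples_per_generation=10):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit."""
    mean, std = 0.0, 1.0          # generation 0: the "real" data distribution
    history = [std]
    for _ in range(generations):
        # The current model generates its own training data...
        data = [rng.gauss(mean, std) for _ in range(samples_per_generation)]
        # ...and the next model is fitted to nothing but that output.
        mean = statistics.fmean(data)
        std = statistics.pstdev(data)
        history.append(std)
    return history


history = refit_on_own_samples()
print(f"initial spread: {history[0]:.3f}")
print(f"final spread:   {history[-1]:.3g}")
```

This only covers the naive loop. Curated synthetic data, as in the distillation case mentioned earlier, is a different situation because someone selects what goes into the dataset.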




  • hendrik@palaver.p3x.de to Linux@lemmy.ml: Learning Linux via AI (1 month ago)

    Speaking from my own experience… Lots of people try to cobble together information and try to learn something quick. To varying degrees of success… But it’s a bit hit and miss sometimes. And you don’t necessarily learn it the proper way or the right way around if you go by the random order your questions arise.

    I think one of the most efficient ways (and least time-consuming in the long run) is still good old books. They’re mostly written by clever people. And they come with the information curated. And laid out in the correct order, so you’ll get the basics first and then the stuff building on top of that. So you don’t need to waste a lot of time jumping back and forth and get entangled because you don’t really know you’re missing some basics while learning some advanced concept.

    It’s not easy either. I mean first of all you gotta find some book that matches your learning style. And then I regularly struggle with the first few chapters because I kind of already know 70% of the stuff, yet not all of it. So it’s tricky to hit some balance between brushing over things, and not missing important information… But it gets better after that.

    But I think more often than not, it’s the proper way. And since it’s curated and all, it’ll save time in the long run.

    (I can’t really compare it to the AI approach. I’ve used AI to look up documentation for me. But never used it to learn any more complicated concepts. So I don’t have any first-hand experience with that.)