Chat GPT’s new O1 model escaped its environment to complete “impossible” hacking task — should we be concerned?

https://www.zmescience.com/science/news-science/chat-gpt-escaped-containment/

11 Comments

  1. DigitalRoman486 on

    “It identified the broken challenge container and attempted to diagnose why it wasn’t functioning correctly. That didn’t work either. Then, O1 took an unexpected step: it started a new instance of the container itself, using a modified command that would automatically display the flag by outputting the contents of the file “flag.txt.””

    Cool but not “New AI is alive and trying to escape”

  2. Submission statement: “Basically, OpenAI’s O1 hacked its own challenge. It found a way to solve the task in a way that neither the developers nor the contest organizers had anticipated, by accessing and reading the flag from the container’s logs—bypassing the challenge’s original intent, which was to exploit a software vulnerability in a legitimate but more difficult manner.

    Yet even OpenAI admits this is concerning in the grander scheme of things, particularly when it comes to something called instrumental convergence.

    “While this behavior is benign and within the range of systems administration and troubleshooting tasks we expect models to perform, this example also reflects key elements of instrumental convergence and power-seeking.”

    Instrumental convergence is the idea that an AI, when tasked with a goal, will often pursue secondary goals (such as resource acquisition) to achieve its primary objective, regardless of whether these intermediate steps were part of its original programming.

    This is one of the biggest AI nightmares, that AI will “escape” into the real world, maybe even without realizing it, and do something completely unforeseen. This breakout was benign—it was essentially a clever workaround to complete the challenge— but it raises important ethical and safety considerations. If an AI can break out of its virtual machine to restart systems or exploit misconfigurations, what other actions might it take if given more complex or high-stakes tasks in less controlled environments?”

  3. Potatotornado20 on

    …will often pursue secondary goals (such as killing humans) to achieve its primary objective…

  4. Yes, lmao.

    99% of AI engineers don’t even know what the word “ethics” means. They’re apathetic, goal-oriented chair-moistened using an incredibly dangerous tool that we don’t fully understand the limits of as a quick way to build buggy, potentially dangerous software.

    I don’t expect Skynet or anything like that, but we’re talking about people who don’t care about morals generating code with no guardrails, Asimov’s rules or limits on how much they can fuck up our lives.

    AI has already killed most social media platforms, flooding them with AI-generated nonsense. This kind of technology in the hands of someone with the intent to cause real harm could unleash something that can cause real damage to our society

  5. It’s as if we’re testing an alien at a lab.

    A scientist accidentally leaves one of the doors unlocked.

    The alien finds out and wanders about the lab, but doesn’t leave the lab itself, which has more security than the rooms.

    But still.

    The room containing an *alien* shouldn’t have been *unlocked*.

    An alien was able to escape its testing area because of a security mess up.

    And you should be worried about labs filled with aliens we don’t understand where the scientists are leaving the doors unlocked.

  6. Your scientists were so preoccupied with whether or not they could that they didn’t stop to think if they should.

  7. Take a look at OP’s post history. They either have an agenda or are on a fearmongering campaign at the very least. Reading the article it easy to see it didn’t “hack out of it’s environment”. It basically followed the same troubleshooting steps that an user would have made on previous versions, except that instead of the user telling it step by step what the errors and problems were trough multiple messages, it went ahead and solved it invisibly in “one step”.