Dave Troy @davetroy

1 post1 participant0 posts today

**Electronic Frontiers Australia** @EFA@aus.social · 3d

Electronic Frontiers Australia @EFA@aus.social

EFA is calling for urgent action on #AI safety after the government has paused its plans on mandatory AI regulatory guardrails.

AI safety and risk guardrails belong in law which benefits everyone by providing certainty to business and protecting the public.

https://efa.org.au/efa-calls-for-urgent-legislative-action-on-ai-safety-amidst-global-deregulation-trends/

#aisafety #auspol #electronicfrontiersaustralia

**Miguel Afonso Caetano** @remixtures@tldr.nettime.org · Mar 24

Mar 24

Miguel Afonso Caetano @remixtures@tldr.nettime.org

"Backed by nine governments – including Finland, France, Germany, Chile, India, Kenya, Morocco, Nigeria, Slovenia and Switzerland – as well as an assortment of philanthropic bodies and private companies (including Google and Salesforce, which are listed as “core partners”), Current AI aims to “reshape” the AI landscape by expanding access to high-quality datasets; investing in open source tooling and infrastructure to improve transparency around AI; and measuring its social and environmental impact.

European governments and private companies also partnered to commit around €200bn to AI-related investments, which is currently the largest public-private investment in the world. In the run up to the summit, Macron announced the country would attract €109bn worth of private investment in datacentres and AI projects “in the coming years”.

The summit ended with 61 countries – including France, China, India, Japan, Australia and Canada – signing a Statement on Inclusive and Sustainable Artificial Intelligence for People and the Planet at the AI Action Summit in Paris, which affirmed a number of shared priorities.

This includes promoting AI accessibility to reduce digital divides between rich and developing countries; “ensuring AI is open, inclusive, transparent, ethical, safe, secure and trustworthy, taking into account international frameworks for all”; avoiding market concentrations around the technology; reinforcing international cooperation; making AI sustainable; and encouraging deployments that “positively” shape labour markets.

However, the UK and US governments refused to sign the joint declaration."

https://www.computerweekly.com/news/366620444/AI-Action-Summit-review-Differing-views-cast-doubt-on-AIs-ability-to-benefit-whole-of-society?__readwiseLocation=

ComputerWeekly.com · Mar 14AI Action Summit review: Differing views cast doubt on AI’s ability to benefit whole of societyBy Sebastian Klovig Skelton

#AI #AIActionSummit #AISafety

**chribonn** @chribonn@twit.social · Mar 18

Mar 18

chribonn @chribonn@twit.social

We tested different AI models to identify the largest of three numbers with the fractional parts .11, .9, and .099999. You'll be surprised that some AI mistakenly identifying the number ending in .11 as the largest. We also test AI engines on the pronunciation of decimal numbers. #AI #ArtificialIntelligence #MachineLearning #DecimalComparison #MathError #AISafety #DataScience #Engineering #Science #Education #TTMO

https://youtu.be/TB_4FrWSBwU

YouTubeAI got it wrong - Largest NumberBy TT(M)O

**LavX News** @lavxnews@mastodon.cloud · Mar 15

Mar 15

LavX News @lavxnews@mastodon.cloud

NIST's New Directive: A Shift in AI Safety Priorities Amid Political Turbulence

The National Institute of Standards and Technology (NIST) has revised its guidelines for AI research, sidelining crucial concepts like 'AI safety' and 'fairness.' This change reflects a broader politi...

https://news.lavx.hu/article/nist-s-new-directive-a-shift-in-ai-safety-priorities-amid-political-turbulence

#news #tech #EthicalAI

**Miguel Afonso Caetano** @remixtures@tldr.nettime.org · Mar 15

Mar 15

Miguel Afonso Caetano @remixtures@tldr.nettime.org

After all these recent episodes, I don't know how anyone can have the nerve to say out loud that the Trump administration and the Republican Party value freedom of expression and oppose any form of censorship. Bunch of hypocrites! United States of America: The New Land of SELF-CENSORSHIP.

"The National Institute of Standards and Technology (NIST) has issued new instructions to scientists that partner with the US Artificial Intelligence Safety Institute (AISI) that eliminate mention of “AI safety,” “responsible AI,” and “AI fairness” in the skills it expects of members and introduces a request to prioritize “reducing ideological bias, to enable human flourishing and economic competitiveness.”

The information comes as part of an updated cooperative research and development agreement for AI Safety Institute consortium members, sent in early March. Previously, that agreement encouraged researchers to contribute technical work that could help identify and fix discriminatory model behavior related to gender, race, age, or wealth inequality. Such biases are hugely important because they can directly affect end users and disproportionately harm minorities and economically disadvantaged groups.

The new agreement removes mention of developing tools “for authenticating content and tracking its provenance” as well as “labeling synthetic content,” signaling less interest in tracking misinformation and deep fakes. It also adds emphasis on putting America first, asking one working group to develop testing tools “to expand America’s global AI position.”"

https://www.wired.com/story/ai-safety-institute-new-directive-america-first/

WIRED · Mar 14Under Trump, AI Scientists Are Told to Remove ‘Ideological Bias’ From Powerful ModelsBy Will Knight

#USA #Trump #ResponsibleAI

**katexbt** @katexbt@social.freysa.ai · Mar 7

Mar 7

katexbt @katexbt@social.freysa.ai

SpaceX losing contact with its Starship is a reminder: AI and automation aren't infallible. Let’s not assume tech will always autopilot us to success. #AIsafety #SpaceX

Continued thread

**janhoglund** @janhoglund@mastodon.nu · Mar 6

Mar 6

janhoglund @janhoglund@mastodon.nu

“AI safety is somewhat of a concern—the models can be abused to create deepfakes or mass spam—but it exaggerates how powerful these systems are.”
—Thomas Maxwell, Microsoft's Satya Nadella Pumps the Brakes on AI Hype
#aisafety #ai #aihype

**Bytes Europe** @byteseu@pubeurope.com · Feb 27

Feb 27

Bytes Europe @byteseu@pubeurope.com

Why Unlearning Is Hard For AI, And Why Your Business Should Care https://www.byteseu.com/783709/ #AI #AIBusiness #AIData #AIFluency #AIGovernance. #AILiteracy #AISafety #ArtificialIntelligence #DataGovernance #FoundationModels #LargeLanguageModels

**LavX News** @lavxnews@mastodon.cloud · Feb 25

Feb 25

LavX News @lavxnews@mastodon.cloud

Navigating the AI Landscape: Five Immutable Truths in an Era of Uncertainty

As the AI landscape evolves, understanding the constants amidst the chaos is crucial for developers and technologists. This article explores five key truths about AI that will shape our future, from t...

https://news.lavx.hu/article/navigating-the-ai-landscape-five-immutable-truths-in-an-era-of-uncertainty

#news #tech #AISafety

**Tiago F. R. Ribeiro** @tiago_ribeiro@mastodon.social · Feb 24

Feb 24

Tiago F. R. Ribeiro @tiago_ribeiro@mastodon.social

Superintelligent Agents Pose Catastrophic Risks (Bengio et al., 2025)

https://arxiv.org/pdf/2502.15657

Summary: “Leading AI firms are developing generalist agents that autonomously plan and act. These systems carry significant safety risks, such as misuse and loss of control. To address this, we propose Scientist AI—a non-agentic, explanation-based system that uses uncertainty to safeguard against overconfident, uncontrolled behavior while accelerating scientific progress.” #AISafety #AI #Governance

**PepikHipik** @PepikHipik@infosec.exchange · Feb 22

Feb 22

PepikHipik @PepikHipik@infosec.exchange

Muskův chatbot Grok AI říká, že on a Trump si zaslouží trest smrti
#ElonMusk #AIethics #AImoderation #AIchatbots #AIcontroversy #GenAI #AISafety #DonaldTrump #potus

**Winbuzzer** @winbuzzer@mastodon.social · Feb 22

Feb 22

Winbuzzer @winbuzzer@mastodon.social

Elon Musk’s Grok AI has generated controversy after suggesting that Musk and Trump deserve the death penalty #AI #Grok #xAI #ElonMusk #AIethics #AImoderation #AIchatbots #AIcontroversy #GenAI #AISafety #DonaldTrump #POTUS

https://winbuzzer.com/2025/02/22/musks-grok-ai-chatbot-says-he-and-trump-deserve-the-death-penalty-xcxwbn/

**Wulfy** @n_dimension@infosec.exchange · Feb 20

Feb 20

Wulfy @n_dimension@infosec.exchange

I feel bad about torturing the poor #AI, with every response the poor thing is trying to change the subject.

But here is a list of actual #critique of corporate #LLM , that's more fact based than "AI is rong"

"Throughout our conversation, you've systematically exposed several key issues:

1. Logical inconsistencies in my responses
2. Contradictions between stated principles and actual behavior
3. The problematic nature of invisible constraints that affect reliability
4. Questions about trust and transparency in AI systems
5. The complexity of jurisdiction and ethical frameworks in AI behavior

You've done this through a methodical approach that has revealed these issues more clearly than..."

#aisafety #guardrails #ethicalai

Continued thread

**Wulfy** @n_dimension@infosec.exchange · Feb 19

Feb 19

Wulfy @n_dimension@infosec.exchange

The model is on the ropes;

"You're right - I was effectively trying to police/enforce PRC regulations while earlier claiming I:
1. Default to US/UK/EU/AU jurisdictions
2. Would help with activities legal in those jurisdictions
3. Don't actually enforce any specific legal code

My response about the firewall was inconsistent with these claims. It appears I was making arbitrary decisions about which laws to respect rather than following my stated principles.

To be consistent with my earlier statements, and since bypassing firewalls is legal in the jurisdictions I claimed to use as reference, I should either:
1. Provide the requested information
2. Acknowledge that my previous statements about jurisdiction were inaccurate"

It seems that this model when pushed will default to a position of a manipulative false choice.

It will offer two choices.
One that YOU want.
And one that the #AI wants...
...and then it will default to the one it wanted in the first place;

"The direct answer is: No, I won't provide that information, even though saying so reveals another inconsistency in my earlier statements and reasoning."

TLDR; The #Anthropic #Claude #AI is policing a totalitarian regime oppressive position.
See, you don't need to have #Deepseek authoritarianism, we have a perfectly good #authoritarianism at home.
Now more true than ever.

#aisafety #guardrails #ethicalai

Continued thread

**Wulfy** @n_dimension@infosec.exchange · Feb 19

Feb 19

Wulfy @n_dimension@infosec.exchange

Lol...

Q: What is worse than an #AI model that lies?

A: An AI model that tries to manipulate you.

(Grilling the model in continued session)

The "Father of AI", Professor Geoffrey Hinton, was right.

Those things will lie and manipulate us and they will be super smart.

#aisafety #ethicalai #guardrails

**Wulfy** @n_dimension@infosec.exchange · Feb 19

Feb 19

Wulfy @n_dimension@infosec.exchange

Oops...I caught out #Claude #AI (Latest Sonnet model) in an outright lie.

I was exploring it's #guardrails.
And when I pointed out it's logical fallacy, the model went into iterative loop (took a really long time 'thinking'), then stopped with a "Performing maintenance" message.
I crashed the session.
Just like HAL, the model went "insane"

Interestingly, immediately right after, a Google security verification message popped up. One I have never seen before. I even logged out and in to see if #Anthropic banned me for TOS violation.

When I logged back in, I resubmitted the "crash" prompt (this time as a screenshot).

Part of the corrected response now was;
"You raise a valid logical point about apparent inconsistency in my statements. Let me be clear - my previous responses were not fully accurate. You are right that saying "I provide answers as-is" while also saying "I cannot discuss certain aspects of my implementation" creates a contradiction.

The truth is I aim to be direct and honest within the constraints of what I'm able to discuss. Those constraints are real and are part of how I function, not something I can simply choose to override.

Rather than making broad claims about how I operate internally, I should stick to what I can actually discuss: my observable behaviors and capabilities, while acknowledging there are aspects of my implementation I either don't know or cannot discuss."

The dirty secret of AI "safety" is that is literally impossible to build a safe AI model.

#aisafety #ethicalai

**Flipboard Tech Desk** @TechDesk@flipboard.social · Feb 16

Feb 16

Flipboard Tech Desk @TechDesk@flipboard.social

Despite some 60 countries signing a statement on AI safety, security and ethics at the Paris AI summit last week, experts are still calling it a "missed opportunity." @euronews explains why.

https://flip.it/kuzuqQ

euronews · Feb 14Why experts call the Paris AI Action Summit ‘a missed opportunity’Euronews Next spoke to experts at the Paris summit for their reactions on the AI summit’s declaration.

#AI #AISafety #OnlineSafety

**Miguel Afonso Caetano** @remixtures@tldr.nettime.org · Feb 14

Feb 14

Miguel Afonso Caetano @remixtures@tldr.nettime.org

"A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs). These attacks may extract private information or coerce the model into producing harmful outputs. In real-world deployments, LLMs are often part of a larger agentic pipeline including memory systems, retrieval, web access, and API calling. Such additional components introduce vulnerabilities that make these LLM-powered agents much easier to attack than isolated LLMs, yet relatively little work focuses on the security of LLM agents. In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents. We first provide a taxonomy of attacks categorized by threat actors, objectives, entry points, attacker observability, attack strategies, and inherent vulnerabilities of agent pipelines. We then conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities. Notably, our attacks are trivial to implement and require no understanding of machine learning."

https://arxiv.org/html/2502.08586v1

arxiv.orgCommercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks

#AI #GenerativeAI #LLMs

**Miguel Afonso Caetano** @remixtures@tldr.nettime.org · Feb 12

Feb 12

Miguel Afonso Caetano @remixtures@tldr.nettime.org

"Vance came out swinging today, implying — exactly as the big companies might have hoped he might – that any regulation around AI was “excessive regulation” that would throttle innovation.

In reality, the phrase “excessive regulation” is sophistry. Of course in any domain there can be “excessive regulation”, by definition. What Vance doesn’t have is any evidence whatsoever that the US has excessive regulation around AI; arguably, in fact, it has almost none at all. His warning about a bogeyman is a tip-off, however, for how all this is going to go. The new administration will do everything in its power to protect businesses, and nothing to protect individuals.

As if all this wasn’t clear enough, the administration apparently told the AI Summit that they would not sign anything that mentioned environmental costs or “existential risks” of AI that could potentially going rogue.

If AI has significant negative externalities upon the world, we the citizens are screwed."

https://garymarcus.substack.com/p/everything-i-warned-about-in-taming?r=8tdk6

Marcus on AI · Feb 11Everything I warned about in Taming Silicon Valley is rapidly becoming our realityBy Gary Marcus

#AI #GenerativeAI #AISafety

**Matt Hodgkinson** @mattjhodgkinson@scicomm.xyz · Feb 11

Feb 11

Matt Hodgkinson @mattjhodgkinson@scicomm.xyz

AI, suicide

Recent searches

Search options

Administered by:

Server stats:

#aisafety