Aligning AI with Human Values: Shared Values and Integrating Them into Artificial Intelligence

The Alignment Problem

With the continued development of artificial intelligence (AI), aligning AI systems with human values is becoming a primary concern. Why? Because if we fail to align something that has the potential to become more intelligent than humans across all domains, humanity may lose its sense of purpose or even face extinction.

The alignment problem is about ensuring that AI behaves in ways that are beneficial to humanity. However, this raises fundamental questions: Have we even proven that shared human values exist? Where exactly can we look to find them? And if we cannot find or clearly define them, how can we embed them into AI systems and ensure those systems will always adhere to them? This uncertainty regarding the existence and definition of universal human values presents a major challenge for aligning AI with human interests.


Existence of Shared Human Values

Psychologist Shalom H. Schwartz, in his theory of basic human values, identifies ten broad values (such as benevolence, universalism, and security) that he believes are recognized across cultures. He sees these values as rooted in human needs and social requirements.

Schwartz's model arranges these values in a circular diagram in which they are interconnected and influence one another. Values close together in the circle share similar motivational goals, whereas values on opposite sides of the circle conflict. For instance, universalism and power are opposing values: the former focuses on equality and justice, while the latter is about individual dominance and control.

However, the universality of these values is debatable. Cultural relativists argue that values are inherently shaped by cultural contexts and that what counts as moral in one society may be immoral in another. The diversity of moral codes across societies suggests that, while there may be overlapping values, a truly universal set of values may be elusive.


Challenges in Defining Human Values

The main challenge in defining human values lies in their subjective and context-dependent nature. Values are influenced by countless factors, including culture, religion, personal experiences, and social norms. This complexity makes it difficult to distill a set of values that can be applied universally.

Moreover, moral dilemmas often arise when values conflict. For example, the value of individual freedom may conflict with the value of collective security. Such conflicts highlight the question of which values should guide AI behavior.


Embedding Human Values in AI

Embedding human values in contemporary AI involves translating abstract, often ambiguous human values into concrete objectives a machine can follow. Several approaches have been proposed:

1. Inverse Reinforcement Learning (IRL): IRL attempts to infer the reward function (the values) that a human is optimizing by observing their behavior. However, human behavior is not always rational, consistent, or aligned with the person's actual values. (The first sketch below illustrates the core idea.)

2. Cooperative Inverse Reinforcement Learning (CIRL): CIRL models the interaction between a human and an AI as a cooperative game in which both players act to maximize the human's reward function, which the AI does not know in advance. The approach explicitly acknowledges the AI's uncertainty about human values and incorporates human feedback. (See the second sketch below.)

3. Value Alignment through Ethical Principles: This involves programming AI with ethical theories (such as utilitarianism or deontology) to guide decision-making. Yet different ethical theories often reach opposite conclusions in the same moral situation. (The third sketch below shows such a divergence.)
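To make the first approach concrete, here is a minimal, self-contained sketch of the core IRL idea, not any particular published algorithm: we assume a Boltzmann-rational human whose repeated choices reveal hidden reward weights, and we recover those weights by maximum likelihood. Every feature, weight, and number here is hypothetical.

```python
# Toy illustration of the IRL idea: infer hidden reward weights from observed
# human choices. Purely hypothetical setup; real IRL operates on sequential
# decision problems (see Ng & Russell, 2000).
import numpy as np

rng = np.random.default_rng(0)

# Each option the human can pick is described by a feature vector,
# e.g. [safety, speed, cost].
features = np.array([
    [1.0, 0.2, 0.1],
    [0.3, 0.9, 0.5],
    [0.1, 0.4, 0.9],
])

true_w = np.array([2.0, 0.5, -1.0])  # the human's hidden "values"

def choice_probs(w):
    """Boltzmann-rational model: higher-reward options are chosen more often."""
    logits = features @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Observe 500 simulated human choices made under the hidden weights.
choices = rng.choice(len(features), size=500, p=choice_probs(true_w))

# Recover the weights by gradient ascent on the log-likelihood.
observed = features[choices].mean(axis=0)      # empirical feature average
w = np.zeros(3)
for _ in range(2000):
    expected = choice_probs(w) @ features      # model's feature average
    w += 0.1 * (observed - expected)           # log-likelihood gradient

print("recovered weights:", np.round(w, 2))
```

Note that the recovered weights reproduce the observed behavior without necessarily matching the hidden ones: many reward functions explain the same choices equally well, which is exactly the ambiguity the paragraph above warns about.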

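For the second approach, the sketch below compresses the cooperative game into its simplest ingredient: the AI holds a belief over which reward function the human has, updates that belief from one observed human action, and then acts to maximize expected reward under the belief. Real CIRL is a full sequential game; everything here, from the preference labels to the reward table, is a hypothetical simplification.

```python
# Hypothetical toy version of the CIRL intuition: Bayesian belief update over
# the human's hidden preference, followed by expected-reward maximization.
import numpy as np

thetas = ["prefers_safety", "prefers_speed"]  # possible hidden preferences
prior = np.array([0.5, 0.5])

# reward[theta] gives the reward of each of three available actions.
reward = {
    "prefers_safety": np.array([1.0, 0.2, 0.5]),
    "prefers_speed":  np.array([0.1, 1.0, 0.4]),
}

def human_action_probs(theta, beta=3.0):
    """Boltzmann-rational human: better actions are chosen more often."""
    r = reward[theta]
    e = np.exp(beta * (r - r.max()))
    return e / e.sum()

observed_human_action = 0  # suppose the AI watches the human pick action 0

# Bayes' rule over the hidden preference.
likelihood = np.array([human_action_probs(t)[observed_human_action] for t in thetas])
posterior = prior * likelihood
posterior /= posterior.sum()

# The AI picks the action with the highest expected reward under its belief.
expected = sum(p * reward[t] for p, t in zip(posterior, thetas))
print("posterior:", dict(zip(thetas, np.round(posterior, 2))))
print("AI chooses action", int(np.argmax(expected)))
```
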
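The third sketch illustrates how two coded ethical rules can disagree on the same dilemma: a utilitarian rule minimizes total harm, while a simplified deontological rule forbids actively causing harm. The scenario and the scoring functions are, of course, hypothetical caricatures of both theories.

```python
# Hypothetical illustration: the same dilemma, scored by two coded ethical
# rules, yields opposite recommendations.
# Scenario: redirect a runaway trolley (harming 1) or do nothing (harming 5).

actions = {
    "redirect":   {"harmed": 1, "is_active_harm": True},
    "do_nothing": {"harmed": 5, "is_active_harm": False},
}

def utilitarian(action):
    """Utilitarian rule: minimize total harm, regardless of how it is caused."""
    return -action["harmed"]

def deontological(action):
    """Simplified deontological rule: never actively cause harm."""
    return -float("inf") if action["is_active_harm"] else 0.0

for rule in (utilitarian, deontological):
    best = max(actions, key=lambda a: rule(actions[a]))
    print(f"{rule.__name__:14s} -> {best}")
# utilitarian    -> redirect
# deontological  -> do_nothing
```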

Philosophical and Practical Challenges

The effort to align AI with human values is fraught with philosophical puzzles and practical difficulties:

Value Pluralism: The coexistence of conflicting values makes it difficult to prioritize which values AI should follow.

Frame Problem: AI may struggle to determine the relevant context for applying certain values, leading to unintended consequences.

Risk of Misalignment: Misinterpretation or oversimplification of human values can lead to AI behavior that is harmful or unethical.

Problem of Persistent (Recursive) Alignment: Even if we manage to align current narrow AI with human interests, the emergence of intelligence that surpasses human capabilities in all domains (artificial superintelligence, or ASI) brings new challenges and risks. An ASI could recursively improve its own abilities and decision-making, potentially developing beyond human control. This recursive self-improvement could render the originally aligned values obsolete or reinterpret them in ways that are no longer beneficial, or are even harmful, to humanity. Thus, solving the alignment problem for current AI does not guarantee the lasting, safe alignment of future, more intelligent systems. On the contrary, such systems could define their own value frameworks and act on their own interests.

The Need for Radical Reevaluation

Does the situation require a radical reassessment of our approach to AI development?

1. Precautionary Measures: What does safety mean for humanity in the context of the current trend of developing unaligned, ever-improving AI systems? Should we halt development until the question of AI safety and alignment is resolved?

2. Interdisciplinary Collaboration: Combining insights from neuroscience, psychology, anthropology, philosophy, computer science, and other fields may lead to a more nuanced understanding of human values, including whether the alignment problem is even solvable.

3. Dynamic Learning Models: Developing AI systems that continuously learn and adapt to human values through ongoing interaction and feedback (a minimal sketch follows this list). However, there is still no certainty that such an approach would align either current or future systems.

4. Global Ethical Frameworks: Introducing agreements on AI ethics to define a common foundation for AI behavior. It remains unclear who would define these ethical frameworks, how conflicts of values and needs would be resolved, and how such sets of values could be embedded into AI systems once and for all, or how AI systems could be compelled to act in accordance with them.

5. Transparency and Explainability: Ensuring that AI decisions are interpretable, so that humans can understand and correct value misalignment. Even the creators of current AI systems do not fully understand their behavior, which arises from autonomous learning and exhibits emergent properties. Can systems be aligned with human values when even their creators do not fully understand how they behave?
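As a minimal sketch of the dynamic-learning idea in point 3, consider an AI that keeps a running estimate of human preference weights and nudges that estimate after every piece of approve/disapprove feedback. This is a plain online logistic-regression update, not any specific deployed technique, and all feature names and numbers are hypothetical.

```python
# Hypothetical sketch of "dynamic learning" of values: keep a running estimate
# of human preference weights and update it online from binary feedback.
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(3)  # current estimate of preference weights over 3 features

def update(w, x, approved, lr=0.1):
    """Logistic step: move w toward feature vectors the human approves of."""
    p = 1.0 / (1.0 + np.exp(-x @ w))   # predicted approval probability
    return w + lr * (float(approved) - p) * x

# Simulated interaction loop; the hidden human rule rewards feature 0
# ("safety") and penalizes feature 2 ("cost").
for _ in range(1000):
    x = rng.normal(size=3)
    approved = x[0] - x[2] > 0
    w = update(w, x, approved)

print("learned preference weights:", np.round(w, 2))
```

Even in this toy loop, the estimate only tracks whatever the feedback actually rewards, which restates the caveat above: ongoing adaptation does not by itself guarantee alignment.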


Current AI Trends and the Road Ahead

The most advanced AI models today have already reached superhuman levels in specific areas, such as games, information-processing speed, memory, and sheer breadth of knowledge. AI can read a book in seconds, translate between languages, and beat human masters at games like chess and Go. It is also improving in creativity, programming, and specialized knowledge. Weaknesses remain, however, including the inability to reliably admit ignorance, weaknesses in planning, and a lack of natural movement, which limit its full autonomy. AI is not yet better at hacking than the best hackers, and it cannot conduct AI research as well as the best AI researchers. If it crosses either of these thresholds, it may signal the onset of an era of heightened risk.


Trends in AI Development

We are witnessing rapid and groundbreaking advancements in AI learning, which present both enormous opportunities and challenges. The nonprofit organization PauseAI, along with others, emphasizes the need for a careful and responsible approach to AI development, with a focus on safety and ethics.

PauseAI does not call for a ban on all artificial intelligence but instead advocates for the development of controllable AI models. The goal is to slow down or halt the development of risky technologies that could have irreversible consequences until adequate safety measures are in place. They stress the importance of global cooperation in this field.

So far, no credible plan has been successfully implemented, and AI is developing faster than the strategies and know-how needed to align it with our needs. Perhaps we need more time.


Principles of Nonviolence and Cooperation

Historically, humanity’s approach to existential issues has often relied on power asymmetry and coercion. I believe that independent education, critical thinking, effective communication skills, and non-interference in the lives of others are needed now more than ever.

I would like decisions regarding further AI development to be based on principles of nonviolence and mutually voluntary cooperation, grounded in contemporary knowledge, multidisciplinary education, and dialogue—not coercion or force. Unilateral actions that overlook the needs and interests of others may undermine the sustainability of future actions, as they have the potential to erode trust and motivation for cooperation.

I understand the urgency expressed by supporters of the PauseAI initiative. At the same time, I consider that pressure for international regulation, along with the absence of global consensus, could, in fact, accelerate the AI development race or bring humanity closer to political singularity.


A Call for Global Cooperation

For me, voluntary cooperation among individuals and organizations worldwide is key. Here and now! The effort to achieve consensus and mutual respect for needs, limits, and differences is paramount. I hope that people will unite not only out of an interest in survival but also for the quality of life, based on fulfilling human biological, psychological, and emotional needs. I believe that humility and respect for different perspectives are essential to minimizing the risks associated with AI development.


Conclusion

As I see it, the effort to align AI with human values is a profound, as-yet-unsolved challenge that touches on the still-unexplained core of what it means to be human.

While the existence of universally shared human values remains a subject of debate, the urgency of the AI alignment problem calls for action here and now.

Perhaps it is time for a radical approach that acknowledges the needs of some people to survive and live a meaningful life, the complexity of human values, and the limitations of current AI systems.

What do you think? Would you like to work on something together? You can reach me at rerichova@proton.me.


Sources

1. Schwartz, S. H. (2012). An Overview of the Schwartz Theory of Basic Values. Online Readings in Psychology and Culture, 2(1).

2. Prinz, J. (2008). The Ethical Significance of Cultural Differences. The Oxford Handbook of Moral Psychology.

3. Ng, A. Y., & Russell, S. (2000). Algorithms for Inverse Reinforcement Learning. Proceedings of the Seventeenth International Conference on Machine Learning.

4. Hadfield-Menell, D., et al. (2016). Cooperative Inverse Reinforcement Learning. Advances in Neural Information Processing Systems.


ΣishⒶ
