Impact of Heterogeneity and Risk Aversion on Task Allocation in Multi-Agent Teams


LemonSoda 2025. 5. 25. 14:02

Paper Review

Impact of Heterogeneity and Risk Aversion on Task Allocation in Multi-Agent Teams (H. Wu, A. Ghadami, A. Bayrak, J. Smereka, and B. Epureanu)

https://ieeexplore.ieee.org/document/9484733


MATA (Multi-Agent Task Allocation)

Instead of controlling many agents from a central node, letting a distributed multi-agent team plan and perform tasks at the same time can improve swarm operation in terms of performance, scale, and efficiency.
Related Works
- [2]~[5] : swarm control with homogeneous agents (considering temporal constraints, communication protocols, dynamic spatial environments for task execution, etc.)
- [6]~[9] : swarm control with heterogeneous agents
- [2], [3], [6]~[9] : algorithms based on optimization techniques
- [4], [5] : integer programming problems + reinforcement learning
- [10] : Dec-POMDP (Decentralized Partially Observable Markov Decision Process)
- [11]~[13] : deep learning
- [14] : heuristic search
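Idea
- Differences between agents in task handling, sensing, and communication capability → affect task level transitions and perceived information
- Differences at the level of the decision-making process → differences in the level of risk aversion
- Model : HT-POMDP [14] : applies the same planning/execution scheme to heterogeneous agents. That is, every agent has the same types of attributes, but the capabilities (task-handling ability, sensors and the quality of the information they acquire, communication performance, etc.) and the decision process differ per agent model. It is an extension of the Decentralized Partially Observable Markov Decision Process (Dec-POMDP).
- Learning method : reinforcement learning (Deep Q-Network). A deep Q-learning structure selects actions based on each agent's beliefs from SE (sensor-based sensing) and IE (information estimation through the communication channel). This improves computational efficiency and reflects past experience, and under the rewarding system all agents are trained to pursue the same goal.
  - The higher the sensing level, the higher the probability that the observation matches the true state
  - The higher the communication level, the more accurately the transmitted information is delivered
  - Each agent is given a different risk aversion value to model different decision-making
    - θ < 1: more exploratory (risk-taking, human-like)
    - θ > 5: conservative/rational agent
  - Noisy Rational Model : lets an agent occasionally make an "irrational" choice, like a human --> explore? a greedy parameter?? (to be checked further)
- Key variables : capability difference, decision-making difference, reward leniency (?), unexpected event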
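The role of θ in the noisy rational model above can be illustrated with a small sketch. These notes do not record the paper's exact formula, so the code assumes the common Boltzmann (softmax) form, where θ scales the action values before they become choice probabilities. The function names and the sample values are hypothetical.

```python
import math
import random

def noisy_rational_probs(q_values, theta):
    """Choice probabilities under an assumed Boltzmann (softmax) form.

    theta is the rationality coefficient: theta = 0 gives a uniform
    random choice, large theta approaches a greedy (argmax) choice.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(theta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def choose_action(q_values, theta, rng=random):
    """Sample one action index according to the noisy rational probabilities."""
    probs = noisy_rational_probs(q_values, theta)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

# Hypothetical action values for two actions; action 0 is the better one.
q = [1.0, 0.5]
p_explore = noisy_rational_probs(q, theta=0.5)  # theta < 1
p_conserv = noisy_rational_probs(q, theta=5.0)  # theta >= 5
print(p_explore, p_conserv)
```

Under this assumed form, θ behaves like an inverse temperature in softmax exploration: a small θ keeps a sizable probability on the lower-valued action, while a large θ almost always picks the higher-valued one. That would match the "exploratory" versus "conservative/rational" reading in the list, but it should be checked against the paper.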
Purpose (Key Features)
- Development of a decentralized task allocation framework for teams with heterogeneous agents, which enables complex teaming interactions in environments with dynamic demands and uncertainties
- Incorporation of risk aversion and perception accuracy in agent decision-making processes, and analysis of their effects on team performance in the presence of unforeseen events
- Quantitative investigation of the combined effects of heterogeneity and risk aversion on teaming performance
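DQN Algorithm
- The DQN framework uses a belief in place of the state; the belief is updated from the sensor observation probability (SE) and the information communication probability (IE). (Work in progress; needs re-checking)
- The Q-value is the expected cumulative future reward when action a is chosen under the current belief
- Bellman optimality is applied, so the highest Q-value at the next state is taken into account
- The reward function depends on how far a task has been completed, and the leniency of the reward can be adjusted according to the belief about task completion (Work in progress; needs re-checking)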
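As a rough illustration of how a belief could be updated from a sensor observation, the sketch below applies a plain Bayes update. The observation model is an assumption, not taken from the paper: the sensor reports the true task level with probability `accuracy` and any other level with equal probability. The function names and the three-level example are hypothetical.

```python
def observation_likelihood(obs, state, accuracy, n_levels):
    """P(obs | state) under the assumed sensor model.

    The sensor is correct with probability `accuracy`; otherwise the
    error is spread uniformly over the remaining n_levels - 1 levels.
    """
    if obs == state:
        return accuracy
    return (1.0 - accuracy) / (n_levels - 1)

def update_belief(belief, obs, accuracy):
    """Bayes update of a discrete belief over task levels (needs >= 2 levels)."""
    n = len(belief)
    posterior = [
        belief[s] * observation_likelihood(obs, s, accuracy, n)
        for s in range(n)
    ]
    total = sum(posterior)
    if total == 0.0:
        # The observation was impossible under the prior; keep the prior.
        return list(belief)
    return [p / total for p in posterior]

# Uniform prior over three hypothetical task levels; the sensor reports level 2.
prior = [1 / 3, 1 / 3, 1 / 3]
low = update_belief(prior, obs=2, accuracy=0.6)   # lower sensing level
high = update_belief(prior, obs=2, accuracy=0.9)  # higher sensing level
print(low, high)
```

With a uniform prior the posterior mass on the reported level equals the sensor accuracy, so a higher sensing level yields a sharper belief. An update from communicated information (IE) could take the same form with a communication accuracy in place of the sensing accuracy, though that too is an assumption.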

DQN-based task allocation structure using SE and IE (Reference : Impact of Heterogeneity and Risk Aversion on Task Allocation in Multi-Agent Teams)

The beliefs from SE and IE are updated → the Q-Network is trained. The Q-Network takes each agent's belief about the task levels as input and outputs the Q-values of two actions. The following figure illustrates this.
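The belief-in, two-Q-values-out structure and the Bellman target can be sketched in plain Python. This is only a minimal illustration: the layer sizes, the random initialization, the discount factor `gamma`, and all function names are assumptions, and no training step is shown.

```python
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (hypothetical init)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, relu=False):
    """Apply one dense layer, optionally followed by a ReLU."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

def q_values(belief, params):
    """Map a belief vector to one Q-value per action (two actions here)."""
    hidden = dense(belief, params[0], relu=True)
    return dense(hidden, params[1])

def td_target(reward, next_belief, params, gamma=0.95, done=False):
    """Bellman target: reward + gamma * max over a' of Q(next_belief, a')."""
    if done:
        return reward
    return reward + gamma * max(q_values(next_belief, params))

rng = random.Random(0)
# Belief over 3 task levels -> 8 hidden units -> 2 actions (sizes are assumed).
params = [init_layer(3, 8, rng), init_layer(8, 2, rng)]
belief = [0.2, 0.2, 0.6]
next_belief = [0.05, 0.05, 0.9]
q = q_values(belief, params)
y = td_target(1.0, next_belief, params)
print(q, y)
```

In a full DQN the network would be trained to move `q[a]` for the chosen action toward `y`, typically with a replay buffer and a separate target network. Those parts are omitted here.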

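Conclusion
- In the reinforcement-learning-based task allocation experiments, team task performance improves when the team includes 1) agents with diverse attributes and 2) agents that accept some risk (--> to be understood in more detail).
- (ChatGPT, needs verification) An integrated training loop (coupling the environment with the DQN) is possible
- (ChatGPT, needs verification) Strategies for composing mixed teams of humans and autonomous agents
  - Type 1 : role division (human: risk-taking (-> I think this part should move to the autonomous agents), areas with insufficient information; autonomous: predictable tasks, repetitive execution)
  - Type 2 : both types cooperate on a single task
  - Type 3 : exploration-conservation balance (human: exploring new tasks or unknown areas; autonomous: continuing work in areas that are already understood)
- (ChatGPT, needs verification) Analysis of the effect of hyperparameter tuning for each heterogeneity variable (θ, capability, etc.)
- (ChatGPT, needs verification) Design of a recovery policy for collaboration failures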