Researchers have developed a new method for training artificial intelligence systems that complete tasks such as writing text or answering questions. The approach, called Target Policy Optimization, helps these systems learn more efficiently by separating two key steps: deciding which tasks to focus on, and adjusting the system's parameters to accomplish them. The method outperforms existing approaches in certain situations, particularly when rewards are sparse, meaning the system receives little feedback on how well it is doing. It is now available for others to use and build upon.