Misplaced Pages

OpenAI Five

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

OpenAI Five is a computer program by OpenAI that plays the five-on-five video game Dota 2 . Its first public appearance occurred in 2017, where it was demonstrated in a live one-on-one game against the professional player Dendi , who lost to it. The following year, the system had advanced to the point of performing as a full team of five, and began playing against and showing the capability to defeat professional teams.

#189810

162-560: By choosing a game as complex as Dota 2 to study machine learning , OpenAI thought they could more accurately capture the unpredictability and continuity seen in the real world, thus constructing more general problem-solving systems. The algorithms and code used by OpenAI Five were eventually borrowed by another neural network in development by the company, one which controlled a physical robotic hand. OpenAI Five has been compared to other similar cases of artificial intelligence (AI) playing against and defeating humans, such as AlphaStar in

324-415: A {\displaystyle a} and a policy π {\displaystyle \pi } , the action-value of the pair ( s , a ) {\displaystyle (s,a)} under π {\displaystyle \pi } is defined by where G {\displaystyle G} now stands for the random discounted return associated with first taking action

486-552: A {\displaystyle a} in state s {\displaystyle s} and following π {\displaystyle \pi } , thereafter. The theory of Markov decision processes states that if π ∗ {\displaystyle \pi ^{*}} is an optimal policy, we act optimally (take the optimal action) by choosing the action from Q π ∗ ( s , ⋅ ) {\displaystyle Q^{\pi ^{*}}(s,\cdot )} with

648-564: A {\displaystyle a} when in state s {\displaystyle s} . There are also deterministic policies. The state-value function V π ( s ) {\displaystyle V_{\pi }(s)} is defined as, expected discounted return starting with state s {\displaystyle s} , i.e. S 0 = s {\displaystyle S_{0}=s} , and successively following policy π {\displaystyle \pi } . Hence, roughly speaking,

810-555: A ) {\displaystyle (s,a)} are obtained by linearly combining the components of ϕ ( s , a ) {\displaystyle \phi (s,a)} with some weights θ {\displaystyle \theta } : The algorithms then adjust the weights, instead of adjusting the values associated with the individual state-action pairs. Methods based on ideas from nonparametric statistics (which can be seen to construct their own features) have been explored. Value iteration can also be used as

972-402: A ) = Pr ( A t = a ∣ S t = s ) {\displaystyle \pi (s,a)=\Pr(A_{t}=a\mid S_{t}=s)} that maximizes the expected cumulative reward. Formulating the problem as a Markov decision process assumes the agent directly observes the current environmental state; in this case, the problem is said to have full observability . If

1134-672: A mobile game in 2014 that used characters from the Dota universe. uCool, who was previously involved in a lawsuit with Blizzard in 2015 for similar reasons, along with another Chinese company, Lilith Games, argued that the forum post invalidated any ownership claims of the intellectual property, stating that the Dota property was an open-source , collective work that could not be copyrighted by anyone in particular. Judge Charles R. Breyer denied uCool's motion for summary dismissal , with Blizzard filing motions to dismiss all claims against uCool and Lilith with prejudice . An early goal of

1296-436: A 50% chance to win in any given game. Ranked game modes with a separately tracked MMR are also available, which primarily differ from unranked games by making MMR publicly visible, as well as requiring the registration of a phone number to their accounts, which help foster a more competitive environment. To ensure that each player's ranking is up to date and accurate, MMR is recalibrated around every six months. Players with

1458-507: A Go board has about 400 enumerations. OpenAI Five have received acknowledgement from the AI, tech, and video game community at large. Microsoft founder Bill Gates called it a "big deal", as their victories "required teamwork and collaboration". Chess player Garry Kasparov , who lost against the Deep Blue AI in 1997, stated that despite their losing performance at The International 2018,

1620-472: A board game or a modern video game." They added that the DeepMind and OpenAI victories were also a testament to the power of certain uses of reinforcement learning. It was OpenAI's hope that the technology could have applications outside of the digital realm. In 2018, they were able to reuse the same reinforcement learning algorithms and training code from OpenAI Five for Dactyl , a human-like robot hand with

1782-762: A continuous, but small stream of gold over the course of a match. Dota 2 features multiple game types which mainly alter the way hero selection is handled; examples include "All Pick", which offer no restrictions on hero selection, "All Random", which randomly assigns a hero for each player, "Captain's Mode", where a single player on each team selects heroes for their entire team and is primarily used for professional play, and "Turbo", an expedited version of All Pick featuring increased gold and experience gain, weaker towers, and faster respawn times. Matches usually last around 30 minutes to an hour, although they can last forever as long as both Ancients remain standing. In Captain's Mode games, an additional " GG " forfeit feature

SECTION 10

#1732791741190

1944-471: A day-night cycle, with some hero abilities and other game mechanics being altered depending on the time of the cycle. Present on the map are "neutral creeps" that are hostile to both teams, and reside in marked locations on the map known as "camps". Camps are located in the area between the lanes known as the "jungle", which both sides of the map have. Neutral creeps do not attack unless provoked, and respawn over time if killed. The most powerful neutral creep

2106-414: A dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms , alongside supervised learning and unsupervised learning . Q-learning at its simplest stores data in tables. This approach becomes infeasible as the number of states/actions increases (e.g., if the state space or action space were continuous), as the probability of

2268-480: A full team of five and were able to defeat teams of amateur and semi-professional players. At The International 2018 , OpenAI Five played in two games against professional teams, one against the Brazilian-based paiN Gaming and the other against an all-star team of former Chinese players. Although the bots lost both matches, OpenAI still considered it a successful venture, stating that playing against some of

2430-435: A function of the parameter vector θ {\displaystyle \theta } . If the gradient of ρ {\displaystyle \rho } was known, one could use gradient ascent . Since an analytic expression for the gradient is not available, only a noisy estimate is available. Such an estimate can be constructed in many ways, giving rise to algorithms such as Williams' REINFORCE method (which

2592-624: A mapping from a finite-dimensional (parameter) space to the space of policies: given the parameter vector θ {\displaystyle \theta } , let π θ {\displaystyle \pi _{\theta }} denote the policy associated to θ {\displaystyle \theta } . Defining the performance function by ρ ( θ ) = ρ π θ {\displaystyle \rho (\theta )=\rho ^{\pi _{\theta }}} under mild conditions this function will be differentiable as

2754-455: A model capable of generating sample transitions is required, rather than a full specification of transition probabilities , which is necessary for dynamic programming methods. Monte Carlo methods apply to episodic tasks, where experience is divided into episodes that eventually terminate. Policy and value function updates occur only after the completion of an episode, making these methods incremental on an episode-by-episode basis, though not on

2916-416: A move that brought the game's ranked system closer to ones used in other competitive games such as Global Offensive , StarCraft , and League of Legends . For most of 2018, Valve decided to handle gameplay balance updates for the game in a different way. Instead of releasing larger updates irregularly throughout the year, smaller ones would be released on a set schedule of every two weeks. Around

3078-595: A neural network built to manipulate physical objects. In 2019, Dactyl solved the Rubik's Cube . Dota 2 Dota 2 is a 2013 multiplayer online battle arena (MOBA) video game by Valve . The game is a sequel to Defense of the Ancients ( DotA ), a community-created mod for Blizzard Entertainment 's Warcraft III: Reign of Chaos . Dota 2 is played in matches between two teams of five players, with each team occupying and defending their own separate base on

3240-476: A player's account in order to play them, an anti- griefing and smurfing practice they had previously implemented in their first-person shooter game Counter-Strike: Global Offensive . Further changes to the game's matchmaking were brought in an update in November 2017, where the old numerical MMR system was replaced by a seasonal one based on eight ranked tiers that are recalibrated around every six months,

3402-412: A player's account until they win a specific number of games, and only groups them with other players who have the same punishment. Other features include an improved replay system from Defense of the Ancients , in which a completed game can be downloaded in-client and viewed by anyone at a later time, and the "hero builds" feature, which provide integrated guides created by the community that highlight to

SECTION 20

#1732791741190

3564-411: A policy that maximizes the discounted return by maintaining a set of estimates of expected discounted returns E ⁡ [ G ] {\displaystyle \operatorname {\mathbb {E} } [G]} for some policy (usually either the "current" [on-policy] or the optimal [off-policy] one). These methods rely on the theory of Markov decision processes, where optimality is defined in

3726-615: A pre-game drafting phase, where they can discuss potential strategies and hero matchups with their teammates. Heroes are removed from the drafting pool and become unavailable for all other players once one is selected, and can not be changed once the drafting phase is over. All heroes have a basic attack in addition to powerful abilities , which are the primary method of fighting. Each hero has at least four of them, all of which are unique. Heroes begin each game with an experience level of one, only having access to one of their abilities, but are able to level up and become more powerful during

3888-411: A primary attribute which adds to their basic non-ability damage output and other minor buffs—with four attributes including Universal, which rewards extra damage for attribute points. Heroes have an ability augmentation system known as the "Talent Tree", which allow players more choices on how to develop their hero. If a hero runs out of health points and dies, they are removed from active play until

4050-596: A product line that would include statues, weapons, and armor based on characters and items from the game. In February 2013, the National Entertainment Collectibles Association announced a new toy line featuring hero-themed action figures at the American International Toy Fair . At Gamescom 2015, an HTC Vive virtual reality (VR) tech demo based around the shopkeeper of the game's item shop

4212-484: A prominent esports game. IceFrog and Mescon later had a falling out in May 2009, which prompted the former to establish a new community website at playdota.com. Rob Pardo , the executive vice president of Blizzard Entertainment at the time, stated that the company had discussed with IceFrog in person the possibility of developing a MOBA but that they felt they were not ready to commit to such an endeavor. Valve's interest in

4374-537: A report by the Anti-Defamation League found that up to 79% of the game's playerbase had reported being harassed in some way while playing it, which topped their list. Following its reveal in 2011, Dota 2 won IGN 's People's Choice Award. In December 2012, PC Gamer listed Dota 2 as a nominee for their Game of the Year award, as well as the best esports game of the year. In 2013, Dota 2 won

4536-429: A schedule (making the agent explore progressively less), or adaptively based on heuristics. Even if the issue of exploration is disregarded and even if the state was observable (assumed hereafter), the problem remains to use past experience to find out which actions lead to higher cumulative rewards. The agent's action selection is modeled as a map called policy : The policy map gives the probability of taking action

4698-460: A seasonal Elo rating -based matchmaking system, which is measured by a numerical value known as "matchmaking rating" (MMR) that is tracked separately for core and support roles, and ranked into different tiers. MMR is updated based on if a player won or lost, which will then increase or decrease respectively. The game's servers, known as the "Game Coordinator", attempts to balance both teams based on each player's MMR, with each team having roughly

4860-470: A sense stronger than the one above: A policy is optimal if it achieves the best-expected discounted return from any initial state (i.e., initial distributions play no role in this definition). Again, an optimal policy can always be found among stationary policies. To define optimality in a formal manner, define the state-value of a policy π {\displaystyle \pi } by where G {\displaystyle G} stands for

5022-514: A sequel with the already-identifiable brand. Holding the Dota name to be a community asset, Feak and Mescon filed an opposing trademark for Dota on behalf of DotA-Allstars, LLC (then a subsidiary of Riot Games ) in August 2010. Rob Pardo similarly stated that the Dota name belonged to the mod's community. Blizzard acquired DotA-Allstars, LLC from Riot Games and filed an opposition against Valve in November 2011, citing Blizzard's ownership of both

OpenAI Five - Misplaced Pages Continue

5184-430: A similar bot for Starcraft II , AlphaStar . Like OpenAI Five, AlphaStar used reinforcement learning and self-play. The Verge reported that "the goal with this type of AI research is not just to crush humans in various games just to prove it can be done. Instead, it’s to prove that — with enough time, effort, and resources — sophisticated AI software can best humans at virtually any competitive cognitive challenge, be it

5346-471: A starting point, giving rise to the Q-learning algorithm and its many variants. Including Deep Q-learning methods when a neural network is used to represent Q, with various applications in stochastic search problems. The problem with using action-values is that they may need highly precise estimates of the competing action values that can be hard to obtain when the returns are noisy, though this problem

5508-433: A step-by-step (online) basis. The term “Monte Carlo” generally refers to any method involving random sampling ; however, in this context, it specifically refers to methods that compute averages from complete returns, rather than partial returns. These methods function similarly to the bandit algorithms , in which returns are averaged for each state-action pair. The key difference is that actions taken in one state affect

5670-416: A team of five, the first public demonstration occurred at The International 2017 in August, the annual premiere championship tournament for the game, where Dendi , a Ukrainian professional player, lost against an OpenAI bot in a live one-on-one matchup. After the match, CTO Greg Brockman explained that the bot had learned by playing against itself for two weeks of real time , and that the learning software

5832-614: A ten-versus-ten mode, a Halloween -themed capture point mode "Colosseum", a combat arena mode "Overthrow", "Siltbreaker", a story-driven cooperative campaign mode , and "The Underhollow", a battle royale mode. The move to the Source 2 engine in 2015 added the "Arcade" feature, which allows for community-created game modes, with the more popular ones having dedicated server hosting by Valve. One popular example, known as Dota Auto Chess , had over seven million in-game subscribers by April 2019. Owing to its popularity, Valve met with

5994-574: A timer counts down to zero, where they are respawned in their base with only some gold lost. The two teams—known as the Radiant and Dire—occupy fortified bases in opposite corners of the map, which is divided in half by a crossable river and connected by three paths, which are referred to as "lanes". The lanes are guarded by defensive towers that attack any opposing unit who gets within its line of sight . A small group of weak computer-controlled creatures called " creeps " travel predefined paths along

6156-418: A times a day for months in a system that OpenAI calls "reinforcement learning", in which they are rewarded for actions such as killing an enemy and destroying towers. Demonstrations of the bots playing against professional players have occurred at some events, such as Dendi , a professional Ukrainian player of the game, losing to one of them in a live 1v1 matchup at The International 2017 . A year later,

6318-406: A type of in-game battle pass called the "Compendium", which raises money from players buying them and connected lootboxes to get exclusive in-game cosmetics and other bonuses offered through them. 25% of all the revenue made from Compendiums go directly to the prize pool, with sales from the 2013 battle pass raising over US$ 2.8 million, which made it the largest prize pool in esports history at

6480-415: A year later. Valve adopted the word " Dota ", derived from the original mod's acronym, as the name for its newly acquired franchise. Johnson argued that the word referred to a concept, and was not an acronym. Shortly after the announcement of Dota 2 , Valve filed a trademark claim to the Dota name. At Gamescom 2011, company president Gabe Newell explained that the trademark was needed to develop

6642-564: Is stationary if the action-distribution returned by it depends only on the last state visited (from the observation agent's history). The search can be further restricted to deterministic stationary policies. A deterministic stationary policy deterministically selects actions based on the current state. Since any such policy can be identified with a mapping from the set of states to the set of actions, these policies can be identified with such mappings with no loss of generality. The brute force approach entails two steps: One problem with this

OpenAI Five - Misplaced Pages Continue

6804-462: Is 35 and 250 in Go. Continuous observation space : Dota 2 is played on a large map with ten heroes, five on each team, along with dozens of buildings and non-player character (NPC) units. The OpenAI system observes the state of a game through developers’ bot API, as 20,000 numbers that constitute all information a human is allowed to get access to. A chess board is represented as about 70 lists, whereas

6966-456: Is a neural network containing a single layer with a 4096-unit LSTM that observes the current game state extracted from the Dota developer's API. The neural network conducts actions via numerous possible action heads (no human data involved), and every head has meaning. For instance, the number of ticks to delay an action, what action to select – the X or Y coordinate of this action in a grid around

7128-430: Is a state randomly sampled from the distribution μ {\displaystyle \mu } of initial states (so μ ( s ) = Pr ( S 0 = s ) {\displaystyle \mu (s)=\Pr(S_{0}=s)} ). Although state-values suffice to define optimality, it is useful to define action-values. Given a state s {\displaystyle s} , an action

7290-507: Is allowed to change, A policy that achieves these optimal state-values in each state is called optimal . Clearly, a policy that is optimal in this sense is also optimal in the sense that it maximizes the expected discounted return, since V ∗ ( s ) = max π E [ G ∣ s , π ] {\displaystyle V^{*}(s)=\max _{\pi }\mathbb {E} [G\mid s,\pi ]} , where s {\displaystyle s}

7452-600: Is available to end games early. Dota 2 occasionally features limited-time events that present players with alternative game modes that do not follow the game's standard rules. Some of these included the Halloween -themed Diretide event, the Christmas -themed Frostivus event, and the New Bloom Festival, which celebrated the coming of spring . Other special game modes have been created by Valve, including

7614-446: Is chosen, and the agent chooses the action that it believes has the best long-term effect (ties between actions are broken uniformly at random). Alternatively, with probability ε {\displaystyle \varepsilon } , exploration is chosen, and the action is chosen uniformly at random. ε {\displaystyle \varepsilon } is usually a fixed parameter but can be adjusted either according to

7776-788: Is done by a selection of dedicated esports organizations and personnel who provide on-site commentary , analysis, and player interviews, similar to traditional sporting events. Coverage of Dota 2 tournaments have been simulcast on television networks, such as ESPN in the United States, BBC Three in the United Kingdom, Sport1 in Germany, TV 2 Zulu in Denmark, Xinwen Lianbo in China, Astro in Malaysia, and TV5 in

7938-422: Is featured in the game. Activating an ability costs a hero some of their " mana points ", which slowly regenerates over time. Using an ability will also cause it to enter a cooldown period, in which the ability can not be used again until a timer resets. All heroes have three attributes : strength, intelligence, and agility, which affect health points , mana points, and attack speed, respectively. Each hero has

8100-455: Is mitigated to some extent by temporal difference methods. Using the so-called compatible function approximation method compromises generality and efficiency. An alternative method is to search directly in (some subset of) the policy space, in which case the problem becomes a case of stochastic optimization . The two approaches available are gradient-based and gradient-free methods. Gradient -based methods ( policy gradient methods ) start with

8262-581: Is named "Roshan", who is a unique boss that may be defeated by either team to obtain special items , such as one that allows a one-time resurrection if the hero that holds it is killed. Roshan will respawn around ten minutes after being killed, and becomes progressively harder to kill as the match progresses over time. Runes, which are special items that spawn in set positions on the map every few minutes, offer heroes temporary, but powerful power-ups when collected, such as double damage and invisibility. In addition to having abilities becoming stronger during

SECTION 50

#1732791741190

8424-491: Is that the latter do not assume knowledge of an exact mathematical model of the Markov decision process, and they target large MDPs where exact methods become infeasible. Due to its generality, reinforcement learning is studied in many disciplines, such as game theory , control theory , operations research , information theory , simulation-based optimization , multi-agent systems , swarm intelligence , and statistics . In

8586-508: Is that the number of policies can be large, or even infinite. Another is that the variance of the returns may be large, which requires many samples to accurately estimate the discounted return of each policy. These problems can be ameliorated if we assume some structure and allow samples generated from one policy to influence the estimates made for others. The two main approaches for achieving this are value function estimation and direct policy search . Value function approaches attempt to find

8748-440: Is the discount rate . γ {\displaystyle \gamma } is less than 1, so rewards in the distant future are weighted less than rewards in the immediate future. The algorithm must find a policy with maximum expected discounted return. From the theory of Markov decision processes it is known that, without loss of generality, the search can be restricted to the set of so-called stationary policies. A policy

8910-614: The Dota intellectual property began when several veteran employees, including Team Fortress 2 designer Robin Walker and executive Erik Johnson, became fans of the mod and wanted to build a modern sequel. The company corresponded with IceFrog by email about his long-term plans for the project, and he was subsequently hired to design a sequel. IceFrog announced his new position through his blog in October 2009, with Dota 2 being announced

9072-612: The Asia Pacific University of Technology & Innovation in Malaysia, have held courses teaching students the fundamentals and core skills to use for the game. Dota 2 has been a part of multi-sport events in Asia, such as the Asian Indoor and Martial Arts Games and Southeast Asian Games . The popularity of Dota 2 led Valve to produce apparel, accessories, figurines , and several other products featuring

9234-479: The Boston Major tournament in late 2016. Several more episodes have since been produced. Valve have also endorsed cosplay competitions featuring the game's heroes, which take place during downtime at some Dota 2 tournaments and feature prize pools of their own. Creation of Dota 2 -themed animations and CGI videos , mostly created by the community with Source Filmmaker , also take place. Similar to

9396-568: The Dota 2 Workshop Tools are a set of Source 2 software development kit (SDK) tools that allow content creators to create new hero cosmetics, as well as custom game modes, maps, and bot scripts. Highly rated cosmetics, through the Steam Workshop , are available in the in-game store if they are accepted by Valve. This model was fashioned after Valve's Team Fortress 2 , which had earned Workshop designers of cosmetic items of that game over $ 3.5 million by June 2011. Newell revealed that

9558-529: The Dota 2 team was the adaptation of Defense of the Ancients ' aesthetic style for the Source engine . The Radiant and Dire factions replaced the Sentinel and Scourge from the mod, respectively. Character names, abilities, items and map design from the mod were largely retained, with some changes due to trademarks owned by Blizzard. In the first Q&A session regarding Dota 2 , IceFrog explained that

9720-532: The Dota Pro Circuit , which are a series of tournaments that award qualification points for earning direct invitations to The International , the game's premier tournament held annually. Internationals feature a crowdfunded prize money system that has seen amounts in upwards of US$ 40 million, making Dota 2 one of the most lucrative esports. Media coverage of most tournaments is done by a selection of on-site staff who provide commentary and analysis for

9882-477: The Steam Workshop . Two spinoff games, Artifact and Dota Underlords , were released by Valve. Dota 2 has been used in machine learning experiments, with a team of bots known as the OpenAI Five showing the capability to defeat professional players. Dota 2 is a multiplayer online battle arena (MOBA) video game in which two teams of five players compete to destroy a large structure defended by

SECTION 60

#1732791741190

10044-524: The Warcraft III World Editor and DotA-Allstars, LLC as proper claims to the franchise name. The dispute was settled in May 2012, with Valve retaining commercial rights to the Dota trademark, while allowing non-commercial use of the name by third-parties. In 2017, Valve's ownership of the franchise was again challenged, after a 2004 internet forum post from Eul was brought to light by a Chinese company known as uCool, who had released

10206-450: The game as a service , selling loot boxes and a battle pass subscription system called Dota Plus that offer non-gameplay altering virtual goods in return, such as hero cosmetics and audio replacement packs. The game was ported to the Source 2 engine in 2015, making it the first game to use it. Dota 2 has a large esports scene, with teams from around the world playing in various professional leagues and tournaments. Valve organizes

10368-445: The greatest video games of all time . It has been one of the most played games on Steam since its release, with over a million concurrent players at its peak. The popularity of the game has led to merchandise and media adaptations, including comic books and an anime series, as well as promotional tie-ins to other games and media. The game allows for the community to create their own gamemodes, maps, and cosmetics, which are uploaded to

10530-497: The multi-armed bandit problem and for finite state space Markov decision processes in Burnetas and Katehakis (1997). Reinforcement learning requires clever exploration mechanisms; randomly selecting actions, without reference to an estimated probability distribution, shows poor performance. The case of (small) finite Markov decision processes is relatively well understood. However, due to the lack of algorithms that scale well with

10692-419: The "Dota Store", which offers for-purchase cosmetic virtual goods , such as custom armor and weapons for their heroes. It was announced that the full roster of heroes would be available at launch for free. Until the game's release in 2013, players were able to purchase an early access bundle, which included a digital copy of Dota 2 and several cosmetic items. Included as optional downloadable content (DLC),

10854-1002: The "Rapid" infrastructure. Rapid consists of two layers: it spins up thousands of machines and helps them ‘talk’ to each other and a second layer runs software. By 2018, OpenAI Five had played around 180 years worth of games in reinforcement learning running on 256 GPUs and 128,000 CPU cores, using Proximal Policy Optimization , a policy gradient method . Prior to OpenAI Five, other AI versus human experiments and systems have been successfully used before, such as Jeopardy! with Watson , chess with Deep Blue , and Go with AlphaGo . In comparison with other games that have used AI systems to play against human players, Dota 2 differs as explained below: Long run view : The bots run at 30 frames per second for an average match time of 45 minutes, which results in 80,000 ticks per game. OpenAI Five observes every fourth frame, generating 20,000 moves. By comparison, chess usually ends before 40 moves, while Go ends before 150 moves. Partially observed state of

11016-399: The Ancients players would take up Dota 2 and to promote the game to a new audience, Valve invited sixteen accomplished Defense of the Ancients esports teams to compete at a Dota 2 -specific tournament at Gamescom in August 2011, which later became an annually held event known as The International . From The International 2013 onward, its prize pool began to be crowdfunded through

11178-437: The Ancients , as well as the often hostile community. Peter Bright of Ars Technica also directed criticism at the ability for third-party websites to allow skin gambling and betting on match results, similar to controversies that also existed with Valve's Counter-Strike: Global Offensive . Using Dota 2 as an example, Bright thought that Valve had built gambling elements directly into their games, and had issues with

11340-457: The Bellman equations. This can be effective in palliating this issue. In order to address the fifth issue, function approximation methods are used. Linear function approximation starts with a mapping ϕ {\displaystyle \phi } that assigns a finite-dimensional vector to each state-action pair. Then, the action values of a state-action pair ( s ,

11502-511: The Collector's Aegis again the following year for The International 2016, as well as selling a limited edition Dota 2 themed HTC Vive virtual reality headset during the event. In July 2017, an 18-track music soundtrack was released by Ipecac Recordings , including a version on vinyl. A digital collectible card game based on the universe of Dota 2 and designed by Magic: The Gathering creator Richard Garfield and Valve, Artifact ,

11664-464: The DPC, teams are awarded qualification points for their performance in sponsored tournaments, with the top twelve earning direct invites to that season's International. To avoid conflicting dates with other tournaments, Valve directly manages the scheduling of them. The primary medium for professional Dota 2 coverage is the live streaming platform Twitch . For most major events, tournament coverage

11826-497: The Philippines. Dota 2 received "universal acclaim" according to review aggregator Metacritic , and has been cited as one of the greatest video games of all time . In a preview of the game in 2012, Rich McCormick of PC Gamer thought that Dota 2 was "an unbelievably deep and complex game that offers the purest sequel to the original Defense of the Ancients . Rewarding like few others, but tough." Adam Biessener,

11988-407: The ability of the bots had increased to work together as a full team of five, known as the OpenAI Five , who then played and won against a team of semi-professional players in a demonstration game in August 2018. Shortly after, OpenAI Five then played two live games against more skilled players at The International 2018 . Although the bots lost both games, OpenAI considered it successful as playing

12150-406: The ability to jump onto the field at any time to see the quarterback 's point of view. Chris Thursten of PC Gamer agreed, calling the experience "incredible" and unlike any other esports spectating system that existed prior to it. Sam Machkovech of Ars Technica also praised the addition, believing that the functionality could "attract serious attention from gamers and non-gamers alike". While

12312-881: The absence of a mathematical model of the environment). Basic reinforcement learning is modeled as a Markov decision process : The purpose of reinforcement learning is for the agent to learn an optimal (or near-optimal) policy that maximizes the reward function or other user-provided reinforcement signal that accumulates from immediate rewards. This is similar to processes that appear to occur in animal psychology. For example, biological brains are hardwired to interpret signals such as pain and hunger as negative reinforcements, and interpret pleasure and food intake as positive reinforcements. In some circumstances, animals learn to adopt behaviors that optimize these rewards. This suggests that animals are capable of reinforcement learning. A basic reinforcement learning agent interacts with its environment in discrete time steps. At each time step t ,

12474-406: The agent only has access to a subset of states, or if the observed states are corrupted by noise, the agent is said to have partial observability , and formally the problem must be formulated as a partially observable Markov decision process . In both cases, the set of actions available to the agent can be restricted. For example, the state of an account balance could be restricted to be positive; if

12636-461: The agent receives the current state S t {\displaystyle S_{t}} and reward R t {\displaystyle R_{t}} . It then chooses an action A t {\displaystyle A_{t}} from the set of available actions, which is subsequently sent to the environment. The environment moves to a new state S t + 1 {\displaystyle S_{t+1}} and

12798-403: The agent visiting a particular state and performing a particular action diminishes. Reinforcement learning differs from supervised learning in not needing labelled input-output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge) with

12960-556: The average Steam Workshop contributor for Dota 2 and Team Fortress 2 made approximately $ 15,000 from their creations in 2013. By 2015, sales of Dota 2 virtual goods had earned Valve over $ 238 million in revenue, according to the digital game market research group SuperData. In 2016, Valve introduced the "Custom Game Pass" option for creators of custom game modes, which allows them to be funded by way of microtransactions by adding exclusive features, content, and other changes to their game mode for players who buy it. Dota 2 includes

13122-442: The batch). Batch methods, such as the least-squares temporal difference method, may use the information in the samples better, while incremental methods are the only choice when batch methods are infeasible due to their high computational or memory complexity. Some methods try to combine the two approaches. Methods based on temporal differences also overcome the fourth issue. Another problem specific to TD comes from their reliance on

13284-420: The best multiplayer game at the 10th British Academy Games Awards in 2014, but lost to Grand Theft Auto V , and was nominated for Esports Game of the Year at The Game Awards at its events from 2015 to 2019, while winning the award for best MOBA at the 2015 Global Game Awards. The game was also nominated for the community created "Love/Hate Relationship" award at the inaugural Steam Awards in 2016. In

13446-590: The best players in Dota 2 allowed them to analyze and adjust their algorithms for future games. The bots' final public demonstration occurred in April 2019, where they won a best-of-three series against The International 2018 champions OG at a live event in San Francisco . A four-day online event to play against the bots, open to the public, occurred the same month. There, the bots played in 42,729 public games, winning 99.4% of those games. Each OpenAI Five bot

13608-413: The bots against some of the best players in Dota 2 allowed them to analyze and adjust their algorithms for future games. In 2019, OpenAI Five was able to defeat the reigning International champion OG . Policy gradient method Reinforcement learning ( RL ) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in

13770-463: The bots would eventually "get there, and sooner than expected". In a conversation with MIT Technology Review , AI experts also considered OpenAI Five system as a significant achievement, as they noted that Dota 2 was an "extremely complicated game", so even beating non-professional players was impressive. PC Gamer wrote that their wins against professional players was a significant event in machine learning. In contrast, Motherboard wrote that

13932-588: The center of each base. Development of Dota 2 began in 2009 when IceFrog , lead designer of Defense of the Ancients , was hired by Valve to design a modernized remake in the Source game engine . It was released for Windows , OS X , and Linux via the digital distribution platform Steam in July 2013, following a Windows-only open beta phase that began two years prior. The game is fully free-to-play with no heroes or any other gameplay element needing to be bought or otherwise unlocked. To maintain it, Valve supports

14094-430: The community, and the full replacement of the original Source engine with Source 2. Largely attributed to technical difficulties players experienced with the update, the global player base experienced a sharp drop of approximately sixteen percent the month following its release. However, after various updates and patches , over a million concurrent players were playing again by the beginning of 2016, with that number being

14256-702: The composers of Warcraft III: Reign of Chaos , Jason Hayes, was hired to collaborate with Tim Larkin to write the original score for the game, which was conducted by Timothy Williams and performed and recorded by the Northwest Sinfonia at Bastyr University . Valve had Half-Life series writer Marc Laidlaw , science fiction author Ted Kosmatka , and other employees contributed hero dialog and lore. Notable voice actors for heroes include Nolan North , Dave Fennoy , Jon St. John , Ellen McLain , Fred Tatasciore , Merle Dandridge , Jen Taylor , and John Patrick Lowrie , among others. The Source engine

14418-451: The core and support. Cores, which are also called carries, begin each match as weak and vulnerable, but are able to become more powerful later in the game, thus becoming able to "carry" their team to victory. Supports generally lack abilities that deal heavy damage, instead having ones with more functionality and utility that provide assistance for their cores, such as providing healing and other buffs . Players select their hero during

14580-448: The cosplay competitions, Valve holds short film contests every year at The International, with winners of the competition also being awarded prize money. In addition, Valve have created free webcomics featuring some of the heroes, further detailing their background lore. A physical collection of the comics was released as Dota 2: The Comic Collection by Dark Horse Comics in August 2017. An anime streaming television series based on

14742-401: The course of the game, up to a maximum level of 30. Whenever a hero gains an experience level, the player is able to unlock another of their abilities or improve one already learned. The most powerful ability for each hero is known as their "ultimate", which requires them to have an experience level of six in order to use. In order to prevent abilities from being overused , a magic system

14904-609: The cumulative Major of their respective seasons, the series had five other events, which were the Frankfurt Major , Shanghai Major , Manila Major , Boston Major , and Kiev Major . Following the International 2017, the Majors were replaced with the Dota Pro Circuit (DPC) format due to criticism by teams and fans for Valve's non-transparent and unpredictable nature for handing out International invitations. In

15066-416: The current value of the state is 3 and the state transition attempts to reduce the value by 4, the transition will not be allowed. When the agent's performance is compared to that of an agent that acts optimally, the difference in performance yields the notion of regret . In order to act near optimally, the agent must reason about long-term consequences of its actions (i.e., maximize future rewards), although

15228-459: The discounted return associated with following π {\displaystyle \pi } from the initial state s {\displaystyle s} . Defining V ∗ ( s ) {\displaystyle V^{*}(s)} as the maximum possible state-value of V π ( s ) {\displaystyle V^{\pi }(s)} , where π {\displaystyle \pi }

15390-547: The editor who authored the announcement article for Dota 2 for Game Informer in 2010, praised Valve for maintaining the same mechanics and game balance that made Defense of the Ancients successful nearly a decade prior and Quintin Smith of Eurogamer described Dota 2 as the "supreme form of the MOBA which everyone else working in the genre is trying to capture like lightning in a bottle". The most frequently praised aspects of

15552-428: The environment’s dynamics, Monte Carlo methods rely solely on actual or simulated experience—sequences of states, actions, and rewards obtained from interaction with an environment. This makes them applicable in situations where the complete dynamics are unknown. Learning from actual experience does not require prior knowledge of the environment and can still lead to optimal behavior. When using simulated experience, only

15714-500: The esport game of the year award from PC Gamer and onGamers . GameTrailers also awarded the game the award for Best PC Game of 2013, with IGN also awarding it the Best PC Strategy & Tactics Game, Best PC Multiplayer Game, and People's Choice Award. Similarly, Game Informer recognized Dota 2 for the categories of Best PC Exclusive, Best Competitive Multiplayer and Best Strategy of 2013. The same year, Dota 2

15876-424: The final restrictions against unlimited global access to Dota 2 were lifted after the game's infrastructure and servers were substantially bolstered. In order to abide by the standards set by the economic legislation of specific countries, Valve opted to contract with nationally based developers for publishing. In October 2012, Chinese game publisher Perfect World announced they had received distribution rights for

16038-417: The game : Players and their allies can only see the map directly around them. The rest of it is covered in a fog of war which hides enemies units and their movements. Thus, playing Dota 2 requires making inferences based on this incomplete data, as well as predicting what their opponent could be doing at the same time. By comparison, Chess and Go are "full-information games", as they do not hide elements from

16200-539: The game and its professional scene was produced by Valve and released in March 2014. Known as Free to Play , the film follows three players during their time at the first International in 2011. American basketball player Jeremy Lin , who was a media sensation at the time , had a guest appearance in the film and called the game "a way of life". Lin later compared the game and the esports scene in general to basketball and other traditional sporting events, saying that there

16362-660: The game in the country. The Chinese client has a region-specific "Low Violence" mode, which removes most depictions of blood, gore, and skulls in order for the game to follow censorship policies of the country . In November 2012, a similar publishing deal was made with the South Korea-based game company Nexon to distribute and market the game in the country, as well as in Japan. Three years later, Nexon announced they would no longer be operating servers for Dota 2 , with Valve taking over direct distribution and marketing of

16524-419: The game in those regions. In December 2016, Dota 2 was updated to gameplay version 7.00, also known as "The New Journey" update. Prior to the update, the Dota series had been in version 6. xx for over a decade, marking the first major revision since IceFrog originally took over development of the original mod in the mid-2000s. The New Journey update added and changed numerous features and mechanics of

16686-465: The game was originally meant to publicly release in 2012, Valve later scrapped that plan as it would have kept the game in its closed beta state for over a year. Owing to that, Valve lifted the non-disclosure agreement and transitioned the game into open beta in September 2011, allowing players to discuss the game and their experiences publicly. Following nearly two years of beta testing, Dota 2

16848-725: The game were its depth, delivery, and overall balance. Chris Thursten of PC Gamer described the gameplay as being "deep and rewarding". Martin Gaston of GameSpot complimented Valve for the artistic design and delivery of Dota 2 , citing the execution of the user interface design, voice acting, and characterization as exceeding those of the game's competitors. Phill Cameron of IGN and James Kozanitis of Hardcore Gamer both praised Dota 2 for its free-to-play business model including only cosmetic items in contrast to other games such as League of Legends , which charges to unlock better "heroes" to play as, with Kozanitis stating that Dota 2

17010-487: The game would build upon the mod without making significant changes to its core. Valve hired Eul and contracted major contributors from the Defense of the Ancients community, including artist Kendrick Lim, to assist with the sequel. Additional contributions from sources outside of Valve were also sought regularly for Dota 2 , as to continue Defense of the Ancients ' tradition of community-sourced development. One of

17172-496: The game's default soundtrack, such as electronic music artist deadmau5 and Singaporean songwriter JJ Lin . To coincide with the Windows release of Square Enix 's Final Fantasy Type-0 HD in August 2015, a bundle containing a custom loading screen, a Moogle ward, and a Chocobo courier were added the same month. In April 2016, Valve announced a cross-promotional workshop contest for Sega 's Total War: Warhammer , with

17334-546: The game, Dota: Dragon's Blood , premiered on Netflix in March 2021. The series was produced by Studio Mir and Kaiju Boulevard . Dota 2 has been used in machine learning experiments, with the American artificial intelligence research company OpenAI curating a system, known as the OpenAI Five , that allows bots to learn how to play the game at a high skill level entirely through trial-and-error algorithms. The bots learn over time by playing against itself hundreds

17496-413: The game, including adding the first original hero not ported over from Defense of the Ancients , a reworked map, a redesigned HUD , a pre-game phase that allows for players to discuss their team strategy, and a "Talent Tree" ability augmentation system. In April 2017, Valve announced changes to the game's ranked matchmaking system, with the main one requiring the registration of a unique phone number to

17658-407: The game, players are able to buy items from set locations on the map called shops that provide their own special abilities. Items are not limited to specific heroes, and can be bought by anyone. In order to obtain an item, players must be able to afford it with gold at shops located on the map, which is primarily obtained by killing enemy heroes, destroying enemy structures, and killing creeps, with

17820-455: The goal of maximizing the cumulative reward (the feedback of which might be incomplete or delayed). The search for this balance is known as the exploration-exploitation dilemma . The environment is typically stated in the form of a Markov decision process (MDP), as many reinforcement learning algorithms use dynamic programming techniques. The main difference between classical dynamic programming methods and reinforcement learning algorithms

17982-400: The heroes and other elements from the game. In addition, Valve secured licensing contracts with third-party producers; the first of these deals concerned a Dota 2 themed SteelSeries mousepad , which was announced alongside the game at Gamescom 2011. In September 2012, Wētā Workshop , the prop studio that creates the "Aegis of Champions" trophy for winners of The International, announced

18144-415: The highest action-value at each state, s {\displaystyle s} . The action-value function of such an optimal policy ( Q π ∗ {\displaystyle Q^{\pi ^{*}}} ) is called the optimal action-value function and is commonly denoted by Q ∗ {\displaystyle Q^{*}} . In summary, the knowledge of

18306-598: The highest earning esport game by a margin of nearly $ 60 million at the time. From late 2015 until early 2017, Valve sponsored a series of smaller-scale, seasonally held tournaments known as the Dota Major Championships, which all had fixed prize pools of US$ 3 million. Their format was based on the tournament series of the same name that Valve also sponsored for their first-person shooter game Counter-Strike: Global Offensive . Including The International 2016 and 2017 , which were considered to be

18468-463: The highest possible medal rank are listed by Valve on an online leaderboard, separated into North American, European, Southeast Asian, and Chinese regions. The game includes a report system, which allows players to punish player behavior that intentionally provides a negative experience . Players who get reported enough or leave several games before they conclude, a practice known as "abandoning", are placed into low priority matchmaking, which remains on

18630-681: The highest-paying esport games, second to StarCraft II . At E3 2013 , South Korean company Nexon announced the investment of ₩ 2 billion (approximately US$ 1.7 million) into local leagues in the country, which coincided with their distribution partnership with Valve for the game. In February 2015, Valve sponsored and held the Dota 2 Asia Championships in Shanghai , with a prize pool of over $ 3 million raised through compendium sales. In total, professional Dota 2 tournaments had earned teams and players over $ 100 million by June 2017, with over half of that being awarded at Internationals, making it

18792-454: The immediate reward associated with this might be negative. Thus, reinforcement learning is particularly well-suited to problems that include a long-term versus short-term reward trade-off. It has been applied successfully to various problems, including energy storage , robot control , photovoltaic generators , backgammon , checkers , Go ( AlphaGo ), and autonomous driving systems . Two elements make reinforcement learning powerful:

18954-477: The lanes and attempt to attack any opposing heroes, creeps, and buildings in their way. Creeps periodically spawn throughout the game in groups from two buildings, called the "barracks", that exist in each lane and are located within the team's bases. The map is permanently covered for both teams in fog of war , which prevents a team from seeing the opposing team's heroes and creeps if they are not directly in sight of themselves or an allied unit. The map features

19116-525: The largest in nearly a year. The move to Source 2 allowed the use of the Vulkan graphics API, released as an optional feature in May 2016, making Dota 2 one of the first games to use it. Dota 2 was made available to the public at Gamescom in 2011, coinciding with the inaugural International championship, the game's premier esport tournament event. At the event, Valve began sending out closed beta invitations to DotA players and attendees. Although

19278-532: The late 2010s, the game was nominated for Choice Video Game at the 2017 Teen Choice Awards , for Esports Game of the Year at the Golden Joystick Awards , and as IGN 's best spectator game. A month prior to its launch, Dota 2 was already the most played game on Steam with a concurrent player count of nearly 330,000, which outweighed the number of players for the rest of platform's top ten most-played games combined. It remained as

19440-499: The latter being an act called " farming ". Only the hero that lands the killing blow on a creep obtains gold from it, an act called " last hitting ", but all allies receive a share of gold when an enemy hero dies close to them. Players are able to "deny" allied units and structures by last hitting them, which then prevents their opponents from getting full experience from them. Gold can not be shared between teammates, with each player having their own independent stash. Players receive

19602-461: The learning curve and players' attitudes as unwelcoming. Benjamin Danneberg of GameStar alluded to the learning curve as a "learning cliff", calling the newcomer's experience painful, with the tutorial feature new to the Dota franchise only being partially successful. In a review for Metro , Dota 2 was criticized for not compensating for the flaws with the learning curve from Defense of

19764-511: The majority of reviewers gave Dota 2 highly positive reviews, a common criticism was that the game maintains a steep learning curve that requires exceptional commitment to overcome. While providing a moderately positive review that praised Valve's product stability, Fredrik Åslund from the Swedish division of Gamereactor described his first match of Dota 2 as one of the most humiliating and inhospitable experiences of his gaming career, citing

19926-409: The map. Each of the ten players independently controls a character known as a hero that has unique abilities and differing styles of play. During a match, players collect experience points (XP) and items for their heroes to defeat the opposing team's heroes in player versus player (PvP) combat. A team wins by being the first to destroy the other team's Ancient, a large durable structure located in

20088-552: The mod's developers, the Chinese-based Drodo Studio, to discuss directly collaborating on a standalone version. However, the two companies were unable to come to an agreement, with them both stating that it was in their best interest to develop their own separate games. Valve's version, Dota Underlords , was released in February 2020 and continued to use the Dota setting, while Drodo's game, Auto Chess ,

20250-453: The most played game by concurrent players on the platform for four years, having a peak of over one million and never dropping below first place for any extended period of time until being surpassed by PlayerUnknown's Battlegrounds in 2017. Viewership and followings of professional Dota 2 leagues and tournaments are popular, with peak viewership numbers of the International in the millions. Some Asian schools and universities, such as

20412-511: The number of states (or scale to problems with infinite state spaces), simple exploration methods are the most practical. One such method is ε {\displaystyle \varepsilon } -greedy, where 0 < ε < 1 {\displaystyle 0<\varepsilon <1} is a parameter controlling the amount of exploration vs. exploitation. With probability 1 − ε {\displaystyle 1-\varepsilon } , exploitation

20574-493: The ongoing matches similar to traditional sporting events. In addition to playing live to audiences in arenas and stadiums, broadcasts of them are also streamed over the internet and sometimes simulcast on television, with several million in viewership numbers. Despite criticism going towards its steep learning curve and overall complexity, Dota 2 was praised for its rewarding gameplay, production quality, and faithfulness to its predecessor, with many considering it to be one of

20736-454: The operations research and control literature, RL is called approximate dynamic programming , or neuro-dynamic programming. The problems of interest in RL have also been studied in the theory of optimal control , which is concerned mostly with the existence and characterization of optimal solutions, and algorithms for their exact computation, and less with learning or approximation (particularly in

20898-429: The opposing player. Continuous action space : Each playable character in a Dota 2 game, known as a hero, can take dozens of actions that target either another unit or a position. The OpenAI Five developers allow the space into 170,000 possible actions per hero. Without counting the perpetual aspects of the game, there are an average of ~1,000 valid actions each tick. By comparison, the average number of actions in chess

21060-522: The opposing team known as the "Ancient" whilst defending their own. As in Defense of the Ancients , the game is controlled using standard real-time strategy controls , and is presented on a single map in a three-dimensional isometric perspective . Ten players each control one of the game's 126 playable characters , known as "heroes", with each having their own design, strengths, and weaknesses. Heroes are divided into two primary roles , known as

21222-625: The optimal action-value function alone suffices to know how to act optimally. Assuming full knowledge of the Markov decision process, the two basic approaches to compute the optimal action-value function are value iteration and policy iteration . Both algorithms compute a sequence of functions Q k {\displaystyle Q_{k}} ( k = 0 , 1 , 2 , … {\displaystyle k=0,1,2,\ldots } ) that converge to Q ∗ {\displaystyle Q^{*}} . Computing these functions involves computing expectations over

21384-435: The player on how to play their hero. In June 2015, Valve announced that Dota 2 would be ported over to their Source 2 game engine in an update called Dota 2 Reborn . Reborn was released as an opt-in beta update that same month, and replaced the original client in September 2015, making it the first game to use the engine. Reborn included a new user interface framework design, ability for custom game modes created by

21546-606: The prediction problem and then extending to policy improvement and control, all based on sampled experience. The first problem is corrected by allowing the procedure to change the policy (at some or all states) before the values settle. This too may be problematic as it might prevent convergence. Most current algorithms do this, giving rise to the class of generalized policy iteration algorithms. Many actor-critic methods belong to this category. The second issue can be corrected by allowing trajectories to contribute to any state-action pair in them. This may also help to some extent with

21708-402: The purchase of tickets from the "Dota Store", which give players in-game access to matches. Ticket fees are apportioned in part to tournament organizers. The game features an in-game fantasy sports system, which is modeled after traditional fantasy sports and feature professional Dota 2 players and teams. Players are able to spectate games in virtual reality (VR) with up to 15 others, which

21870-407: The recursive Bellman equation. Most TD methods have a so-called λ {\displaystyle \lambda } parameter ( 0 ≤ λ ≤ 1 ) {\displaystyle (0\leq \lambda \leq 1)} that can continuously interpolate between Monte Carlo methods that do not rely on the Bellman equations and the basic TD methods that rely entirely on

22032-550: The returns of subsequent states within the same episode, making the problem non-stationary . To address this non-stationarity, Monte Carlo methods use the framework of general policy iteration (GPI). While dynamic programming computes value functions using full knowledge of the Markov decision process (MDP), Monte Carlo methods learn these functions through sample returns. The value functions and policies interact similarly to dynamic programming to achieve optimality , first addressing

22194-577: The reward R t + 1 {\displaystyle R_{t+1}} associated with the transition ( S t , A t , S t + 1 ) {\displaystyle (S_{t},A_{t},S_{t+1})} is determined. The goal of a reinforcement learning agent is to learn a policy : π : S × A → [ 0 , 1 ] {\displaystyle \pi :{\mathcal {S}}\times {\mathcal {A}}\rightarrow [0,1]} , π ( s ,

22356-512: The same time, the game introduced the "Dota Plus" monthly subscription system that replaced the seasonal battle passes that were released to coincide with a Major tournament. In addition to offering everything battle passes previously did, Dota Plus added features such as a hero-specific achievement system that reward players who complete them with exclusive cosmetics, as well as providing hero and game analytics and statistics gathered from thousands of recent games. To ensure that enough Defense of

22518-449: The third problem, although a better solution when returns have high variance is Sutton's temporal difference (TD) methods that are based on the recursive Bellman equation . The computation in TD methods can be incremental (when after each transition the memory is changed and the transition is thrown away), or batch (when the transitions are batched and the estimates are computed once based on

22680-452: The third-party sites as the practice was not allowed by their user agreements or API . Comparisons of Dota 2 to other MOBA games are commonplace, with the mechanics and business model compared to League of Legends and Heroes of the Storm . Contrasting it with League of Legends , T. J. Hafer of PC Gamer called Dota 2 the "superior experience", stating that he thought the game

22842-402: The time to master it. Further comparing it to Heroes of Newerth , players from the professional Dota 2 team OG said that most Heroes of Newerth players were able to transition over easily to the game, due to the strong similarities that both games share. Similar to other highly competitive online games , Dota 2 is often considered to have a hostile and " toxic " community. In 2019,

23004-496: The time. From 2015 until 2021, each iteration of the International surpassed the previous one's prize pool, with The International 2021 peaking at $ 40 million. During its beta phase in the early 2010s several other esport events would begin hosting Dota 2 events, including the Electronic Sports World Cup , DreamHack , World Cyber Games , and ESL . By the end of 2011, Dota 2 was already one of

23166-401: The unit. In addition, action heads are computed independently. The AI system observes the world as a list of 20,000 numbers and takes an action by conducting a list of eight enumeration values. Also, it selects different actions and targets to understand how to encode every action and observe the world. OpenAI Five has been developed as a general-purpose reinforcement learning training system on

23328-446: The unregulated practice, which he said was often used by underage players and regions where gambling is illegal . Australian senator Nick Xenophon had similar sentiment, stating that he wanted to introduce legislation in his country to minimize underage access to gambling within video games, including Dota 2 . In response to the controversy, Valve and Dota 2 project manager Erik Johnson stated that they would be taking action against

23490-586: The use of samples to optimize performance, and the use of function approximation to deal with large environments. Thanks to these two key components, RL can be used in large environments in the following situations: The first two of these problems could be considered planning problems (since some form of model is available), while the last one could be considered to be a genuine learning problem. However, reinforcement learning converts both planning problems to machine learning problems. The exploration vs. exploitation trade-off has been most thoroughly studied through

23652-596: The value function estimates "how good" it is to be in a given state. where the random variable G {\displaystyle G} denotes the discounted return , and is defined as the sum of future discounted rewards: where R t + 1 {\displaystyle R_{t+1}} is the reward for transitioning from state S t {\displaystyle S_{t}} to S t + 1 {\displaystyle S_{t+1}} , 0 ≤ γ < 1 {\displaystyle 0\leq \gamma <1}

23814-469: The victory was "basically cheating" due to the simplified hero pools on both sides, as well as the fact that bots were given direct access to the API, as opposed to using computer vision to interpret pixels on the screen. The Verge wrote that the bots were evidence that the company's approach to reinforcement learning and its general philosophy about AI was "yielding milestones". In 2019, DeepMind unveiled

23976-485: The video game StarCraft II , AlphaGo in the board game Go , Deep Blue in chess , and Watson on the television game show Jeopardy! . Development on the algorithms used for the bots began in November 2016. OpenAI decided to use Dota 2 , a competitive five-on-five video game, as a base due to it being popular on the live streaming platform Twitch , having native support for Linux , and had an application programming interface (API) available. Before becoming

24138-472: The whole state-space, which is impractical for all but the smallest (finite) Markov decision processes. In reinforcement learning methods, expectations are approximated by averaging over samples and using function approximation techniques to cope with the need to represent value functions over large state-action spaces. Monte Carlo methods are used to solve reinforcement learning problems by averaging sample returns. Unlike methods that require full knowledge of

24300-587: The winning entries being included in the game later that year. In 2017, a cosmetic set based on the Companion Cube from the Portal series was released as part of that year's International battle pass for the hero known as "Io". In December of the same year, the character Amaterasu from Capcom 's Ōkami was included as a courier for those who had pre-ordered the PC release of the game. A documentary on

24462-521: Was "all about counterplay", with most of the heroes being designed to directly counter another. Hafer also preferred the way the game handled its hero selection pool, with all of them being unlocked right from the start, unlike in League of Legends . Comparing Dota 2 to Heroes of the Storm , Jason Parker of CNET said that while Heroes of the Storm was easier to get into, the complexities and depth of Dota 2 would be appreciated more by those who put in

24624-502: Was "the only game to do free-to-play right". Nick Kolan of IGN agreed, comparing the game's business model to Valve's Team Fortress 2 , which uses a nearly identical system. Post-release additions to the game were praised, such as the addition of virtual reality (VR) support in 2016. Ben Kuchera of Polygon thought that spectating games in VR was "amazing", comparing it to being able to watch an American football game on television with

24786-407: Was a step in the direction of creating software that can handle complex tasks "like being a surgeon". OpenAI used a methodology called reinforcement learning , as the bots learn over time by playing against itself hundreds of times a day for months, in which they are rewarded for actions such as killing an enemy and destroying towers. By June 2018, the ability of the bots expanded to play together as

24948-412: Was added in an update in July 2016. The update added a hero showcase mode, which allows players to see all of the heroes and their cosmetics full-size in virtual reality. As part of a plan to develop Dota 2 into a social network , Newell announced in April 2012 that the game would be free-to-play , and that community contributions would be a cornerstone feature. Instead, revenue is generated through

25110-449: Was developed using no Dota 2 assets. The Dota series began in 2003 with Defense of the Ancients ( DotA )—a mod for Blizzard Entertainment 's Warcraft III: Reign of Chaos —created by the pseudonymous designer "Eul". An expansion pack for Warcraft III , The Frozen Throne , was released later that year; a series of Defense of the Ancients clone mods for the new game competed for popularity. DotA: Allstars by Steve Feak

25272-510: Was nominated for several awards by Destructoid . While the staff selected StarCraft II: Heart of the Swarm , Dota 2 received the majority of the votes distributed between the nine nominees. During the 17th Annual D.I.C.E. Awards , the Academy of Interactive Arts & Sciences nominated Dota 2 for " Role-Playing/Massively Multiplayer Game of the Year ". Dota 2 was later nominated for

25434-413: Was not much of a difference between the two, while comparing various NBA all-stars , such as Stephen Curry , Kobe Bryant , and LeBron James , to different heroes in the game. Starting in 2016, Valve began producing an episodic documentary series titled True Sight , a spiritual successor to Free to Play . The first three episodes followed the professional teams Evil Geniuses and Fnatic during

25596-485: Was released in November 2018. Promotional tie-ins to other video games and media have been added to Dota 2 , including custom Half-Life 2 , Bastion , Portal , The Stanley Parable , Rick and Morty , Fallout 4 , Deus Ex: Mankind Divided , and Darkest Dungeon announcer packs, which replace the game's default announcer with ones based on those franchises. In addition to announcer packs, musical artists have written music packs that can replace

25758-488: Was released on Steam for Windows on July 9, 2013, and for OS X and Linux on July 18, 2013. The game did not launch with every hero from Defense of the Ancients . Instead, the missing ones were added in various post-release updates, with the final one, as well as the first Dota 2 original hero, being added in 2016. Two months following the game's release, Newell claimed that updates to Dota 2 generated up to three percent of global internet traffic. In December 2013,

25920-506: Was showcased, allowing participants to interact with various items and objects from the game in VR. The demo, known as Secret Shop , was later included the following year on The Lab , Valve's virtual reality compilation game. After the conclusion of The International 2015, Valve awarded the Collector's Aegis of Champions, a brass replica of the Aegis of Champions award trophy, to those with compendiums of 1,000 levels or more. Valve awarded

26082-407: Was the most successful, and Feak, with his friend Steve Mescon, created the official Defense of the Ancients community website and the holding company DotA-Allstars, LLC. When Feak retired from DotA: Allstars in 2005, a fellow website user under the pseudonym IceFrog , became its lead designer. By the late 2000s, Defense of the Ancients became one of the most popular mods worldwide, as well as

26244-607: Was updated with new features to accommodate Dota 2 , such as high-end cloth modeling and improved global lighting. The game features Steam integration, which provides its social component and cloud storage for personal settings. In November 2013, Valve introduced a coaching system that allows experienced players to tutor newer players with in-game tools. As with previous Valve multiplayer games, players are able to spectate live matches of Dota 2 played by others, and local area network (LAN) multiplayer support allows for local competitions. Some of these events may be spectated via

#189810