Application of Fuzzy-Enhanced Reinforcement Learning in Artificial Intelligence for Elevator Group Control
May 19, 2025
by Meysam Talebi
Abstract
Recent advancements in reinforcement learning (RL) algorithms and their theoretical underpinnings have spurred significant interest within the computational intelligence community. RL algorithms, leveraging iterative approximations of dynamic programming, are capable of acquiring knowledge through both empirical and simulated interactions. By prioritizing computational resources on frequently accessed regions of the state space during control operations, these algorithms facilitate the solution of complex, large-scale problems. When applied within a multi-agent framework, where each agent employs such algorithms, a cooperative learning paradigm emerges, benefiting the entire team. This study demonstrates the effectiveness of collective RL algorithms in addressing intricate control problems. Elevator group control is utilized as an experimental platform, presenting a unique set of challenges not typically encountered in multi-agent learning research. A cohort of RL agents, each tasked with managing an individual elevator, is implemented. These agents receive a shared reward signal, which is perceived as noisy due to the interdependence of agent actions, the inherent stochasticity of passenger arrivals, and the limitations of state observability. To enhance the robustness of the Q-learning process in this dynamic environment, a fuzzy logic model is integrated for the computation of Q-values, enabling the agents to handle uncertainties and make more informed decisions. Despite these complexities, simulation results indicate performance superior to existing state-of-the-art elevator control methods. These findings underscore the potential of multi-agent RL in tackling large-scale, stochastic dynamic optimization problems pertinent to real-world applications.
Keywords: reinforcement learning, elevator group control, fuzzy control, specialized learning, reverse learning, high-speed elevator.
Introduction
A common misconception held by the public equates artificial intelligence (AI) with emotionless machines, solely designed for task automation and the replacement of human labor. This perception, largely shaped by science fiction portrayals, deviates significantly from the reality of AI. In essence, AI refers to technologies that simulate cognitive functions. While these simulations aim to replicate human thought processes, they exhibit fundamental differences. Although the full realization of AI as conceptually envisioned remains an ongoing pursuit, its pervasive influence on daily life is undeniable. Many routine activities, such as internet searches and social media interactions, are intrinsically linked to AI, often without conscious awareness. This unconscious engagement highlights a gap in public understanding regarding the true nature and capabilities of AI. Given the anticipated prominence of AI in future technological landscapes, a proactive shift from apprehension to comprehension is essential. This necessitates a focus on AI’s potential applications and an expansion of our knowledge base. Thus, a critical initial step involves clarifying the fundamental definition of AI [14, 17].
Definition of Artificial Intelligence:
Artificial intelligence (AI) is a branch of computer science dedicated to the creation of intelligent machines capable of performing tasks that traditionally require human intelligence. Essentially, AI involves the simulation of human cognitive abilities within computer systems, aiming to develop machines programmed to emulate human thought processes and behaviors. This definition encompasses any machine designed to function akin to the human mind, possessing the capacity for problem-solving and learning [14, 16, 17].
Objectives of Artificial Intelligence:
The fundamental objective of AI is to delineate human intelligence and its operational mechanisms in a manner that facilitates machine execution and accurate task completion. The core principles of AI are rooted in learning, reasoning, and perception. AI is a broad discipline within computer science focused on constructing intelligent machines capable of executing tasks typically requiring human cognitive abilities. AI is an interdisciplinary science employing diverse approaches, with advancements in machine learning and deep learning catalyzing paradigm shifts across the technology industry [13, 14, 16].
History of Artificial Intelligence:
The genesis of AI can be traced back to the era of World War II, during which German forces utilized the Enigma machine for secure message encryption. In response, British scientist Alan Turing endeavored to decipher these codes. Turing and his team developed the Bombe machine, which successfully decrypted Enigma messages. Both the Enigma and Bombe machines laid the groundwork for machine learning, a subfield of AI. Turing postulated that an intelligent machine should be capable of engaging in communication that does not betray its non-human nature, thereby establishing the foundation for AI—the creation of machines that replicate human thought, decision-making, and action. Progress in technology and hardware facilitated the development of intelligent tools and services incorporating AI. The proliferation of search engines, satellites, and other technologies exemplifies AI’s integration into various processes. The advent of smartphones and smart gadgets further propelled AI into everyday human life, enhancing its practical relevance and fostering broader public awareness of its applications [15, 16, 17].
Artificial Intelligence vs. Programming:
In conventional programming, we operate with defined inputs and employ conditional statements, such as if and else, to solve equations and achieve desired outcomes. However, problems addressed by artificial intelligence (AI) exhibit a diverse range of inputs, rendering traditional programming methods inadequate. This is exemplified by systems for speech-to-text conversion or facial recognition, where input data is highly variable, necessitating the utilization of AI models [13].
Branches of Artificial Intelligence:
AI encompasses a broad spectrum of disciplines, including:
Expert Systems
Robotics
Machine Learning
Neural Networks
Fuzzy Logic
Natural Language Processing
Levels of Artificial Intelligence:
AI systems are categorized into three levels based on their perception and response to the external environment: limited AI, general AI, and super AI.
Limited AI:
Limited AI systems excel in specific tasks, such as chess gameplay, business decision-making, and speech-to-text conversion.
General AI:
General AI, currently theoretical, aims to replicate human cognitive abilities.
Super AI:
Super AI, surpassing human intelligence, remains a hypothetical concept.
How Artificial Intelligence Learns:
AI systems learn through machine learning and deep learning.
Machine Learning:
Machine learning enables systems to learn from data without explicit programming.
Deep Learning:
Deep learning, a subset of machine learning, mimics human brain processes through neural networks.
Categorization of Artificial Intelligence Systems:
AI systems are classified into four categories:
Reactive Machines
Limited Memory
Theory of Mind
Self-Awareness
Application of Artificial Intelligence in the Elevator Industry:
AI applications in elevators include:
Elevator group control
Predictive maintenance
Enhanced user experience [2, 3, 4].
Introduction to Reinforcement Learning:
Machine learning research has predominantly focused on supervised learning, where a “teacher” provides labeled training examples in the form of input-output pairs. Supervised learning algorithms are applicable to a wide range of problems, including pattern classification and function approximation. However, many real-world scenarios involve costly or unattainable labeled training data. Reinforcement learning (RL) addresses these challenges by utilizing training information from a “critic” that offers a scalar evaluation of the chosen output, rather than specifying the optimal output or direction for modification. RL introduces the additional complexity of exploration, which involves determining the optimal output for a given input [5].
Types of Reinforcement Learning Tasks:
It is beneficial to differentiate between two types of RL tasks: episodic (non-continuous) and continuing (continuous). In episodic tasks, agents learn mappings from situations to actions that maximize the expected immediate reward. In continuing tasks, agents learn mappings that maximize the expected long-term rewards. Continuing tasks are generally more challenging because an agent’s actions can influence future situations and rewards. In these tasks, agents interact with their environments for extended periods and must evaluate decisions based on long-term consequences [5].
Reinforcement Learning and Optimal Control:
From a control theory perspective, RL algorithms provide methods for approximating optimal solutions to stochastic optimal control problems. The agent acts as the controller, and the environment functions as the system to be controlled. The objective is to maximize a specific performance criterion over time. With knowledge of the environment’s state transition probabilities and reward structure, these problems can be solved using dynamic programming (DP) algorithms. However, the computational complexity of DP renders it impractical for problems with a large number of states. Recent RL algorithms are designed to perform DP incrementally, eliminating the need for prior knowledge of state transition probabilities and reward structures. Online learning concentrates computations on frequently visited regions of the state space. Therefore, combining RL with appropriate function approximation methods offers computationally feasible approaches to approximate solutions for large-scale stochastic optimal control problems.
The same concentration phenomenon can be achieved using simulated online learning. A simulation model can often be constructed without explicitly deriving state transition probabilities and reward structures. Utilizing an accurate simulation model offers advantages, such as generating vast amounts of simulated experience and potentially accelerating the learning process. Additionally, there is no need to be concerned about the performance level of a simulated system during learning. A successful example of simulated online RL is the TD-Gammon system, which learned to play backgammon at a master level [1,5].
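To make the idea of incremental, simulation-driven dynamic programming concrete, the sketch below shows a generic tabular Q-learning loop run against a simulator. It is an illustrative outline only, not the elevator controller described later; `env_reset`, `env_step`, the action set, and the parameter defaults are placeholders standing in for whatever simulated system is being controlled.

```python
import random
from collections import defaultdict

def q_learning(env_reset, env_step, actions, episodes=1000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Generic tabular Q-learning driven by a simulator.
    env_reset() -> initial state; env_step(state, action) -> (next_state, reward, done)."""
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        state = env_reset()
        done = False
        while not done:
            # epsilon-greedy exploration
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env_step(state, action)
            # incremental dynamic-programming-style backup on the visited state only
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

Because the backups are applied only along simulated trajectories, computation naturally concentrates on the states the controller actually visits, which is the property exploited in the elevator domain.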
Multi-Agent Reinforcement Learning:
Research on multi-agent RL dates back to the work of Russian mathematician Tsetlin. Theoretical results have been obtained for episodic RL tasks. Certain types of learning automata converge to an equilibrium point in repeated zero-sum and non-zero-sum games. However, in more general non-zero-sum games, equilibrium points often yield poor rewards for all players. A notable example is the prisoner’s dilemma, where the only equilibrium point results in the lowest total reward [1, 7].
Introduction to Elevator Group Control:
This section introduces the elevator group control problem, serving as our testbed for multi-agent reinforcement learning. While familiar to anyone who has used an elevator system, this problem, despite its conceptual simplicity, presents significant challenges. An optimal policy for elevator group control remains elusive, necessitating the use of existing control algorithms as benchmarks. The elevator domain provides an opportunity to compare parallel and distributed control architectures, where each agent controls an elevator car, and to monitor performance degradation as agents face diminishing levels of information [1, 5].
Elevator Group Control System Schematic:
Figure 1 provides a schematic representation of an elevator system (Lewis, 1991). Elevator cars are depicted as filled squares. The “+” symbol indicates a request for entry into a car from a floor on either side of the shaft, while the “−” symbol denotes a request for a passenger to exit a car and proceed to a specific floor. The left shaft represents cars and requests moving upwards, and the right shaft corresponds to cars and requests moving downwards. Thus, cars circulate clockwise around the shafts. This study examines passenger arrival patterns, elevator control strategies, and the specific simulated elevator system [1].
Algorithm and Network Architecture:
This section describes the multi-agent reinforcement learning (MARL) algorithm implemented for elevator group control. Each agent is responsible for controlling an elevator car. The environmental reward structure for each agent is defined based on passenger waiting times, with a focus on minimizing average waiting time. Each agent employs a modified version of Q-learning (Watkins, 1989) for discrete-event systems. Collectively, they implement a form of collective reinforcement learning [1].
Discrete-Event Considerations in Reinforcement Learning:
Elevator systems can be modeled as discrete-event systems, where significant events (e.g., passenger arrivals) occur at discrete times, but the time intervals between events are real-valued variables. In such systems, a fixed discount factor γ, as used in most discrete-time RL algorithms, is inadequate. This can be addressed by using a variable discount factor that depends on the time interval between events. In this context, the cost-to-go is defined as an integral rather than a discrete sum, as follows:
Conversion of the discrete-time discounted cost to its continuous-time counterpart:
E\left[\sum_{t=0}^{\infty}\gamma^{t}c_{t}\right]\;\longrightarrow\;E\left[\int_{0}^{\infty}e^{-\beta\tau}c_{\tau}\,d\tau\right] \qquad (1)
In the above formula, we have:
c_t: the immediate cost at discrete time t,
c_τ: the instantaneous cost at continuous time τ, taken here as the sum of the squared waiting times of all currently waiting passengers,
β: the parameter controlling the rate of exponential decay (β = 0.01),
10^6: the factor by which the squared waiting times are divided to keep the accumulated cost within a reasonable numerical range.
The instantaneous cost c_τ represents the dissatisfaction caused by the waiting times of passengers in continuous time; to avoid excessively large values, it is scaled down by a factor of 10^6. Because elevator system events occur randomly in continuous time, the set of possible transitions is effectively infinite, which makes explicit prediction models impractical. We therefore use the discrete-event version of the Q-learning algorithm, since it considers only the actual events of the system and does not require explicit knowledge of state transition probabilities. Bradtke & Duff (1995) extended Watkins’ (1989) Q-learning update rule to the following discrete-event form.
Q(x,a)\;\leftarrow\;Q(x,a)+\alpha\left[\int_{t_x}^{t_y}e^{-\beta(\tau-t_x)}c_{\tau}\,d\tau\;+\;e^{-\beta(t_y-t_x)}\min_{a'}Q(y,a')\;-\;Q(x,a)\right] \qquad (2)
Here, action ‘a’ is taken from state ‘x’ at time t_x, and the next decision is required in state ‘y’ at time t_y. The parameter α is the step-size, and c_τ and β are defined as above. The quantity e^{-β(t_y − t_x)} acts as a discount factor that depends on the time elapsed between events. Bradtke and Duff (1995) considered the case where c_τ remains constant between events. We extend their formulation to the case where c_τ is quadratic, since the objective is to minimize squared waiting times. Consequently, the integral in the Q-learning update rule takes the following form:
\int_{t_x}^{t_y}e^{-\beta(\tau-t_x)}\sum_{p}\left[w_{p}(\tau)\right]^{2}d\tau \qquad (3)
Here, w_p(τ) denotes the amount of time passenger ‘p’ has been waiting at time τ (special attention must be paid to any passengers who begin or end their wait between t_x and t_y). Solving the integral above yields:
\sum_{p}\left[\left(\frac{w_{p}(t_x)^{2}}{\beta}+\frac{2\,w_{p}(t_x)}{\beta^{2}}+\frac{2}{\beta^{3}}\right)-e^{-\beta(t_y-t_x)}\left(\frac{w_{p}(t_y)^{2}}{\beta}+\frac{2\,w_{p}(t_y)}{\beta^{2}}+\frac{2}{\beta^{3}}\right)\right] \qquad (4)
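As a minimal sketch of how formulas (2) and (4) can be implemented, the Python functions below compute the closed-form discounted squared-waiting-time cost for one passenger and apply the discrete-event Q-learning backup. The function names, the dictionary-based Q-table, and the default parameter values are illustrative assumptions rather than details of the original system.

```python
import math

def discounted_squared_wait_cost(w_start, dt, beta=0.01):
    """Closed form of the integral of e^(-beta*s) * (w_start + s)^2 over s in [0, dt],
    i.e. the discounted squared-wait cost of one passenger who waits throughout
    the interval (cf. the reconstruction of equation (4))."""
    w_end = w_start + dt
    head = w_start**2 / beta + 2 * w_start / beta**2 + 2 / beta**3
    tail = w_end**2 / beta + 2 * w_end / beta**2 + 2 / beta**3
    return head - math.exp(-beta * dt) * tail

def smdp_q_update(Q, x, a, y, actions, accrued_cost, t_x, t_y, alpha=0.1, beta=0.01):
    """Discrete-event Q-learning backup toward the discounted accrued cost plus the
    best (lowest-cost) Q-value at the next decision state (cf. equation (2))."""
    discount = math.exp(-beta * (t_y - t_x))
    target = accrued_cost + discount * min(Q.get((y, b), 0.0) for b in actions)
    old = Q.get((x, a), 0.0)
    Q[(x, a)] = old + alpha * (target - old)
```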
The practical implementation of this formulation is challenged by the requirement for comprehensive knowledge of all waiting passengers’ durations. In real-world elevator systems, however, only the waiting times of individuals who have activated hall call buttons are readily accessible. The arrival times and subsequent waiting periods of future passengers remain unknown. To address this limitation, we investigate two distinct methodological approaches: the ‘omniscient’ and the ‘on-line’ reinforcement learning paradigms.
The simulation environment, by design, possesses complete information regarding all passenger waiting times, enabling the generation of necessary reinforcement signals. This approach, termed the ‘omniscient reinforcement scheme,’ relies on data that is inherently unavailable in operational elevator systems. It is crucial to emphasize that this supplementary information is utilized solely by the evaluation component (the ‘critic’) and not the control mechanism itself. Consequently, a controller trained using this omniscient scheme in a simulated environment can be deployed in a real-world setting without requiring access to any additional data.
Alternatively, the ‘on-line reinforcement scheme’ facilitates learning based exclusively on data obtainable in real-time within an operational system. This methodology assumes the availability of the waiting time for the first passenger in each queue, corresponding to the elapsed button press duration. If the Poisson arrival rate, denoted as λ, for each queue is known or can be reliably estimated, the Gamma distribution can be employed to infer the arrival times of subsequent passengers. The time until the nth subsequent arrival adheres to the Gamma distribution Γ(n, 1/λ). For each queue, the anticipated costs generated by subsequent arrivals within the initial ‘b’ seconds following the hall button activation can be determined as follows:
(5)
This integral can likewise be solved using integration by parts to obtain a closed-form expected cost, although that derivation is not carried out in this discussion [1, 8, 9].
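Where the analytical solution is not worked out, the expectation can also be approximated numerically. The sketch below estimates, by Monte Carlo simulation of Poisson arrivals, the discounted squared-waiting-time cost contributed by passengers arriving during the first b seconds after a hall call. It is an approximation under our own assumptions (in particular, that costs are discounted from the moment of the button press), not the formulation used in the cited work.

```python
import math
import random

def _disc_sq_wait(dt, beta):
    # Integral of e^(-beta*s) * s^2 over [0, dt] for a passenger whose wait starts at zero.
    return 2 / beta**3 - math.exp(-beta * dt) * (dt**2 / beta + 2 * dt / beta**2 + 2 / beta**3)

def expected_future_arrival_cost(lam, b, beta=0.01, n_samples=10_000):
    """Monte Carlo estimate of the discounted squared-waiting-time cost incurred,
    within the first b seconds after a hall call, by passengers who have not yet
    arrived (Poisson arrivals with rate lam)."""
    total = 0.0
    for _ in range(n_samples):
        t = random.expovariate(lam)            # arrival time of the next passenger
        while t < b:
            # this passenger waits from t to b; discount back to the button press
            total += math.exp(-beta * t) * _disc_sq_wait(b - t, beta)
            t += random.expovariate(lam)       # arrival time of the following passenger
    return total / n_samples
```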
Collective Discrete-Event Q-Learning
In the context of elevator systems, events can be categorized into two primary types. The first category encompasses events crucial for the computation of waiting times, which subsequently play a pivotal role in determining the reinforcement signal employed by the Reinforcement Learning (RL) algorithm. These events consist of passenger arrivals and transfers into and out of elevator cars in the omniscient scenario, or hall button activations in the on-line scenario. The second category comprises car arrival events, which represent potential decision-making junctures for the RL agents responsible for controlling each elevator car. When a car is in transit between floors, it generates a car arrival event upon reaching the point where it must decide whether to stop at or bypass the subsequent floor. In certain situations, elevator cars are constrained to perform specific actions, such as stopping at the next floor to facilitate passenger egress. An agent encounters a decision point only when it possesses an unconstrained selection of actions [11, 12].
Computation of Omniscient Reinforcements
Within the framework of the omniscient reinforcement scheme, the cumulative cost undergoes incremental updates following each passenger arrival event (when a passenger joins a queue), passenger transfer event (when a passenger boards or disembarks an elevator car), and car arrival event (when a control decision is executed). These incremental updates provide a pragmatic approach to managing the discontinuities in cost that arise when passengers initiate or conclude their waiting periods between car decisions, such as when an alternate car serves waiting passengers.
The magnitude of cost accrued between successive events remains uniform across all elevator cars, reflecting their shared objective function. However, the cost accumulation for each car between its discrete decision points varies due to the asynchronous nature of their decision-making processes. Consequently, each car ‘i’ is assigned a dedicated storage location, denoted as R[i], where the aggregate discounted cost incurred since its most recent decision (at time d[i]) is accumulated.
At the onset of each event, the subsequent calculations are performed: Let t0 represent the time of the preceding event and t1 denote the time of the current event. For every passenger ‘p’ who has been waiting within the interval [t0, t1], let w0(p) and w1(p) signify the total waiting time for passenger ‘p’ at t0 and t1, respectively. The following update is then performed for each car ‘i’ [1, 11, 12]:
R[i]\;\leftarrow\;R[i]+e^{-\beta(t_0-d[i])}\sum_{p}\left[\left(\frac{w_{0}(p)^{2}}{\beta}+\frac{2\,w_{0}(p)}{\beta^{2}}+\frac{2}{\beta^{3}}\right)-e^{-\beta(t_1-t_0)}\left(\frac{w_{1}(p)^{2}}{\beta}+\frac{2\,w_{1}(p)}{\beta^{2}}+\frac{2}{\beta^{3}}\right)\right] \qquad (6)
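A hedged sketch of this bookkeeping in Python follows; the data structures (dictionaries keyed by car index, parallel lists of passenger waits) are assumptions made for illustration, and the closed-form term mirrors the reconstruction of equation (6) above.

```python
import math

def update_omniscient_costs(R, decision_times, waits_t0, waits_t1, t0, t1, beta=0.01):
    """Add to each car's accumulator R[i] the squared-wait cost incurred by all
    passengers waiting during [t0, t1], discounted back to car i's last decision
    time decision_times[i] (cf. the reconstruction of equation (6))."""
    def bracket(w):
        return w**2 / beta + 2 * w / beta**2 + 2 / beta**3
    # cost of the interval [t0, t1], discounted to t0; identical for every car
    interval_cost = sum(bracket(w0) - math.exp(-beta * (t1 - t0)) * bracket(w1)
                        for w0, w1 in zip(waits_t0, waits_t1))
    for i in R:
        # shift the discounting back to car i's own last decision point d[i]
        R[i] += math.exp(-beta * (t0 - decision_times[i])) * interval_cost
```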
On-Line Reinforcement Computation
In the on-line reinforcement scheme, the accumulated cost is updated incrementally following each “hall call button press” event (indicating the arrival of the first waiting passenger at a floor or the arrival of an elevator car to board waiting passengers at a floor) and “car arrival” event (when a control decision is made). It is assumed that a passenger concludes their waiting period upon the arrival of the elevator car at the floor and the opening of its doors, as the precise boarding time remains indeterminate. The passenger arrival rate (λ) for each floor is estimated based on the inverse of the most recent inter-button-press interval (the time elapsed between the last floor service and the subsequent button press). To mitigate fluctuations arising from excessively short inter-button-press intervals, the estimated arrival rate λ̂ is capped at 0.04 passengers per second. Let t0 denote the time of the preceding event, t1 denote the time of the current event, w0(b) represent the elapsed button press time for button ‘b’ at t0, and w1(b) represent the elapsed button press time for button ‘b’ at t1. The cost for each car ‘i’ is updated by accumulating the discounted cost between the previous and current event times [1].
(7)
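The following sketch illustrates the on-line bookkeeping under our reading of the scheme: the arrival-rate estimate is the capped inverse of the last inter-button interval, and the accumulated cost is computed from hall-button elapsed times alone. The correction for passengers expected to arrive later (equation (5)) is deliberately omitted, so this is an approximation rather than the exact update of the cited work.

```python
import math

MAX_ARRIVAL_RATE = 0.04  # passengers per second, ceiling on the estimated rate

def estimate_arrival_rate(inter_button_interval):
    """Estimate a floor's Poisson arrival rate as the inverse of the most recent
    interval between serving the floor and the next button press, capped to damp
    the effect of very short intervals."""
    if inter_button_interval <= 0:
        return MAX_ARRIVAL_RATE
    return min(1.0 / inter_button_interval, MAX_ARRIVAL_RATE)

def update_online_costs(R, decision_times, button_waits_t0, button_waits_t1,
                        t0, t1, beta=0.01):
    """Accumulate the discounted cost between events using only hall-button elapsed
    times, analogously to the omniscient update; the expected cost of passengers
    who have not yet pressed anything is omitted in this sketch."""
    def bracket(w):
        return w**2 / beta + 2 * w / beta**2 + 2 / beta**3
    interval_cost = sum(bracket(w0) - math.exp(-beta * (t1 - t0)) * bracket(w1)
                        for w0, w1 in zip(button_waits_t0, button_waits_t1))
    for i in R:
        R[i] += math.exp(-beta * (t0 - decision_times[i])) * interval_cost
```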
Decision-Making and Q-Value Update
An elevator car traversing between floors generates a “car arrival” event upon reaching a point necessitating a decision regarding whether to halt at or proceed past the subsequent floor. In certain scenarios, the car’s action selection is constrained, such as mandatory stopping at the next floor for passenger disembarkation. An agent encounters a decision point exclusively when it possesses an unconstrained choice among available actions. The algorithm employed by each agent for decision-making and Q-value estimate updates is outlined below. At time tx, upon observing state x, car ‘i’ arrives at a decision point. The car selects an action ‘a’ utilizing a Boltzmann distribution over its Q-value estimates.
\Pr(a)\;=\;\frac{e^{-Q(x,a)/T}}{\sum_{a'}e^{-Q(x,a')/T}} \qquad (8)
The parameter ‘T’ governs the degree of stochasticity in action selection. During the initial learning stages, when Q-value estimates are inherently imprecise, larger values of ‘T’ are employed, thereby assigning approximately equal probabilities to all available actions. Subsequently, as learning progresses and Q-value estimates attain greater accuracy, smaller values of ‘T’ are utilized. This approach favors actions deemed superior while concurrently facilitating exploration to gather further information regarding alternative actions. The selection of a sufficiently gradual annealing schedule is particularly critical in multi-agent settings [1].
Let the subsequent decision point for car ‘i’ occur at time ‘ty’ in state ‘y’. Following the update of all cars’ (including car ‘i’) R[·] values, as delineated above, car ‘i’ adjusts its estimate of Q(x, a) towards the following target value:
R[i]\;+\;e^{-\beta(t_y-t_x)}\,\min_{a'}Q(y,a') \qquad (9)
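As a compact illustration of the action-selection and update steps just described, the sketch below implements Boltzmann selection over cost-valued Q-estimates and the backup toward the target in equation (9). The dictionary Q-table and parameter defaults are illustrative assumptions.

```python
import math
import random

def boltzmann_select(Q, state, actions, temperature):
    """Choose an action with probability proportional to exp(-Q/T): lower-cost
    actions are preferred, and larger T makes the choice more uniform."""
    prefs = [math.exp(-Q.get((state, a), 0.0) / temperature) for a in actions]
    total = sum(prefs)
    r = random.random() * total
    cumulative = 0.0
    for action, p in zip(actions, prefs):
        cumulative += p
        if r <= cumulative:
            return action
    return actions[-1]

def update_toward_target(Q, x, a, y, actions, R_i, t_x, t_y, alpha=0.1, beta=0.01):
    """Move Q(x, a) toward R[i] + e^(-beta*(t_y - t_x)) * min_a' Q(y, a'),
    the target in equation (9)."""
    target = R_i + math.exp(-beta * (t_y - t_x)) * min(Q.get((y, b), 0.0) for b in actions)
    old = Q.get((x, a), 0.0)
    Q[(x, a)] = old + alpha * (target - old)
```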
Representing Q-Values with Fuzzy Systems:
The Q-value evaluation formula for Q-learning, as implemented with a neural network, can be reformulated using fuzzy logic. Instead of relying on precise Q-value estimates from a neural network, fuzzy logic employs a set of fuzzy rules to determine action values. These rules are defined based on system inputs (states x and y) and desired outputs (action values).
In place of a neural network estimating Q(x, a, θ), a fuzzy inference system is utilized to determine action values based on fuzzy rules. This system involves:
1. Fuzzification of inputs: System inputs (states x and y) are converted into fuzzy sets.
2. Application of fuzzy rules: Fuzzy rules are applied to input fuzzy sets to determine output fuzzy sets.
3. Defuzzification of outputs: Output fuzzy sets are converted into numerical values.
With this approach, the Q-value evaluation formula is reformulated as:
\Delta\;=\;\alpha\left[\,R[i]\;+\;e^{-\beta(t_y-t_x)}\,Q_{f}(y)\;-\;Q_{f}(x,a)\,\right] \qquad (10)
Where:
Δ: Q-value update magnitude.
α: Learning rate.
R[i]: Reward received in the [tx, ty ] time interval.
β: Decay rate of the exponential, time-based discount factor.
ty: Time of reaching state y.
tx: Time of reaching state x.
Qf (y): Fuzzy inference system estimated value at state y.
Qf (x, a): Fuzzy inference system estimated value at state x for action a.
Formula Parameters:
α (Learning rate): Determines the proportion of new reward used to update the Q-value.
β (Decay rate): Determines, through the discount factor e^{-β(t_y − t_x)}, the influence of future rewards on the Q-value calculation.
R[i] (Reward): Reward received by the agent in the [tx, ty] time interval.
Qf (y) and Qf (x, a) (Fuzzy values): Values estimated by the fuzzy inference system [12, 13, 15].
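To make the fuzzy representation concrete, here is a minimal zero-order Sugeno-style sketch: triangular membership functions fuzzify a scalar state feature, each rule carries one learnable consequent per action, and the defuzzified output is the firing-strength-weighted average. The structure, the choice of a single state feature, and the way the update is distributed across rules are our illustrative assumptions; the study does not specify its membership functions or rule base here.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support [left, right] and peak at `peak`."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (peak - left) if x <= peak else (right - x) / (right - peak)

class FuzzyQ:
    """Zero-order Sugeno-style fuzzy approximator for Q-values: each rule pairs a
    fuzzy set over a scalar state feature with one learnable consequent per action,
    and Qf is the firing-strength-weighted average of those consequents."""
    def __init__(self, rule_triangles, actions):
        self.rules = rule_triangles  # list of (left, peak, right) tuples
        self.consequents = {(r, a): 0.0
                            for r in range(len(rule_triangles)) for a in actions}

    def firing_strengths(self, feature):
        return [triangular(feature, *params) for params in self.rules]

    def q_value(self, feature, action):
        w = self.firing_strengths(feature)
        total = sum(w)
        if total == 0.0:
            return 0.0
        return sum(wi * self.consequents[(i, action)] for i, wi in enumerate(w)) / total

    def update(self, feature, action, delta):
        """Distribute the update delta over the rules in proportion to how strongly
        each rule fired for this state feature."""
        w = self.firing_strengths(feature)
        total = sum(w) or 1.0
        for i, wi in enumerate(w):
            self.consequents[(i, action)] += delta * (wi / total)

# Illustrative update toward the target of equation (10), taking Qf(y) as the
# minimum estimated cost over the available actions in the next state:
# q_next = min(fq.q_value(y_feature, b) for b in actions)
# delta = alpha * (R_i + math.exp(-beta * (t_y - t_x)) * q_next - fq.q_value(x_feature, a))
# fq.update(x_feature, a, delta)
```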
Annealing Scheduling in Fuzzy Reinforcement Learning
Annealing scheduling plays a pivotal role in fuzzy reinforcement learning systems by controlling the exploration-exploitation trade-off of agents. This process, which gradually reduces the stochasticity of decision-making, significantly impacts the algorithm’s final performance. A more gradual annealing rate facilitates the convergence towards optimal solutions.
In fuzzy systems, the parameter ‘T’, analogous to ‘temperature’, governs the uncertainty in action selection. During initial learning phases, when membership functions and fuzzy rules are not fully tuned, higher ‘T’ values are employed. This encourages agents to explore the state-action space extensively, preventing entrapment in local optima.
As learning progresses and membership functions and fuzzy rules are refined, ‘T’ values are gradually decreased. This transition guides agents towards exploiting acquired knowledge, mitigating excessive random action selection.
Precise ‘T’ parameter tuning involves various annealing schedules. A common approach utilizes an exponential decay function:
T = T_initial * (factor)^h
where:
T_initial: Initial temperature.
factor: Decay rate (0 < factor < 1).
h: Number of simulated hours.
Selecting appropriate T_initial and factor values depends on problem characteristics and desired exploration levels. Generally, higher T_initial and factor values closer to 1 promote more exploration and less exploitation.
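A minimal sketch of such a schedule, with placeholder values for T_initial and factor rather than tuned settings:

```python
def temperature(hours_elapsed, t_initial=1000.0, factor=0.999):
    """Exponentially annealed Boltzmann temperature: T = T_initial * factor**h,
    where h is the number of simulated hours of training so far."""
    return t_initial * (factor ** hours_elapsed)

# Usage together with Boltzmann action selection (see the earlier sketch):
# T = temperature(simulated_hours)
# action = boltzmann_select(Q, state, actions, T)
```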
In multi-agent systems, annealing schedule tuning is paramount. Agents must simultaneously explore and exploit to achieve proper convergence. Gradual annealing schedules enable inter-agent coordination and prevent unstable behaviors.
This approach parallels the phenomenon observed in zero-sum games, where self-play learning enhances agent performance in dynamic environments [9, 15].
Conclusion:
This study investigates the application of multi-agent reinforcement learning (MARL) and fuzzy logic models in elevator traffic control, aiming to enhance system efficiency by minimizing passenger waiting times. The elevator system is modeled as a discrete-event system, where significant events, such as passenger arrivals, occur at discrete intervals. In this framework, each elevator is controlled by an independent agent. Agents learn optimal policies through interactions with the environment and by receiving rewards. Q-learning algorithms are employed for learning, while fuzzy logic models are integrated into the Q-value computation to manage uncertainties and enhance decision-making. The study explores two learning paradigms: “omniscient” (with complete information) and “on-line” (with limited information). The “on-line” approach is deemed more practical due to its reliance on real-time data available in operational systems. Annealing scheduling is utilized to balance exploration and exploitation during learning. Precise tuning of annealing parameters significantly influences algorithm performance. Results demonstrate that the integration of MARL and fuzzy logic provides an effective approach to elevator traffic control, outperforming traditional methods by effectively managing uncertainties and environmental complexities.
Interpretation of Figures:
- Figure 1: This graph illustrates a substantial reduction in the percentage of passengers waiting over 60 seconds as training hours increase. This indicates an improvement in elevator control performance with extended training.
- Figure 2: This graph demonstrates a significant decrease in the final average squared waiting time with increasing training hours, further confirming the enhancement of elevator control efficiency through prolonged learning.
These figures empirically validate the study’s conclusions, highlighting the algorithm’s capacity for learning and performance improvement over time.
Key Findings:
- Fuzzy inference systems effectively estimate Q-values, replacing neural network dependencies.
- The system adeptly manages uncertainties and ambiguities.
- Performance is enhanced under varying traffic conditions.
- Increased flexibility is observed compared to traditional methods.
- The elevator system is accurately modeled as a discrete-event system.
- MARL effectively coordinates elevator movements.
- Annealing scheduling optimizes learning dynamics.
- Both “omniscient” and “on-line” learning approaches are viable.
- Passenger waiting times are effectively minimized.
- Annealing rates significantly impact overall performance.
References
Sutton, R. S. & Barto, A. G. (2018) Reinforcement learning: An introduction. MIT Press.
Stone, P., Brooks, R., Brynjolfsson, E., Corbett, M., Denning, E., Kambhampati, S. & Woroch, M. (2016) Artificial intelligence and life in 2030. Stanford University, Stanford, CA.
Bostrom, N. (2014) Superintelligence: Paths, dangers, strategies. Oxford University Press.
Nilsson, N. J. (2010) The quest for artificial intelligence: A history of ideas and achievements. Cambridge University Press.
Busoniu, L., Babuska, R., De Schutter, B. & Ernst, D. (2010) Reinforcement learning for control: Performance, stability, and deep approximations. Springer Science & Business Media.
Lewis, F. L., Vrabie, D. & Syrmos, V. L. (2012) Optimal adaptive control: Hamiltonian and reinforcement learning approaches. World Scientific.
Shoham, Y., Powers, R. & Grenager, S. (2007) ‘If multi-agent learning is the answer, what is the question?’, Artificial Intelligence, 171(7), pp. 365-377.
Bradtke, S. J. & Duff, M. O. (1995) ‘Reinforcement learning methods for continuous-time Markov decision problems’, in Advances in Neural Information Processing Systems, pp. 393-400.
Bradford, J. H. & Retch, J. R. (1995) ‘Real-time Q-learning for a continuous state and action Markov decision task’, in Proceedings of the 1995 IEEE International Conference on Neural Networks, Vol. 2, pp. 1092-1097. IEEE.
Bertsekas, D. P. & Tsitsiklis, J. N. (1996) Neuro-dynamic programming. Athena Scientific.
Tesauro, G. (1995) ‘Temporal difference learning and TD-Gammon’, Communications of the ACM, 38(3), pp. 58-68.
Watkins, C. J. C. H. & Dayan, P. (1992) ‘Q-learning’, Machine Learning, 8(3-4), pp. 279-292.
Littman, M. L. (1994) ‘Markov games as a framework for multi-agent reinforcement learning’, in Machine Learning Proceedings 1994, pp. 157-163. Morgan Kaufmann.
McCarthy, J. (2007) What is artificial intelligence? Stanford University, Stanford, CA.
Zadeh, L. A. (1965) ‘Fuzzy sets’, Information and Control, 8(3), pp. 338-353.
Tsetlin, M. L. (1973) Automaton theory and modeling of biological systems. Academic Press.
Turing, A. M. (1950) ‘Computing machinery and intelligence’, Mind, 59(236), pp. 433-460.