Inside The Mind Of A Hacker
These days, computer misdeeds are most often perpetrated in a businesslike manner.
This article was first presented at the 2018 International Elevator & Escalator Symposium in Istanbul. For more information on the 2019 event in Las Vegas (December 3-4) and to participate, visit www.elevatorsymposium.org.
“Inside the Mind of a Hacker” sounds a bit like the title of a psychological thriller. You can picture the main character as a lone wolf, up in the early hours of the morning, crafting the perfect hack to wreak havoc across the world. Reality, however, is very different. Criminal hackers work like any other business, as part of commercial — albeit illegal — networks, with clear business models. We have come a long way from the lone hacker stereotype.
The hacker business is booming. It seems there is not a day that goes by without reports of private data leakages, botnets, ransomware attacks and zero-day vulnerabilities being discovered. There are many factors at play, which together form the perfect storm:
- Increased connectivity between a multitude of devices, ranging from cars to pacemakers, is creating more opportunities for hackers to get in. In many cases, internet connectivity was simply added to these devices without much thought for the security risks — either to the device or ecosystem in which it operates.
- Along with this increased connectivity comes the proliferation of devices outside a traditional information technology (IT) security infrastructure. These devices make up the edges of Internet of Things (IoT) networks, and they are easy for hackers to gain access to. Once a device is in an attacker’s hands, the attacker has plenty of time to apply tools and techniques and find a scalable attack against it.
- Networks such as the dark web, or even just the internet, have become a platform for hackers’ businesses, because they remove the complexity of mounting an end-to-end attack. Instead, cybercriminals can specialize in a part of the value chain, such as creating a botnet and selling it to the highest bidder.
- Standardization of common chipsets or platforms, such as Linux and Android, allows hackers to use their tools against many products, often against targets in vastly different markets. This makes their hacks scalable and transferable between countries, industries, devices and cloud platforms.
Given all these changes, it’s surprising that we’re still approaching software security the same way we did 20 years ago:
- There is still a heavy focus on network or perimeter security, aimed at preventing outsiders from accessing software. But this is directly at odds with the unstoppable push to have everything connected; at a certain point, there will be no such thing as an “outsider.”
- There is still a goal of making something “fully secure” from the outset with little thought to evolving that security. But this model misses one of the greatest advantages of software: it is easily updatable.
In today’s connected world, you need a different approach, one that focuses more on what criminal hackers are trying to accomplish, one that starts from the premise that our software runs in an accessible, and, therefore, hostile environment. You need to identify your most valuable assets, explore why they are of interest to someone else, figure out how they could be attacked by someone with privileged access, then apply defenses that will make those attacks unprofitable. You need to start thinking like a hacker.
The Business of Hacking
Once we recognize that criminal hackers are operating like a business, we see they are driven by the same goals as any other business — maximize revenue and minimize costs.
A hacker will go after targets that have the most value for them. Is there intellectual property (IP) or a proprietary algorithm that could be obtained? Personal data that can be sold on the dark web? A safety system that can be compromised? Is there a way to paralyze a system or damage a company or brand to extort a ransom? Or, more directly, are there money streams that can be manipulated? The more valuable the asset, the bigger the payoff. Also, the more common the weakness, the more revenue that can be made.
The quicker an attack can be set up, the lower its cost, so hackers tend to take the path of least resistance. Imagine a burglar: he will not try to break the robust security of the front door or get through the ground-floor windows that have sensors installed. He will spot the shed with the cheap padlock, take the ladder inside and break in through the floor above. As with a house, software will never be completely impenetrable, but anything that makes hacking too time-consuming, or limits it to a one-shot attack that cannot be repeated, makes it too costly and will encourage the hacker to move on to an easier target.
Asset and Threat Identification
Thinking like a hacker starts with asking yourself a basic question: “What do I have that a hacker wants?” These are the assets that must be protected. As noted above, this could include IP, digital content, personal data and more.
For each asset, you then ask, “What can a hacker do with this?” More specifically, “How will they leverage the asset to drive revenue?” These are the threats, and they are ever-present. For example, a criminal hacker may lift valuable IP from your application and use it to make a competing product without repeating your significant R&D investment. This threat — theft of IP — will exist for as long as your product is active.
One way of categorizing threats is the STRIDE model, developed by Microsoft. STRIDE is an acronym for six threat categories:
- Spoofing — impersonation of a person or process
- Tampering — modification of an asset
- Repudiation — denying an action took place
- Information Disclosure — revelation of a secret
- Denial of Service — affecting the availability of a system
- Elevation of Privilege — unauthorized access
Such a model is useful for encouraging a more holistic consideration of threats. What if, instead of stealing IP to sell a competing product (Information Disclosure), criminals offered a service to disable license checks in your software (Tampering), allowing it to be freely pirated? What if the criminal is looking to embarrass your company to profit from shorting your stock? Or, what if the criminal is not actually interested in attacking your company, but rather sees your product as a large-scale attack vector against all of your customers? Each of these is a completely different business model for the hacker, but all are equally damaging for your business.
Now that you’ve identified what hackers want and what they want to do with it, you move on to the next part of thinking like a hacker: “How will they get to my assets?” These are the attacks against your software. It’s very important to be thorough when identifying attacks, because this is a hacker’s main advantage: they only need to find one attack, whereas you have to defend against all attacks.
This article focuses on software attacks, because this avenue is so often overlooked, but attack identification must also consider hardware attacks, protocol attacks, database attacks, network attacks, cloud attacks and more.
It’s very important to understand that almost all attack development goes through predictable phases:
- Reverse Engineering — analysis of code and data, looking for assets and vulnerabilities (paths to an attack)
- Modification — static or dynamic tampering of code or data to realize a threat
- Automation — allowing the exploit to be done automatically and repeatedly
For certain attacks (e.g., IP theft), the Modification phase might not be necessary. But Reverse Engineering is critical, and Automation is what allows criminals to turn an isolated attack into a scalable business.
Why is this so important? Because this is your main advantage. Criminal hackers must execute all phases of an attack to realize their business objectives. You can put up defenses at every phase and thereby increase the hacker’s costs. With enough barriers, the attack is no longer cost-effective, and the hacker looks elsewhere. We will return to this idea of “defense in depth,” employing multiple techniques to make every phase of the attack more difficult.
Below, we describe several attack techniques for each phase. A successful attack generally combines one or more techniques from each category.
1. Reverse Engineering

| Technique | Description |
|---|---|
| Runtime memory inspection | The act of copying the contents stored in volatile random-access memory onto a hard disk so they can be analyzed later. Attackers can also analyze the memory contents after power off. Targets include secret data, such as unprotected cryptographic keys. |
| Disassembly | The act of translating binary code into assembly language, making it more readable to attackers |
| Differential attack | The act of comparing two variations of the same software and/or data. By detecting binary differences, attackers can identify and target code to which security enhancements have been applied. |
| Collusion | An attack tactic whereby two or more attackers work together in an agreed-upon fashion to improve the chance of a successful attack |
| Reverse control flow | The act of tracing a program’s execution on an instruction level to identify locations of function calls, loops and conditional branches |
| Interactive debugging | The act of using an interactive debugger to execute targeted software in a controlled manner to breach the software’s security |
| Process snooping | All of the reverse engineering that can be performed without interrupting an existing process. During startup, possible targets include “call home” functionality, unexpected licenses and user data checking. |

2. Modification

| Technique | Description |
|---|---|
| Data lifting | The act of extracting data from a static section of an application and linking it or loading it in a different application |
| Code lifting | The act of extracting code from a static section of an application, either by explicitly pointing to a section of memory where specific code resides (in-place code lifting), or by decompiling one section of binary and recompiling it into another binary (out-of-place code lifting) |
| Modifying control flow | The act of changing the original behavior of a computer program. This involves altering the computer instructions by composing alternate instructions to gain access to functionality not originally intended. |
| Data file replacement | The act of replacing original data files that correspond to limited access or execution privileges with new data, allowing an attacker to gain privileges that were not originally accessible |
| Program file replacement | The act of replacing a dynamically loadable executable, which was intended to be used by the author, with a file that may have malicious side effects. Potential attack goals include extracting premium protected media content or bypassing a license check. |
| Instruction replacement | The act of adding, modifying or removing binary instructions |
| Branch jamming | The act of changing the Boolean result of a condition so that the branch target of the condition taken at runtime is reversed |

3. Automation

| Technique | Description |
|---|---|
| Automatic exploits | The act of automatically modifying the application to cause the changed behavior |
| Redeployed data files | The act of redistributing an entire system to decrypt content once an attacker understands what is required to decrypt the content |
| Dynamic library exploits | The act of taking advantage of vulnerabilities in a dynamic library |
| Unauthorized invocation | The act of launching or executing software by an unlicensed party |
The attack analysis process involves working backward from an identified threat against an asset, which can be seen as an attacker’s end goal, and determining all the attacks that could take the attacker to that end goal. This is repeated for every threat against every asset to build out a full view of where attacks might occur and, thus, where defenses need to be applied. This will give the owner of the targeted system a set of security requirements to build into its products’ architecture and design.
Now that you’ve thought like a hacker and identified which attacks can be launched and how they will proceed, you are ready to systematically deploy defenses to frustrate the hacker’s efforts. There’s no perfect solution that will shut out an attacker for all time; instead, there’s a multilayered and dynamic approach that raises costs and lowers revenue for the hacker. By hitting criminal hackers where it hurts — their business model — you make yourself a far less attractive target.
An analogy we sometimes use is this: if you are trying to reach a destination, are you going to choose the straight path with clear visibility or the treacherous path in the fog? Most people will choose the easy path, and so do most hackers. By applying the right layering of defenses in the right way, your software becomes the treacherous path for hackers trying to reach a destination of revenue, and they will go elsewhere.
The SPIDER Model
A good defense in depth approach should act like a web — the different parts should be mutually strengthening with no single point of failure. We call our particular approach the SPIDER model to reinforce this image of a web of protection. SPIDER stands for Software Protection: Integrity, Diversity, Entanglement and Renewability. Let’s look at each of these properties in turn:
- Integrity: Integrity verification will ensure your software hasn’t been tampered with. Like silk, it gives the web of software protection its strength. It is useful at loading time but is enhanced when it is dynamic: checking software integrity throughout the execution of the software. A reliable, robust and tamper-resistant integrity verification capability is an important element in establishing a software root of trust, especially when hardware anchors are not available. Philosophically, integrity verification can happen throughout each component of software protection; for example, see “Entanglement,” below.
- Diversity: It has been understood for quite some time that a security solution needs to be renewable and diverse to support a proper security lifecycle. In software protection, diverse instances of software can frustrate a hacker’s efforts to understand what is going on, especially when a simple change to a random seed can create diversity in the algorithmic code and data cloaking such that the instances have very good separation between each other. In the world of the spider, the web will vary over time, as well, typically due to an attack, capture of a bug or other external events. Also, spider webs are very diverse but are built in a repeatable algorithmic way — just like good software protection.
- Entanglement: Entangling code and data as part of software hardening is an effective technique that can help reinforce the software’s protection. Entanglement can be applied algorithmically at the source level such that nothing can be modified without affecting the control flow of the program. This is another good example of how software protection is like a spider web — the web is very sensitive to disturbances of bugs landing on the silk.
- Renewability: Since effective software protection has a measurable impact on hacker productivity, it is feasible to anticipate the “time to hack” and use renewal of the software protection to deliberately frustrate the hacker’s progress midstream. When a breach in security is detected, new, diverse instances can be created in combination with the application of a different set of protections for an effective renewal cycle. This latter point is very similar to a spider web in that, once breached, it remains mostly intact and is easily repaired.
Software Protection: It’s More Than Just Software
We believe a complete software protection strategy must have a significant component that is software-based. There is an obvious reason for this: if the software itself has been modified to make it harder to attack, there is no layer that can be peeled away to get at the original software. The other major reason to have software-based defenses is renewability. As argued above, renewability is an essential element of software protection, and a pure hardware solution is just too slow and too costly to renew regularly. This is why the defenses enumerated here are software-based.
That said, when the goal is preserving your business, you can and should use every tool at your disposal. If your platform has a trusted execution environment, using it will significantly raise the bar against reverse engineering. If you have access to a true random-number generator, this closes the door against attacks that tamper with random data. Cryptographic co-processors can improve both the security and performance of your critical cryptographic operations. A hardware-anchored secure boot can be leveraged to provide ongoing integrity verification.
The best software protection comes from a combination of software- and hardware-based defenses working in tandem. Again, defense in depth is the name of the game.
Components of the SPIDER model
Our SPIDER web has three anchors:
- Code Transformation and Obfuscation is performed by the Transcoder, a source-to-source tool that makes software harder to reverse engineer and tamper with without altering functionality.
- Whitebox Cryptography provides white-box attack-resistant implementations of standard cryptographic algorithms, providing specialized protection to one of your most critical assets, cryptographic keys and data.
- Integrity Verification and related technologies use application programming interfaces (APIs) to create links between program functionality and ongoing security checks, ensuring your software resists both static and dynamic attacks.
Importantly, all these pieces work together, both to protect each other and provide defense in depth. Moreover, each technology is highly data-driven, allowing for considerable diversity and renewability, controlled simply with two random seeds.
Data flow refers to the ordinary movements and computations involving data in a program. It includes arithmetic operations, Boolean operations, assignments and more. The objective of data flow transformations is to keep the data in a protected state throughout the data flow by hiding basic operations behind complex mathematical transformations and a high degree of additional uncertainty or entropy. Variables, constants and operations are all diffused into the program flow, making the original computations and data extremely hard to determine.
Even simple encoding provides a degree of protection. For example, consider an original variable x transformed to x’ = sx + d, and an original variable y transformed to y’ = ty + d. To transform the computation z = x + y, we perform a transformed addition of x’ to y’, giving the encoded result z’ = uz + e (for some chosen constants u and e) as z’ = vx’ + wy’ + b, where v, w and b are constants computed from the underlying data transformations. Substituting x = (x’ - d)/s and y = (y’ - d)/t shows that v = u/s, w = u/t and b = e - ud/s - ud/t. Even in this simple example, the combined transform space for the computation is over 100 bits. Data flow transformations available in practice go well beyond this simple linear transformation and may combine multiple mathematical domains.
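The encoded addition can be checked numerically. The sketch below is illustrative only: it assumes 32-bit wraparound arithmetic (so the multipliers s, t and u must be odd to be invertible), uses separate offsets c and d for x and y (the example above is the special case c = d), and picks hypothetical constants, including an encoding z’ = uz + e for the result.

```python
MOD = 2 ** 32  # assumed 32-bit wraparound arithmetic


def inv(a: int) -> int:
    """Multiplicative inverse mod 2^32; exists only for odd a (Python 3.8+)."""
    return pow(a, -1, MOD)


def encode(value: int, mult: int, offset: int) -> int:
    return (mult * value + offset) % MOD


def decode(encoded: int, mult: int, offset: int) -> int:
    return ((encoded - offset) * inv(mult)) % MOD


def encoded_add_constants(s, c, t, d, u, e):
    """Return (v, w, b) with v*x' + w*y' + b == u*(x + y) + e (mod 2^32)."""
    v = (u * inv(s)) % MOD
    w = (u * inv(t)) % MOD
    b = (e - v * c - w * d) % MOD
    return v, w, b


def encoded_add(xe: int, ye: int, v: int, w: int, b: int) -> int:
    # Only encoded values and derived constants appear; x, y and z never do.
    return (v * xe + w * ye + b) % MOD


# Hypothetical encodings for x, y and z (all multipliers odd).
s, c = 0x9E3779B1, 0x01234567
t, d = 0x85EBCA77, 0x0BADF00D
u, e = 0xC2B2AE3D, 0x27D4EB2F
v, w, b = encoded_add_constants(s, c, t, d, u, e)

ze = encoded_add(encode(1000, s, c), encode(2345, t, d), v, w, b)
print(decode(ze, u, e))  # 3345
```

Because every constant is derived from the chosen encodings, changing any one of s, c, t, d, u or e changes v, w and b, which is where the large transform space comes from.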
Control Flow Transformations
Control flow refers to the execution path followed as a program runs and transfers control among blocks of statements. Control flow transformations are recommended against disassembly, reverse control flow, interactive debugging, modifying control flow and branch jamming. They aim to make it extremely difficult to recover the original control flow of the program, which vastly increases the cost to an attacker attempting to reverse engineer the application.
The most fundamental control flow transformation employed is “control flow flattening.” Control flow flattening converts all control flow in a function (“if” statements, loops, jumps, etc.) into a single switch statement, typically inside a dispatch loop, in which the value of a state variable determines which block of code executes next. This alone significantly reduces the program flow information available to an attacker.
Additional transformations build on top of control flow flattening, adding dummy branches (paths based on values the switch variable will never take) and history-dependent coding, which makes the switch variable dependent on the history of the application control flow and requires correct navigation through conditionals for proper execution of the application. This capability inhibits both analysis and tampering attacks, because any attempt to bypass a branch will have an unpredictable impact on the rest of the control flow.
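As a rough illustration, the sketch below flattens a simple Euclid’s-algorithm loop by hand. Python has no switch statement, so an if/elif chain stands in for one; the state numbers and the dummy block are arbitrary choices for this example, and a production transformation would additionally obscure the state variable itself, for example with the history-dependent coding described above.

```python
def gcd_original(a: int, b: int) -> int:
    while b != 0:
        a, b = b, a % b
    return a


def gcd_flattened(a: int, b: int) -> int:
    # Every basic block becomes one case; the state variable picks the next.
    state = 1
    while True:
        if state == 1:        # former loop test
            state = 2 if b != 0 else 3
        elif state == 2:      # former loop body
            a, b = b, a % b
            state = 1
        elif state == 3:      # former exit block
            return a
        elif state == 4:      # dummy block: no path ever sets state to 4
            a, b = b + 1, a
            state = 1


print(gcd_flattened(48, 18))  # 6
```

The flattened version computes the same results, but the loop structure is no longer visible as nested control flow; it is encoded only in the transitions between state values.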
Branch protection, a targeted anti-tamper defense for conditional statements (ifs, loops, etc.), is recommended against modifying control flow and branch jamming. Attackers typically try to jam or bypass important branches in the code to sidestep security checks or to modify the original flow of the program. Branch protection prevents branch jamming by adding code that causes the program to behave incorrectly if the branch is jammed. By analyzing the condition in the branch, certain properties are derived that hold if the condition is true but do not hold if it is false. Based on these properties, branch protection creates mathematical dependencies between the conditions and existing code. This ensures the program will be in an incorrect state if an attacker jumps to a specific branch.
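A toy sketch of the idea, with every name and constant hypothetical: the value the protected branch should produce is stored XOR-masked with the expected license code, and the branch unmasks it using the code that was actually entered. The derived property (the entered code equals the expected code, so the mask cancels) holds only when the condition is true, so jamming the branch yields a wrong value rather than an unlocked feature. In real protected code the comparison itself would also be transformed, since here EXPECTED_CODE is visible in plain text.

```python
EXPECTED_CODE = 0x5A17   # hypothetical license code
FEATURE_KEY = 0xBEEF     # hypothetical value the unlocked path needs

# Build time: store only the masked value, never FEATURE_KEY itself.
MASKED_FEATURE_KEY = FEATURE_KEY ^ EXPECTED_CODE


def unlocked_path(entered_code: int) -> int:
    # Correct only if entered_code == EXPECTED_CODE, i.e. only if the guarding
    # condition was genuinely true (x ^ x == 0 removes the mask).
    return MASKED_FEATURE_KEY ^ entered_code


def check_license(entered_code: int) -> int:
    if entered_code == EXPECTED_CODE:
        return unlocked_path(entered_code)
    return 0  # locked


def check_license_jammed(entered_code: int) -> int:
    # Simulates an attacker who patched the branch so it is always taken.
    return unlocked_path(entered_code)


print(hex(check_license(0x5A17)))         # 0xbeef
print(hex(check_license_jammed(0x1234)))  # 0xf6cc, not the real key
```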
String transformations are recommended against data lifting and automatic exploits. This special type of data transformation works on string literals, and its goal is to render all literal strings in the application meaningless. Special handling is needed, because there is no “string flow” analogous to data flow in an application. Thus, extra code is generated to decode literals when needed, so they can be rendered correctly in error messages and the like.
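A minimal sketch of the idea, assuming a simple position-dependent XOR mask as a hypothetical stand-in (the article does not describe the actual encoding, and this mask is not a real cipher): each literal is encoded at build time, and a small decode step runs only where the string is used, so a tool such as strings run over the binary finds no readable text.

```python
def _mask_byte(key: int, i: int) -> int:
    # Hypothetical position-dependent mask byte.
    return (key + i * 0x9D) & 0xFF


def encode_literal(text: str, key: int) -> bytes:
    """Build-time step: what would be stored in the binary instead of the literal."""
    raw = text.encode("utf-8")
    return bytes(byte ^ _mask_byte(key, i) for i, byte in enumerate(raw))


def decode_literal(blob: bytes, key: int) -> str:
    """Runtime step: generated code decodes the literal only when it is needed."""
    raw = bytes(byte ^ _mask_byte(key, i) for i, byte in enumerate(blob))
    return raw.decode("utf-8")


# Hypothetical message and key.
STORED = encode_literal("License check failed", key=0x5C)


def report_failure() -> str:
    # The plain string exists only transiently, at the point of use.
    return decode_literal(STORED, key=0x5C)


print(report_failure())  # License check failed
```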
API Protection is recommended against code lifting and dynamic library exploits.
Function Signature Transformations
Function calls represent a clear boundary that can be exploited by an attacker to get considerable insight into your program. In particular, looking at the parameters passed to a function can provide information about that function’s purpose. Function signature transformations disguise function parameters by inserting them into a type-masking array intermixed with dummy parameters. The result is that all function calls look similarly ambiguous.
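The sketch below shows the shape of a function signature transformation in Python, with every name and slot position hypothetical. Python lists are untyped, so the type-masking aspect is only approximated; in C, the array would presumably hold values converted to a uniform type.

```python
import random
from typing import List


# Original function: its two named parameters hint at its purpose.
def apply_discount(price_cents: int, percent: int) -> int:
    return price_cents * (100 - percent) // 100


# Hypothetical layout chosen by the transformation: the real arguments sit at
# slots 1 and 4 of a 6-slot array; every other slot holds a dummy value.
PRICE_SLOT, PERCENT_SLOT, ARRAY_LEN = 1, 4, 6


def apply_discount_masked(args: List[int]) -> int:
    # Same computation, but the signature reveals only "a list of integers".
    return args[PRICE_SLOT] * (100 - args[PERCENT_SLOT]) // 100


def pack_args(price_cents: int, percent: int, rng: random.Random) -> List[int]:
    # Rewritten call sites build the array, filling unused slots with dummies.
    args = [rng.randrange(1 << 16) for _ in range(ARRAY_LEN)]
    args[PRICE_SLOT] = price_cents
    args[PERCENT_SLOT] = percent
    return args


print(apply_discount_masked(pack_args(2500, 20, random.Random(7))))  # 2000
```

When every function is rewritten this way, call sites all look alike: an array goes in and a value comes out.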
Function merging takes advantage of the uniform signatures described above to create new functions by merging the bodies of two or more functions together. This creates a false dependency between disparate parts of the program, thereby frustrating an attacker’s attempt to understand functionality.
Function indirection exploits the simple fact that function pointers are harder to trace than function calls. It works by creating function pointers and replacing standard calls with indirect calls through pointers.
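Continuing in the same hypothetical Python setting, the sketch below merges two functions that already share a uniform list-based signature and then routes calls through a dispatch table rather than direct calls. The selector value and table keys are arbitrary; in C, the table would hold function pointers.

```python
from typing import Callable, Dict, List


# Two unrelated functions that already share a uniform list-based signature.
def checksum(args: List[int]) -> int:
    return sum(args) & 0xFFFF


def scale(args: List[int]) -> int:
    return args[0] * args[1]


# Function merging: one body now implements both, chosen by a hypothetical
# selector in slot 0, so the two features appear to depend on each other.
def merged(args: List[int]) -> int:
    selector, rest = args[0], args[1:]
    if selector == 0x3A:
        return sum(rest) & 0xFFFF   # former checksum body
    return rest[0] * rest[1]        # former scale body


# Function indirection: callers look up the target in a table of function
# references instead of naming it directly (hypothetical table keys).
DISPATCH: Dict[int, Callable[[List[int]], int]] = {17: merged, 42: merged}


def call_indirect(slot: int, args: List[int]) -> int:
    return DISPATCH[slot](args)


print(call_indirect(17, [0x3A, 1, 2, 3]))  # 6, same as checksum([1, 2, 3])
print(call_indirect(42, [0x00, 6, 7]))     # 42, same as scale([6, 7])
```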
Secure inlining allows you to merge separate logical sections of code within a file before transforms are applied. It resembles standard function inlining, but its purpose is to remove function boundaries and combine operations to obscure program logic. Secure inlining may increase code size, but it often improves performance.
Secure inlining is most powerful when used in conjunction with function signature transformations, function merging and function indirection. The overall result is a significant manipulation to a program’s function boundaries.
Almost every application related to security will use cryptography, whether for authentication, confidentiality, integrity, non-repudiation, or a combination thereof. The strength of a cryptographic implementation is directly related to the secrecy of critical security parameters, most notably cryptographic keys. These keys can be protected in storage and in transit, but in a software environment, under the control of an attacker, the keys are especially vulnerable in use. In this so-called “white-box attack context,” attackers can observe execution and lift keys, completely negating the security of the system.
Our “whitebox cryptography” libraries are designed specifically for the whitebox attack context, keeping keys protected even when in use. They are recommended against data lifting and unauthorized invocation. With a full set of cryptographic primitives, including AES encryption, RSA encryption and signing, ECC encryption and signing, the SHA-2 hash, the HMAC message authentication code and a cryptographic-strength PRNG, these implementations can be used as a secure alternative to standard libraries like OpenSSL.
Since producing the first practical whitebox-attack-resistant AES implementation in 2002, Irdeto has continued to advance its technology to stay ahead of the latest threats. Today, its whitebox implementations are protecting keys in well over a billion devices.
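To give a flavor of the basic white-box idea, the toy below folds a key into a lookup table and hides the table’s inputs and outputs behind random encodings, so the running code never handles the key directly. It uses a single 4-bit S-box (borrowed from the PRESENT block cipher) and is in no way secure; practical white-box AES designs build large networks of such encoded tables, and this sketch is not Irdeto’s implementation.

```python
import random
from typing import List

# PRESENT block cipher's 4-bit S-box, used here purely as a toy component.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def reference_step(x: int, key: int) -> int:
    # Conventional code: the key is an explicit value in memory.
    return SBOX[x ^ key]


def random_bijection(seed: int) -> List[int]:
    perm = list(range(16))
    random.Random(seed).shuffle(perm)
    return perm


def invert(perm: List[int]) -> List[int]:
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return inverse


def build_table(key: int, in_enc: List[int], out_enc: List[int]) -> List[int]:
    # Key and S-box are folded into one table, wrapped in secret encodings:
    # table[in_enc[x]] == out_enc[SBOX[x ^ key]] for every 4-bit x.
    table = [0] * 16
    for x in range(16):
        table[in_enc[x]] = out_enc[SBOX[x ^ key]]
    return table


KEY = 0x9  # hypothetical key; only build_table (run at build time) sees it
IN_ENC, OUT_ENC = random_bijection(1), random_bijection(2)
TABLE = build_table(KEY, IN_ENC, OUT_ENC)


def whitebox_step(encoded_x: int) -> int:
    # Runtime: a single lookup on encoded data; no XOR with KEY appears.
    return TABLE[encoded_x]
```

Surrounding code would pass values that are already encoded with IN_ENC and consume results still encoded with OUT_ENC, so plain values never appear at the table boundary.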
File Encryption and Secure Storage
As well as protecting data in transit, whitebox cryptography is useful for protecting local data at rest. This can be done with file encryption, which provides static protection for other files that form part of the application. File encryption, together with Irdeto’s Secure Storage library, is recommended against data file replacement and unauthorized invocation. The Secure Storage library provides a straightforward interface for the persistent storage of arbitrary data. In both cases, whitebox cryptography ensures the keys are never exposed.
Integrity verification is recommended against code lifting, data file replacement, program file replacement, instruction replacement, redeployed data files and dynamic library exploits. It is a secure method of validating the integrity of an application, and it can also ensure the integrity of external modules interacting with that application. Integrity verification ensures that software cannot be tampered with, either statically or dynamically, without detection. This significantly raises the bar in tamper resistance, because an attacker must not only reverse engineer a program and modify the binary but also defeat the integrity checking. Irdeto offers two variations of integrity verification.
The company’s Buildtime IV component is a more secure variation of code signing that ensures trust on an untrusted host. The customer signs modules at build time, storing an encrypted hash of the target module with the final application. At runtime, we compare a runtime hash of the target module with the encrypted hash from build time. Also, because the Buildtime IV library is statically linked into the application and signed, it continually monitors its own integrity.
Irdeto’s Runtime IV component is appropriate for environments where the application binary is not finalized at build time, such as iOS bitcode. Runtime IV uses defense in depth to create a window of trust in which application signatures can be computed at runtime; thereafter, it is functionally equivalent to Buildtime IV.
Both Buildtime IV and Runtime IV are integrated into the application using callbacks. Each IV call takes a function pointer called the success callback. If the check passes, the success callback is invoked, and execution continues normally. If the check fails, the callback is not invoked, meaning tampered programs do not follow the correct program flow.
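A simplified sketch of the build-time/runtime split and the success-callback pattern follows. It substitutes an HMAC-SHA256 keyed digest for the encrypted hash, hard-codes a placeholder key and uses made-up module bytes; none of these names come from Irdeto’s APIs. The point is structural: the code that should run next is reachable only through the callback.

```python
import hashlib
import hmac
from typing import Callable, Optional

# Placeholder key; in the design described above, the stored hash is
# encrypted and keys are protected with whitebox cryptography.
BUILD_KEY = b"hypothetical-build-key"


def sign_module(module_bytes: bytes) -> bytes:
    """Build time: keyed digest of the module, stored with the application."""
    return hmac.new(BUILD_KEY, module_bytes, hashlib.sha256).digest()


def verify_module(module_bytes: bytes, stored_digest: bytes,
                  on_success: Callable[[], str]) -> Optional[str]:
    """Runtime: recompute the digest; invoke the success callback only on a match."""
    runtime_digest = hmac.new(BUILD_KEY, module_bytes, hashlib.sha256).digest()
    if hmac.compare_digest(runtime_digest, stored_digest):
        return on_success()  # correct program flow continues here
    return None              # callback never runs, so tampered code stalls


def continue_startup() -> str:
    return "startup complete"


module = b"\x55\x48\x89\xe5\xc3"  # hypothetical module bytes
digest = sign_module(module)

print(verify_module(module, digest, continue_startup))                 # startup complete
print(verify_module(module[:-1] + b"\x90", digest, continue_startup))  # None
```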
Debuggers are an invaluable tool in the hacker’s arsenal, allowing attackers to execute a program step by step and to watch data as it flows through the application. As such, anti-debug technologies are an excellent way to frustrate reverse engineering attacks. Recommended against interactive debugging, they come in three variants:
- Timing-based anti-debug (TBAD) works by comparing the actual time taken to execute a series of instructions with a predetermined expected time. Because stepping through instructions using a debugger is orders of magnitude slower than executing them at full speed, these timing checks will fail in a debugger environment. Negative timing checks (ones that are expected to fail) can also be used to prevent the attacker from tampering with the system clock.
- Signal-based anti-debug is available in user mode for Android and embedded Linux environments. It works by intercepting all signals from the application and invoking special handlers to process those signals. An attached debugger will process the signals differently and will thus modify application behavior.
- Ptrace-based anti-debug is available for iOS systems only. As soon as the application starts up, the ptrace system function is called to attach a monitoring process. This prevents any other process, including a debugger, from attaching to the application, so the debugging session cannot even start.
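The timing-based variant can be sketched as follows, with hypothetical workloads and budgets. A positive check expects a short workload to finish within a generous budget, while a negative check expects a deliberate 5 ms sleep to overrun a 1-microsecond budget; if the negative check passes, the clock has probably been frozen or slowed. The clock is a parameter only so the example can simulate tampering.

```python
import time
from typing import Callable

Clock = Callable[[], int]


def within_budget(work: Callable[[], None], budget_ns: int,
                  clock: Clock = time.perf_counter_ns) -> bool:
    """True if work() finished within budget_ns according to clock."""
    start = clock()
    work()
    return clock() - start <= budget_ns


def busy_work() -> None:
    total = 0
    for i in range(10_000):
        total += i * i


def slow_work() -> None:
    time.sleep(0.005)  # at least 5 ms on an honest clock


def debugger_suspected(clock: Clock = time.perf_counter_ns) -> bool:
    # Positive check: at full speed busy_work fits easily in a hypothetical
    # 50 ms budget; single-stepping it in a debugger would not.
    positive_passed = within_budget(busy_work, 50_000_000, clock)
    # Negative check: expected to fail. If it passes, the clock is suspect.
    negative_passed = within_budget(slow_work, 1_000, clock)
    return (not positive_passed) or negative_passed


print(debugger_suspected(clock=lambda: 0))  # True: a frozen clock is caught
```

On an unmodified run, debugger_suspected() should return False, since the busy loop finishes well inside its budget and the sleep overruns its own.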
Many of the modification attacks listed above are facilitated using “hooking.” Generically, hooking is an attack technique that instruments and modifies program flow by intercepting calls to APIs and redirecting them to attacker-controlled code. We offer specialized techniques to frustrate hooking. They are recommended against program file replacement, instruction replacement, automatic exploits and dynamic library exploits.
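Python offers a loose analogy: replacing a module-level function at runtime (“monkey patching”) redirects every caller, much as a hook does. The sketch below, with a hypothetical watch list, records the original function objects at startup and later reports any that have been swapped. An attacker who can patch functions could also patch the baseline, which is why such checks are layered with other protections.

```python
import hashlib
import hmac
from typing import Callable, Dict, List, Tuple

# Hypothetical watch list of security-relevant entry points.
WATCHED: Dict[str, Tuple[object, str]] = {
    "hashlib.sha256": (hashlib, "sha256"),
    "hmac.compare_digest": (hmac, "compare_digest"),
}


def _snapshot() -> Dict[str, Callable]:
    return {name: getattr(module, attr) for name, (module, attr) in WATCHED.items()}


BASELINE = _snapshot()  # captured as early as possible at startup


def hooked_entry_points() -> List[str]:
    """Names whose function object has been replaced since startup."""
    current = _snapshot()
    return [name for name, fn in current.items() if fn is not BASELINE[name]]
```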
Jailbreak and Root Detection
Attackers wishing to gain full control over mobile devices as a precursor to application analysis and tampering will take advantage of one of several public tools for jailbreaking (iOS) or rooting (Android) the device. (For simplicity, we will use “rooting” as a blanket term in the following.) With each new operating system release, some tools are defeated, others are updated, and new tools appear.
Irdeto thus employs an ever-evolving suite of techniques for detecting that a device has been rooted. This includes looking for binaries that are part of popular rootkits, checking the behavior of certain system functions and more. As with anti-debug and integrity verification, callbacks are used to make sure these checks are done as part of correct program flow.
On mobile devices, hooking is accomplished using a hooking framework such as Cydia Substrate. Irdeto’s hooking detection technology looks for the presence of these frameworks, with callbacks used to determine the appropriate application response.
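A much-simplified sketch of file-artifact checks combined with the callback pattern follows. The path list is illustrative, drawn from well-known locations of su binaries, the Cydia package manager and the Cydia Substrate library rather than from Irdeto’s technique set, and the exists parameter is a hypothetical hook that lets the check be exercised on any machine.

```python
import os
from typing import Callable, Iterable, List, Optional

# Illustrative artifacts; a real product would use a larger, evolving set
# and combine them with behavioral checks of system functions.
ROOT_ARTIFACTS = [
    "/system/xbin/su",                                 # Android su binary
    "/system/bin/su",
    "/Applications/Cydia.app",                         # iOS jailbreak package manager
    "/Library/MobileSubstrate/MobileSubstrate.dylib",  # Cydia Substrate hooking framework
]


def found_artifacts(paths: Iterable[str],
                    exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    return [p for p in paths if exists(p)]


def run_if_clean(on_clean: Callable[[], str],
                 exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    # As with integrity verification, the check gates normal flow via a callback.
    if found_artifacts(ROOT_ARTIFACTS, exists):
        return None
    return on_clean()
```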
Recommended against redeployed data files and unauthorized invocation, fingerprinting is the process of collecting attributes from a given device to uniquely identify that device. The attribute values are coalesced into a value called the fingerprint; the intention is that any other device would have a different fingerprint. Irdeto provides a library that allows users to choose which system and application attributes they want to gather. It combines that data to produce application-specific fingerprints. Moreover, using a variation on secret sharing to protect the fingerprint computation, Irdeto can support advanced m of n schemes, where some of the queried attributes can change without affecting the computation of the fingerprint.
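One way such an m-of-n scheme could work, sketched under our own assumptions (the article does not describe Irdeto’s construction): a random secret is split into n Shamir shares, each share is stored masked by the hash of one device attribute, and at runtime any m unchanged attributes unmask enough shares to rebuild the secret, from which the fingerprint is derived. A separate check digest identifies the correct reconstruction. The seeded random.Random keeps the example reproducible; real enrollment would use the secrets module.

```python
import hashlib
import random
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

PRIME = 2 ** 127 - 1  # Mersenne prime; all share arithmetic is mod PRIME


def _h(attribute: str) -> int:
    return int.from_bytes(hashlib.sha256(attribute.encode("utf-8")).digest(), "big") % PRIME


def _digest(secret: int, label: bytes) -> str:
    return hashlib.sha256(label + secret.to_bytes(16, "big")).hexdigest()


def enroll(attributes: Sequence[str], m: int, rng: random.Random) -> Tuple[List[int], str]:
    """Split a random secret into Shamir shares, each masked by one attribute."""
    coeffs = [rng.randrange(PRIME) for _ in range(m)]  # coeffs[0] is the secret

    def poly(x: int) -> int:
        return sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME

    helper = [(poly(i) - _h(a)) % PRIME for i, a in enumerate(attributes, start=1)]
    return helper, _digest(coeffs[0], b"check")


def _interpolate_at_zero(points: Sequence[Tuple[int, int]]) -> int:
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for k, (xk, _) in enumerate(points):
            if k != j:
                num = num * xk % PRIME
                den = den * (xk - xj) % PRIME
        total = (total + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total


def fingerprint(attributes: Sequence[str], helper: List[int],
                check: str, m: int) -> Optional[str]:
    """Return the fingerprint if at least m attributes still match enrollment."""
    shares = [(i, (hv + _h(a)) % PRIME)
              for i, (hv, a) in enumerate(zip(helper, attributes), start=1)]
    for subset in combinations(shares, m):
        secret = _interpolate_at_zero(subset)
        if _digest(secret, b"check") == check:
            return _digest(secret, b"fingerprint")
    return None


# Hypothetical attributes: 3 of 5 must match.
attrs = ["model=ExampleBoard", "os=5.10", "cpu=arm64", "serial=A1B2", "app=2.3.1"]
helper, check = enroll(attrs, 3, random.Random(2024))
fp = fingerprint(attrs, helper, check, 3)
```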
Fingerprinting on its own is an identity feature that can be very useful in frustrating automation attacks. When combined with other program data, it can be used for “node locking,” ensuring that data is usable only on the given device and cannot be shared with others.
Diversity and Renewability
Imagine your defenses as a maze you are making the attacker navigate to reach the prize (a successful attack) at the center. A maze can be fiendishly complex (defense in depth), but, with enough time and effort, the attacker will be successful, unless the maze keeps changing. This is renewability: the option to change the specific defenses with each software update, forcing the attacker to start their efforts from scratch. What’s more, all the time spent navigating one maze does not make navigating a second maze easier. This is diversity: the option to have multiple variants of your software, vastly increasing the effort required to launch a widespread attack. Together with renewability, diversity is recommended against differential attacks, collusion and automatic exploits.
With Irdeto’s technology, creating diverse protected instances of your program is easy: simply decide how many copies of the software you want, and the tools do the rest. Renewability between updates can be achieved by providing different seeds to the internal PRNG; the result will be different data transformations, control-flow transformations, function transformations and key protections. Where necessary, transformation information can also be ported from one version to the next to facilitate backward compatibility.
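Reusing the encoded-addition idea from the data flow discussion, the sketch below shows diversity in miniature: each seed (a stand-in for the seeds fed to the internal PRNG) yields different transformation constants, yet every instance computes the same result. Encoding and decoding happen inside add only to keep the example self-contained; in protected code, encoded values would flow between functions. All names are hypothetical.

```python
import random
from typing import Callable, Tuple

MOD = 2 ** 32  # assumed 32-bit wraparound arithmetic


def make_instance(seed: int) -> Tuple[Tuple[int, int, int], Callable[[int, int], int]]:
    """Build one diversified addition from a seed (Python 3.8+ for pow(x, -1, m))."""
    rng = random.Random(seed)  # stands in for the tool's internal PRNG
    s, t, u = (rng.randrange(1, MOD, 2) for _ in range(3))  # odd, so invertible
    c, d, e = (rng.randrange(MOD) for _ in range(3))
    v = u * pow(s, -1, MOD) % MOD
    w = u * pow(t, -1, MOD) % MOD
    b = (e - v * c - w * d) % MOD

    def add(x: int, y: int) -> int:
        xe, ye = (s * x + c) % MOD, (t * y + d) % MOD  # encode inputs
        ze = (v * xe + w * ye + b) % MOD               # transformed addition
        return (ze - e) * pow(u, -1, MOD) % MOD        # decode the result

    return (v, w, b), add


consts_a, add_a = make_instance(seed=1)
consts_b, add_b = make_instance(seed=2)
print(consts_a != consts_b, add_a(1000, 2345), add_b(1000, 2345))  # True 3345 3345
```

Renewing the protection for an update amounts, in this picture, to rebuilding with a new seed: the attacker’s knowledge of one set of constants does not carry over to the next.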
Overlapping and Interacting Protections
While each of the techniques above is effective on its own, the full threat mitigation comes when they are used together. Some examples:
- All of Irdeto’s library code is protected using the Transcoder, making it harder to reverse engineer the security technique itself.
- Buildtime and Runtime IV make use of whitebox cryptography to protect the verification operations.
- Hook detection increases the difficulty of working around anti-debug.
- Anti-debug makes it harder to manipulate callbacks to defeat integrity verification.
- Integrity verification thwarts attempts to modify the binary and strip out root detection checks.
Cybercrime is a hot business in which hackers have the advantage. To combat the rising trend, all companies participating in the ecosystem must be on top of their game. If you develop connected devices or software, you need to choose wisely where to spend your time and budget. “Thinking like a hacker” will help shift your focus from trying to defend the perimeter (since it doesn’t exist) and from trying to make software impenetrable (not possible), to a strategy that targets cybercriminals where it hurts them most: by breaking their business models.
Multilayered software protection makes hacking your applications too time-consuming and expensive and can therefore be an incredibly valuable part of your cybersecurity arsenal. The Cloakware suite of tools, as characterized by the SPIDER model, gives you a powerful, robust and renewable way of making your software unattractive to hackers.