What Is Reverse Engineering?
October 19, 2023
Reverse engineering is the process of recovering the structure of an object, whether a program or a physical part. For software, it is typically done when the source code is lost or inaccessible. Software reverse engineering means recovering the principles and algorithms of a program to understand how it works and, possibly, to re-create its mechanism. It can serve the following purposes:
- Analyze viruses, trojans, and worms to develop countermeasures.
- Search closed-source software for vulnerabilities to create viruses, worms, or exploits.
- Create descriptions for data/protocol formats used in programs.
- Analyze closed drivers and make open drivers for Linux.
- Recover lost information.
- Detect side effects.
However, software hacking should not be confused with reverse engineering. Hacking covers actions such as re-enabling debugger functions or modules that were disabled by default because of vulnerabilities or other problems, or working out how a license check operates and disabling it. Reversing, in turn, is a complete, brick-by-brick analysis of the program and its behavior.
How Does Reverse Engineering Work?
Broadly, reverse engineering can be divided into software and hardware. This article covers only software reverse engineering. Take, for example, a program written in C++ or Java, which other programmers can read. To run on a computer, it must first be translated by another program, called a compiler, into binary code. Compiled code is incomprehensible to most programmers, but tools such as decompilers can convert machine code into a more human-friendly format. Reverse engineering consists of several steps:
- Information gathering. This step gathers all available information about the software (e.g., the initial design documentation).
- Information study. The information gathered in the previous step is studied so that the analyst becomes familiar with the system.
- Structure extraction. This step involves identifying the program's structure in the form of a structure diagram, where each node corresponds to a specific procedure.
- Functionality recording. This step works out the details of each module in the structure. The processing details of each module are recorded using a structured notation such as a decision table.
- Data flow recording. From the information extracted in the structure extraction and functionality steps, data flow diagrams are derived to show how data moves between processes.
- Control flow recording. The high-level control structure of the software is recorded.
- Review of the extracted design. The extracted design document is reviewed several times to ensure consistency and correctness.
- Documentation creation. Finally, in this step, all documentation, including SRS, design documentation, history, review, etc., is recorded for future use.
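The compile step and the structure extraction step can be illustrated with a small sketch. It uses Python bytecode as a stand-in for machine code, which is an assumption made for brevity: real targets are usually native binaries, and the sample program and the helper name extract_call_graph are hypothetical. The sketch compiles a short program, then looks only at the compiled form to rebuild a structure diagram in which each node is a procedure.

```python
import dis
import types

# Hypothetical "unknown" program; only its compiled form is analyzed below.
SOURCE = '''
def read_input():
    return "data"

def transform(text):
    return text.upper()

def main():
    return transform(read_input())
'''


def extract_call_graph(module_code):
    """Map each top-level function to the sibling functions it references."""
    # Function bodies are stored as nested code objects in the module constants.
    functions = {
        const.co_name: const
        for const in module_code.co_consts
        if isinstance(const, types.CodeType)
    }
    graph = {}
    for name, code in functions.items():
        # LOAD_GLOBAL instructions reveal which global names a body looks up.
        referenced = {
            ins.argval
            for ins in dis.get_instructions(code)
            if ins.opname == "LOAD_GLOBAL"
        }
        graph[name] = sorted(referenced & set(functions))
    return graph


# Compile step: the source becomes a code object (bytecode plus constants).
module_code = compile(SOURCE, "<sample>", "exec")
call_graph = extract_call_graph(module_code)

for caller, callees in call_graph.items():
    print(caller, "->", callees)
```

The resulting dictionary is a crude structure diagram: main depends on read_input and transform, which depend on nothing else in the program. A name lookup is only a hint that a call happens, so a real tool would also check the call instructions themselves.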
Reverse Engineering Tools
As mentioned above, reverse engineering is often used to investigate viruses and find practical solutions to counteract them. To reverse engineer malicious code, engineers use many tools. Here are some of the most important ones:
- Disassemblers (e.g., IDA Pro). A disassembler parses the application's binary code and produces assembly code. Decompilers go a step further and convert binary code into higher-level, source-like code, although they are not available for all architectures.
- Debuggers. Reversers use debuggers to control the execution of a program and see what actions it performs at runtime. Debuggers also let the engineer inspect and change parts of the program while it is running, such as the contents of its memory. This gives a better understanding of the program's actions and their impact on the system or network.
- PE viewers (e.g., CFF Explorer, PE Explorer). PE viewers (for the Windows Portable Executable file format) extract essential information from executable files, such as a view of their dependencies.
- Network analyzers. Network analyzers show the engineer how a program communicates with other machines, including what connections the program makes and what data it tries to send.
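As a rough idea of what a PE viewer does first, the sketch below reads the fixed-layout fields at the start of a PE file: the MZ signature, the offset of the PE signature stored at 0x3C, and the 20-byte COFF header that follows it. The function names and the synthetic header built by build_sample_pe are hypothetical and exist only to keep the example self-contained. Real viewers go much further (optional header, sections, import table).

```python
import struct

# Two common values of the COFF "Machine" field.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}


def build_sample_pe(machine=0x8664, sections=3):
    """Build a synthetic header-only byte string (not a runnable executable)."""
    dos_header = bytearray(64)
    dos_header[0:2] = b"MZ"
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    struct.pack_into("<I", dos_header, 0x3C, 64)
    coff = struct.pack("<HHIIIHH", machine, sections, 0, 0, 0, 0, 0)
    return bytes(dos_header) + b"PE\x00\x00" + coff


def parse_pe_header(data):
    """Extract a few basic fields from the DOS and COFF headers."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader,
    # Characteristics.
    fields = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    machine, num_sections, timestamp = fields[0], fields[1], fields[2]
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": num_sections,
        "timestamp": timestamp,
    }


info = parse_pe_header(build_sample_pe())
print(info)
```

To inspect a real file, the same function could be given the bytes read from disk in binary mode instead of the synthetic sample.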
Applications for reverse engineering:
- 1️⃣ IDA is an interactive disassembler that has a built-in command language (IDC) and supports several executable formats for various processors and operating systems.
- 2️⃣ CFF Explorer is a suite of tools for portable executable (PE) editing.
- 3️⃣ Detect It Easy is a cross-platform program designed to determine the types of files by analyzing their binary signatures. It offers an open architecture for adding custom file type detection algorithms, making it a versatile tool for identifying various file formats.
- 4️⃣ ImHex is a hex editor that provides a rich set of features and development tools; it runs on Windows, macOS, and Linux.
- 5️⃣ Scylla can be used for dumping a running application process and restoring the PE import table. With its help, you can get a restored PE file that can be run by the operating system.
- 6️⃣ Relocation Section Editor is an application used for editing the relocation table in PE files. The main purpose of this tool is to modify the relocation table in case of patching relocatable pieces of code. But it’s often used to remove the relocation table when restoring a protected file.
- 7️⃣ dnSpy is a tool for analyzing .NET binaries (decompiler, rebuilder, editor).
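Identifying a file type by its binary signature, as Detect It Easy is described above, can be sketched in a few lines. This is not the tool's actual detection engine. It is a minimal illustration that matches well-known magic bytes at the start of a file, and the register_signature helper is a hypothetical stand-in for the idea of adding custom detection rules.

```python
# Well-known magic bytes found at the start of common file formats.
SIGNATURES = [
    (b"MZ", "DOS/PE executable"),
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF-", "PDF document"),
]


def register_signature(magic, label):
    """Add a custom signature (hypothetical extension point)."""
    SIGNATURES.append((magic, label))


def detect_type(data):
    """Return the label of the first signature that matches the data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


# Extend the table with a custom rule, then classify a few byte strings.
register_signature(b"\x89PNG\r\n\x1a\n", "PNG image")

print(detect_type(b"MZ\x90\x00"))
print(detect_type(b"\x89PNG\r\n\x1a\n\x00\x00"))
print(detect_type(b"plain text"))
```

Matching only the first bytes is fast but coarse: a packed executable still starts with MZ, so tools of this kind combine signatures with deeper checks.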
Challenges of Malware Reverse Engineering
As malware becomes more complex, it becomes more likely that the disassembler fails or the decompiler produces confusing code. Reversers therefore need more time to understand the disassembled or decompiled code, and during this time the malware can wreak havoc on the network. Because of this, more attention is paid to dynamic malware analysis, which involves running malware in a closed, safe environment (a sandbox) and watching what it does.
However, using a sandbox for dynamic analysis also has problems. For example, many sophisticated malware samples use evasion techniques and can detect that they are running in a sandbox. When a sandbox is detected, the malware does not show its true malicious nature. Advanced malware has a set of tricks for fooling sandboxes and avoiding detection: it can suspend its malicious actions and act only when a user is active, keeping the malicious code hidden where it will not be detected, along with a host of other evasion techniques.
This means that reverse engineers cannot rely solely on dynamic techniques. But, at the same time, reverse-engineering every new malware threat is unrealistic.
Packing and Obfuscation
Problems of a different kind come from malware developers, who try their best to make a sample harder to detect and analyse. Two methods are used particularly often: obfuscation and packing. Both have existed for a long time, but have become something of an industry standard only recently.
Obfuscation is what it sounds like: making the code tangled so that it is much harder to understand what is happening. Most modern practices use specific obfuscation techniques that confuse not only humans, but also machines. Default reverse engineering utilities may output largely irrelevant information, and the runtime analysis modules of antivirus engines may fail as well. All these tricks can be beaten by more sophisticated detection and analysis software, but that creates yet another front in the "malware vs anti-malware" race.
Packing is similar to obfuscation in its end result, but the methods are completely different. Packing means compressing and/or encrypting the binary file, so any attempt to analyse it shows an unreadable array of characters and numbers. Attackers also readily use legitimate packing utilities developed to protect software from cracking, so it may be particularly hard to distinguish a legitimate app from malware.
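One common heuristic for spotting packed data, not described in this article and added here only as an illustration, is to measure byte entropy: compressed or encrypted content looks close to random, while ordinary code and text do not. The sketch below assumes zlib compression as a stand-in for a packer and a made-up word list as a stand-in for an unpacked payload. Real packers and real binaries behave less cleanly.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(data).values()
    )


# Stand-in for an unpacked payload: low-variety, text-like bytes.
rng = random.Random(0)
words = ["push", "call", "move", "jump", "load", "test", "loop", "exit"]
plain = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

# Stand-in for a packer: plain zlib compression (real packers differ).
packed = zlib.compress(plain, 9)

print(f"plain:  {shannon_entropy(plain):.2f} bits/byte")
print(f"packed: {shannon_entropy(packed):.2f} bits/byte")
```

The packed bytes score much closer to the 8 bits/byte maximum than the plain ones. High entropy alone does not prove malice, since legitimate software is packed too, which matches the point above about telling legitimate apps from malware.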
Optimizing Reverse Engineering
By using dynamic analysis to automate malware analysis as much as possible, cybersecurity professionals can tackle advanced malware more quickly and efficiently. This frees up their time for challenging work, such as learning and parsing new encryption schemes, reversing communication protocols, or working on attribution. The more advanced the automated solution, the less likely it is that the reverser will have to return to the initial (and time-consuming) phase of the process, which involves unpacking, deobfuscating, and understanding the malware's fundamental behavior.
The best option would be for cybersecurity teams to implement a two-pronged approach: sandbox technologies analyze the vast majority of threats automatically, while reversers devote their time to "surgically" analyzing the insides of the most complex ones when additional threat information is needed.
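The two-pronged approach can be pictured as a simple routing rule. In the sketch below, everything is hypothetical: the SandboxReport fields, the verdict strings, and the routing criteria are assumptions chosen for illustration and do not reflect any particular sandbox product.

```python
from dataclasses import dataclass


@dataclass
class SandboxReport:
    """Hypothetical summary of one automated sandbox run."""
    sample_id: str
    verdict: str             # "malicious", "benign", or "inconclusive"
    evasion_suspected: bool  # e.g. the sample stayed idle during the run


def route(report: SandboxReport) -> str:
    """Send clear-cut results to automation and the rest to a reverser."""
    if report.evasion_suspected or report.verdict == "inconclusive":
        return "manual"
    return "automated"


reports = [
    SandboxReport("sample-a", "malicious", False),
    SandboxReport("sample-b", "benign", True),
    SandboxReport("sample-c", "inconclusive", False),
]

queues = {"automated": [], "manual": []}
for report in reports:
    queues[route(report)].append(report.sample_id)

print(queues)
```

Only samples that the sandbox cannot judge with confidence reach the manual queue, which keeps reversers focused on the cases where deeper analysis is actually needed.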
Conclusion
Reverse engineering has many legitimate applications in IT. It can be a legal and ethical approach to solving compatibility problems, recreating obsolete components, assessing security, improving an existing product, or making it cheaper. The steps involved can be complex and vary depending on what is being reverse engineered. For example, QA professionals who want to solve user problems with software products may work backward from complaints to find the cause. Determining the root causes of user problems is difficult, but reverse engineering techniques eliminate some of the guesswork.