Introduction to Reverse Engineering!


You were already sure you wanted to be a hacker/cracker, and after watching Mr. Robot you got super excited to revolutionize the world with lines of code, or even to break into your school's system and change your grades. To start our adventure, we will introduce you to reverse engineering. But first, what is it?

Reverse engineering is the process of taking a compiled binary and trying to recreate (or simply understand) the original form of the program. A programmer initially writes a program in a high-level language such as C++, C#, Java, Visual Basic, Delphi, or Pascal (or, less often, directly in assembly). Because the computer does not speak these languages, the code the programmer wrote is compiled into a form the machine can execute. This compiled code is called a binary, or machine language. It is not very friendly, and it often takes a great deal of brainpower to work out exactly what the programmer had in mind.
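As a loose analogy only, here is a minimal Python sketch of the gap between source code and its compiled form. Python compiles to bytecode for its own virtual machine, not to CPU machine code, so this is not what a real binary contains; the function `add` is a made-up example.

```python
import dis

def add(a, b):
    """A tiny 'program' in readable source form."""
    return a + b

# The compiled form: raw bytes that the Python virtual machine executes.
raw = add.__code__.co_code
print(raw.hex(" "))

# A disassembler turns those bytes back into named instructions.
names = [ins.opname for ins in dis.get_instructions(add)]
print(names)
```

The exact bytes and instruction names depend on the Python version, which is a small taste of why reversing compiled code takes experimentation.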

WHY IS REVERSE ENGINEERING USED?

Reverse engineering can be applied in many areas of computer science, but here are some generic categories:

• Interacting with legacy code;
• Breaking copy protection;
• Analyzing malware;
• Evaluating the quality and robustness of a system;
• Adding functionality to an existing system.

The first category refers to reverse engineering code to interact with existing binaries when the source code is not accessible. We will not discuss this, because it's annoying.

The second you probably already know: breaking a program's validation system, basically removing or changing its authentication so that the software can be used for free. We will go deep into this area.

The third is malware analysis; reverse engineering is necessary here because most malware and exploit authors do not document what their code does or how it does it (unless they're really dumb). This is a very exciting field, but it requires a great deal of knowledge. We will not discuss this until later.

The fourth category is evaluating software security and vulnerabilities. Think of the Windows operating system: reverse engineering is used to hunt for vulnerabilities and security holes and, frankly, to make it as difficult as possible for crackers to break its security.

The final category is adding functionality to existing software. Personally, I think this is one of the most fun. Don't like the graphics used in your web design software? Change them. Would you like a menu item that encrypts your documents in your favorite word processor? Add it. Want to annoy your coworkers? Edit their ERP!

WHAT KNOWLEDGE IS NEEDED?

As you can probably guess, a lot of knowledge is required to be an effective reverse engineer. Luckily, it does not take a lot of knowledge to start reverse engineering, and that's where I hope to come in. That said, to have fun with reversing and get something out of these articles, you should have at least a basic understanding of how program flow works (for example, you should know what an 'if' statement is and what an array is, and have at least seen a Hello World program).

Second, a working knowledge of assembly and Python syntax is highly recommended. At some point you will want to become an ASM guru to really know what you are doing, so it is always good to reproduce each example yourself and follow it step by step!

Much of your time will be dedicated to learning how to use tools. These tools are invaluable to a reverse engineer, but each one also requires learning its shortcuts, flaws, and idiosyncrasies.

Finally, reverse engineering requires a significant amount of experimentation: trying out different compilers, protectors, and encryption schemes, learning about programs originally written in different programming languages (even Delphi), deciphering anti-reverse-engineering tricks; the list goes on and on. At the end of this article, I added a "References" section with some suggested sources. If you really want to be good at reversing, I suggest you do some additional reading.

WHAT TYPES OF TOOLS ARE USED?

There are many different types of tools used in reversing. Many are specific to the types of protection that must be overcome to reverse a binary. There are also several that simply make the reverser's life easier. And then there are what I consider the "basic" items, the ones you use regularly. For the most part, the tools fall into a few categories.

DISASSEMBLERS
Disassemblers take the machine-language codes in a binary and display them in a friendlier form. They also extract information such as function calls, passed variables, and text strings. To start, let's use IDA (a version can be installed for free).

DEBUGGERS
Debuggers are the bread and butter of reverse engineers. They first analyze the binary, much like a disassembler, and then let the reverser step through the code, running one line at a time and inspecting the results. This is invaluable in figuring out how a program works. Some debuggers also allow instructions in the code to be changed, and then the program to be run again with those changes in place. Examples of debuggers are WinDbg and OllyDbg.

-Download IDA: https://www.hex-rays.com/
-Download WINDBG: https://developer.microsoft.com/en-us/windows/hardware/download-windbg
-Download OllyDBG: http://www.ollydbg.de/

HEX EDITORS
Hex editors let you view the actual bytes in a binary and change them. They also let you search for specific bytes, save sections of a binary to disk, and more. There are many free hex editors. We will not use them much in these articles, but sometimes they are invaluable.
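To make the idea concrete, here is a minimal sketch of what a hex editor's main view does: it prints an offset, the bytes in hexadecimal, and the printable characters. The `hexdump` helper and the `sample` bytes are made up for illustration and are not taken from any particular tool.

```python
def hexdump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Format bytes as 'offset  hex bytes  ASCII', one row per `width` bytes."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.', as most hex editors do.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + off:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)

# Made-up sample data standing in for the start of a file.
sample = b"MZ\x90\x00Hello, reverser!\x00\xff"
print(hexdump(sample))

# Searching for specific bytes is a plain substring search.
print(sample.find(b"\x90\x00"))
```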

PE AND RESOURCE EDITORS
Each binary designed to run on a Windows machine has a very specific section of data at the beginning that tells the operating system how to set up and initialize the program (Linux binaries have an equivalent in the ELF format). It tells the operating system how much memory the program will require, which DLLs the program needs to borrow functions from, information about dialog boxes, and so on. This is called the Portable Executable (PE) header, and every program designed to run on Windows needs to have one.

In the reverse engineering world, this structure of bytes becomes very important because it gives the reverser information about the binary. Eventually, you will want (or need) to change that information. There are a multitude of PE viewers and editors, such as CFF Explorer or LordPE, but feel free to use whatever you are comfortable with.
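As a rough sketch of what a PE viewer reads first, the snippet below locates the PE header and pulls out two fields. It relies on the documented layout (an "MZ" signature, a 32-bit offset to the PE header stored at 0x3C, then the "PE\0\0" signature followed by the machine type and section count). The `parse_pe_header` helper is hypothetical, and the buffer is a synthetic stand-in, not a real executable.

```python
import struct

def parse_pe_header(data: bytes) -> dict:
    """Read a few fields from the start of a PE file (minimal, no validation beyond signatures)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # Offset 0x3C of the DOS header holds the file offset of the PE header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF header follows the signature: Machine, then NumberOfSections.
    machine, n_sections = struct.unpack_from("<HH", data, e_lfanew + 4)
    return {"pe_offset": e_lfanew, "machine": machine, "sections": n_sections}

# Synthetic buffer for illustration only: just enough bytes to look like a header.
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<HH", fake, 0x84, 0x014C, 3)  # 0x014C is the 32-bit x86 machine type

info = parse_pe_header(bytes(fake))
print(info)
```

To try it on a real file, read the file's bytes with `open(path, "rb").read()` and pass them in; a full PE viewer reads many more fields than this.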

SYSTEM MONITORS AND ANALYZERS
When reversing programs, it is sometimes important (for example, when studying malware) to see what changes an application makes to the system. Are registry keys created? Are .ini files created? Are separate processes created, perhaps to thwart reverse engineering of the application? Examples of system monitoring tools are Procmon, Regshot, and Process Hacker. We'll discuss these later.
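The core idea behind snapshot tools of this kind is "record the state, run the program, record the state again, compare". Below is a minimal sketch of that idea for files in one folder only; it does not touch the registry or processes, and the `snapshot` and `diff` helpers, file names, and temporary folder are all made up for illustration.

```python
import os
import tempfile

def snapshot(root):
    """Map each file under `root` (by relative path) to its size in bytes."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            state[os.path.relpath(path, root)] = os.path.getsize(path)
    return state

def diff(before, after):
    """Compare two snapshots and report created, deleted, and changed files."""
    return {
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }

with tempfile.TemporaryDirectory() as sandbox:
    with open(os.path.join(sandbox, "settings.ini"), "w") as f:
        f.write("a=1\n")
    before = snapshot(sandbox)

    # Stand-in for "the program under study ran here".
    with open(os.path.join(sandbox, "settings.ini"), "a") as f:
        f.write("b=2\n")
    with open(os.path.join(sandbox, "dropped.tmp"), "w") as f:
        f.write("x")

    after = snapshot(sandbox)
    changes = diff(before, after)

print(changes)
```

Comparing sizes only is a simplification; a more careful version would also compare timestamps or content hashes.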

TOOLS AND INFORMATION
There are tools we'll pick up along the way, such as scripts, unpackers, packer identifiers, etc. Also in this category is some kind of reference for the Windows API. This API is huge and sometimes complicated. It is extremely useful in reverse engineering to know exactly what the called functions are doing.

-Download CFF Explorer: http://www.ntcore.com/exsuite.php
-Download LordPE: http://www.woodmann.com/collaborative/tools/index.php/LordPEN

NOW THAT YOU KNOW THE BASICS, LET'S GET HANDS-ON!

Even though we are starting with little experience, I wanted to give you at least a taste of reverse engineering already in this first article. To do this, I have included a resource viewer/editor called XN Resource Editor (it's freeware). Basically, this program lets you view the resources section of an ".exe" file and modify those resources. You can have a lot of fun with it. Here we go: first, run XN, click the load icon, go to Windows\System32\, and load "calc.exe" (the default Windows location may vary). You should see a lot of folders available.

Now click on the "Scientific" menu option. The Caption field should change to "&Scientific". The ampersand is there to tell you which letter is the hot-key, in this case 'S'. If, instead, we wanted the 'e' to be the hot-key, the caption would be "Sci&entific". So, don't like the built-in hot-keys in calc? Just change them! But let's do something different: in the Caption field, replace "&Scientific" with "&Nerd".

This will change the menu option to "Nerd" and make 'N' its hot-key (I looked through the other options to make sure that no other hot-key uses the letter N). You should do this for all menu entries. Now go to File (in XN Resource Editor) and choose "Save As". Save your new version of calc under a different name (and preferably in a different location) and then run it.
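The ampersand rule for captions can be sketched in a few lines. The `hotkey` helper below is hypothetical; it assumes the usual Windows convention that "&" marks the next character as the hot-key and that "&&" stands for a literal ampersand.

```python
def hotkey(caption):
    """Return the hot-key character of a menu caption, or None if it has none."""
    i = 0
    while i < len(caption):
        if caption[i] == "&":
            if i + 1 < len(caption) and caption[i + 1] == "&":
                i += 2  # "&&" is an escaped, literal ampersand
                continue
            if i + 1 < len(caption):
                return caption[i + 1]
        i += 1
    return None

print(hotkey("&Scientific"))  # S
print(hotkey("Sci&entific"))  # e
print(hotkey("&Nerd"))        # N
```

Running a check like this over every caption in a menu is one way to confirm that a new hot-key does not clash with an existing one.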

It worked! Of course, you do not have to stop there; I even changed the order of the numbers on the calculator.

-Download XN Resource Editor: https://stefansundin.github.io/xn_resource_editor/

THE OLLY DEBUGGER, OR OLLYDBG, OR SIMPLY OLLY TO ITS CLOSE FRIENDS
OllyDbg is a 32-bit assembler-level analysing debugger for Microsoft® Windows®. Its emphasis on binary code analysis makes it particularly useful in cases where the source is unavailable.
Olly is also a dynamic debugger, which means that it lets the user change things while the program is running. This is very important when experimenting with a binary, trying to figure out how it works.

Olly has many, many excellent features, and that's why it's probably the #1 debugger used for reverse engineering (at least in ring 3, but we'll get to that later).

Overview
Having a debugger is essential to reverse engineering, but debuggers are not at all friendly. So that you understand how this one works, we will do a big overview here of the incredible tool that is Olly.

Olly opens with its default window. If yours looks different, close the window and click on the 'C' icon, and remember to open some ".exe" so that there is information to display. The window is divided into four main panes: Disassembly, Registers, Stack, and Dump. The following is a description of each pane.

Disassembly
This window contains the main disassembly of the code for the binary. This is where Olly displays information about the binary, including the opcodes and the translated assembly language. The first column is the address (in memory) of each instruction. The second column holds what are called opcodes: in assembly language, each instruction has at least one code associated with it (many have several). This is the code the CPU really wants, and the only code it can read. These opcodes make up the "machine language", the language of the computer. If you looked at the raw data in a binary (using a hex editor), you would see a stream of these opcodes and nothing else. One of Olly's main jobs is to "disassemble" this machine language into assembly language, which is more readable to humans. The third column is this assembly language.
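To show the address / opcode / assembly idea on a very small scale, here is a toy table-driven disassembler. It knows only five 32-bit x86 encodings, which is nowhere near what a real disassembler handles; the `OPCODES` table, the `disassemble` helper, and the load address 0x401000 are assumptions made up for this sketch.

```python
# A tiny opcode table: raw bytes -> assembly text (five x86 encodings only).
OPCODES = {
    b"\x55": "PUSH EBP",
    b"\x8b\xec": "MOV EBP,ESP",
    b"\x90": "NOP",
    b"\x5d": "POP EBP",
    b"\xc3": "RETN",
}

def disassemble(code: bytes, base: int):
    """Return rows of (address, opcode bytes as hex, assembly text)."""
    rows = []
    pos = 0
    while pos < len(code):
        for raw, text in OPCODES.items():
            if code.startswith(raw, pos):
                break
        else:
            # Unknown byte: show it as-is so the listing can continue.
            raw, text = code[pos:pos + 1], "???"
        rows.append((base + pos, raw.hex().upper(), text))
        pos += len(raw)
    return rows

code = bytes.fromhex("55 8B EC 90 5D C3")
listing = disassemble(code, base=0x401000)
for address, opcode, text in listing:
    print(f"{address:08X}  {opcode:<6}  {text}")
```

Note how the second instruction is two bytes long, so the addresses in the first column do not advance by a fixed amount.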

For someone who does not know assembly, it does not look much better than the opcodes, but the more you study, the more assembly you will understand, and the more you will understand what the code is doing.

The last column holds Olly's comments on each line of code. Sometimes this contains the names of API calls (if Olly was able to figure them out), such as CreateWindow and GetDlgItemX. Olly also tries to help us understand the code by labeling anything that is not part of the API with descriptive names; in this example, "I/O command" and "Superfluous prefix". Granted, those are not very useful, but Olly also lets us change them to more meaningful names. You can also put your own comments in this column; just double-click the row in this column and a box will appear, allowing you to enter your comment. These comments are saved automatically for next time.

Registers
Every CPU has a collection of registers in it. These are temporary holders for values, much like variables in any high-level programming language. Here is a more detailed (and labeled) view of the registers.

At the top are the CPU registers themselves. A register changes color from black to red when its value has changed (which makes it very easy to watch for changes). You can also double-click on any of the registers to change its contents. These registers are used for many things, and we will have a lot to say about them later.

The middle section holds the flags, used by the CPU to signal to the code that something happened (two numbers are equal, one number is larger than another, etc.). Double-clicking one of the flags changes its value.

These will also play an important role in our journey.
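As a minimal sketch of how flags record "something happened", the function below imitates what a 32-bit compare does to three of the standard x86 flags: ZF (zero), CF (carry) and SF (sign). The article does not name individual flags at this point, so both the flag names and the `cmp_flags` helper are additions for illustration.

```python
MASK = 0xFFFFFFFF  # keep values within 32 bits

def cmp_flags(a: int, b: int) -> dict:
    """Imitate a 32-bit compare: compute a - b and report the resulting flags."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "ZF": int(result == 0),  # set when the two values are equal
        "CF": int(a < b),        # set when a is smaller than b (unsigned borrow)
        "SF": result >> 31,      # copy of the top bit of the result
    }

print(cmp_flags(5, 5))
print(cmp_flags(3, 7))
print(cmp_flags(7, 3))
```

A conditional jump that follows a compare simply looks at these bits, which is why toggling a flag in the debugger can change which way the code goes.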

The bottom section holds the FPU, or Floating Point Unit, registers. These are used whenever the CPU performs arithmetic involving decimal points. Reversers rarely need them, mostly only when we get into cryptography.

Stack
The stack is a section of memory reserved for the binary as a "temporary" list of data. This data includes pointers to addresses in memory, strings, markers, and, most importantly, return addresses for the code to return to after calling a function. When a method in a program calls another method, control needs to be transferred to this new method so that it can run. The CPU must keep track of where this new method was called from, so that when the new method is done, the CPU can return to that point and continue running the code after the call. The stack is where the CPU holds this return address.

One thing to know about the stack is that it is a "First In, Last Out" data structure. The commonly used metaphor is one of those spring-loaded stacks of plates in a diner. When you push a plate onto the top, all the plates underneath are pushed down. When you remove ('pop') a plate off the top, all the plates below rise up one level. We'll see this in action in the next article, so do not worry if it's a little hazy.
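The plate metaphor and the return-address bookkeeping can be sketched with a plain Python list. This is a simplified model, not how a CPU is implemented, and all addresses are made up for illustration.

```python
stack = []  # the end of the list is the "top" of the stack

def call(return_address, target):
    """Push the address to come back to, then jump to the target."""
    stack.append(return_address)
    return target

def ret():
    """Pop the most recently pushed return address and jump back to it."""
    return stack.pop()

# main calls a function, which in turn calls a second function.
eip = call(0x401005, 0x402000)  # main -> first function
eip = call(0x402010, 0x403000)  # first function -> second function

# Returns come back in the opposite order: last pushed, first popped.
first_return = ret()
second_return = ret()
print(hex(first_return), hex(second_return))
```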

In the stack pane, the first column is the address of each data item, the second column is the 32-bit hex representation of the data, and the last column is Olly's comment on that data item, if it can figure one out. If you look at the first line, you will see the comment "RETURN to kernel". This is an address that the CPU placed on the stack so that it knows where to return when the current function is done. In Olly, you can right-click on the stack and choose 'Modify' to change the contents.

Dump
Earlier in this article, when we talked about the raw opcodes that the CPU reads inside a binary, I mentioned that you could see this raw data in a hex viewer. Well, in Olly, you do not need a separate one. The dump is a built-in hex viewer that lets you view the raw binary data, only in memory instead of on disk.

Usually, it shows two views of the same data: hexadecimal and ASCII. These are the two right-hand columns of the pane (the first column is the address in memory where the data resides). Olly allows these representations of the data to be changed.
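The point that one block of memory can be shown in several representations is easy to demonstrate. The four bytes below are made up for illustration; the third view, a 32-bit little-endian number, is one further representation added here as an example of "changing the view".

```python
import struct

raw = bytes.fromhex("4F 6C 6C 79")  # four made-up bytes in memory

as_hex = raw.hex(" ").upper()           # the hexadecimal view
as_ascii = raw.decode("ascii")          # the ASCII view
(as_dword,) = struct.unpack("<I", raw)  # the same bytes as one 32-bit little-endian number

print(as_hex)
print(as_ascii)
print(hex(as_dword))
```

Nothing about the bytes themselves says which view is the "right" one; that depends on how the program uses them.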

Toolbar (or Toolbox)
Unfortunately, the Olly toolbar leaves a bit to be desired. We've labeled the left-hand toolbar icons to help.

These are your primary controls for running code. Keep in mind, especially when you start using Olly, that all of these buttons are also accessible from the "Debug" drop-down menu, so if you do not know what something is, you can look there.

Let's make some observations about some of the icons. "Reload" restarts the application and pauses it at the entry point. All patches (see later) will be removed, some breakpoints will be disabled, and the application will not have run any code yet; well, most of the time anyway.

"Run" and "Pause" do just that.

"Step In" executes one line of code and then pauses again, stepping into a function call if there is one.

"Step Over" does the same, but steps over a call to another function instead of entering it.

"Animate" is just like Step In and Step Over, except that it runs slowly enough for you to watch (you probably will not use this, but sometimes it's fun to watch the code execute, especially if it's a polymorphic binary and you can watch the code change).

Next are the (even more cryptic) window icons.

Each of these icons opens a window, some of which you will use often, some rarely. Seeing how intuitive the letters are, you too can do as I did and just start clicking on them until you find what you want. Each of them is also accessible from the "View" menu.

(M)emory
The memory window displays all the memory blocks that the program has allocated, including the main sections of the running application (in this case, the "Showstr" items in the "Owner" column). You can also see many other sections further down the list; these are the DLLs that the program has loaded into memory and plans on using. If you double-click any of these lines, a window will open showing a disassembly (or hex dump) of that section. This window also shows the type of each block, its access rights, its size, and the memory address where the section is loaded.

(/)Patches
This window displays all the "patches" you have made, that is, any change in the original code. Note that the state is set to active; if you reload the application (by clicking the reload icon), those patches will become disabled. To reactivate them (or deactivate them), simply click on the desired patch and hit the spacebar. This turns the patch on/off.

Also note that in the "Old" and "New" columns, it shows the original instructions as well as the changed instructions.
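A patch is essentially a record of "at this offset, these old bytes were replaced with these new bytes", which is what makes switching it on and off possible. The sketch below models that with a bytearray. The `Patch` class and `toggle` function are hypothetical and say nothing about how OllyDbg stores patches internally; the bytes are a made-up fragment in which the conditional jump JE (0x74) is changed to JNE (0x75).

```python
from dataclasses import dataclass

@dataclass
class Patch:
    offset: int
    old: bytes   # the original bytes (the "Old" column)
    new: bytes   # the replacement bytes (the "New" column)
    active: bool = False

def toggle(memory: bytearray, patch: Patch) -> None:
    """Switch a patch on or off by writing its new or old bytes."""
    data = patch.old if patch.active else patch.new
    memory[patch.offset:patch.offset + len(data)] = data
    patch.active = not patch.active

# Made-up code fragment: CMP EAX,EBX / JE +5 / NOP / NOP
memory = bytearray.fromhex("3B C3 74 05 90 90")
patch = Patch(offset=2, old=bytes(memory[2:3]), new=b"\x75")

toggle(memory, patch)            # apply: JE becomes JNE
patched = memory.hex(" ").upper()
toggle(memory, patch)            # disable: the original byte is restored
restored = memory.hex(" ").upper()
print(patched)
print(restored)
```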

(B)reakpoints
This window shows where all the current breakpoints are configured. This window will be your best friend.

(K)all Stack
This window is different from Stack and shows much more information about the calls being made in the code, the values sent to these functions and more. We'll see more about this soon.

CONTEXT MENU
For the last item of this article, we wanted to quickly present the right-click menu in Olly. A lot of the action happens there, so you should at least be familiar with it. Right-clicking anywhere in the disassembly section brings up the context menu. We will only examine the most popular items now. As you gain experience, you will end up using some of the less common options.

"Binary" allows editing binary data at the byte-by-byte level. This is where you can change an "Unregistered" string buried in a binary to "Registered". "Breakpoint" allows you to set a breakpoint. There are several types of breakpoints, which we'll see in the next article. "Search For" is a fairly large sub-menu, and is where you search the binary for data such as strings, function calls, etc. "Analyze" forces Olly to re-examine the section of code you are currently viewing. Sometimes Olly gets confused about whether you are looking at code or data (remember, they are both just numbers), so this forces Olly to consider where you are in the code and try to guess what this section should look like. Also note that your menu may look different from ours, because I have some plugins installed and they add some functionality. Do not worry, we will go through all this in future articles.

CONCLUSION

This was the first part of our journey deep inside reverse engineering.
You have been introduced to reverse engineering, learned what reverse engineering is used for, what knowledge is needed to study it, and what kinds of tools are used in its process. In addition, we presented the OllyDbg tool, which will be used in the next articles.

In the next part, the intention is to present a more "complex" approach, to better understand what is actually being done when reverse engineering software.

So, welcome to the world of reverse engineering! See you in the next part!

REFERENCES

  1. DANG, Bruce; GAZET, Alexandre; BACHAALANY, Elias; JOSSE, Sébastien. Practical Reverse Engineering. Wiley, 2013.
  2. EAGLE, Chris. The IDA Pro Book. No Starch Press, 2012.
  3. EILAM, Eldad. Reversing. Wiley, 2005.
  4. HYDE, Randall. The Art of Assembly Language. No Starch Press, 2010.
  5. LIGH, Michael Hale; CASE, Andrew; LEVY, Jamie; WALTERS, AAron. The Art of Memory Forensics. Wiley, 2014.
  6. LIGH, Michael; ADAIR, Steven; HARTSTEIN, Blake; RICHARD, Matthew. Malware Analyst's Cookbook. Wiley, 2011.
  7. MARCIANO, Leonardo (it's me \o/). Reverse Engineering #1 - Beginning of a Great Adventure. Available in Portuguese at: https://medium.com/@leonardomarciano/engenharia-reversa-1-in%C3%ADcio-de-uma-grandeaventura-9526447ee50e
  8. PINHEIRO, Deivison; MARCIANO, Leonardo. eForensics Magazine 2018 03 Best Of Vol4.

For verification purposes, I placed the link to my Steemit profile on my Medium profile: https://medium.com/@leonardomarciano/
