Don’t Use Objects in Python – part 2

In my previous post I explained that objects in Python are really just special wrappers for dictionaries. There is one dictionary that contains the attributes of the object and another dictionary that contains the attributes of the class. Indeed, we can really just think of the __init__ method as a static method that adds elements to a dictionary.

Objects in Python do however have one feature that you cannot get by just passing around dictionaries: inheritance.

Let’s suppose we are using some Arbitrary Object-Oriented Language. This language looks and works a lot like C#, Java or C++. We define two classes, Security and Bond in this language. Security has a single field, an integer, called Id:

public class Security
{
   public int Id;
   public Security(int id)
   {
        Id = id;
   }
}

Bond inherits from Security and adds another field, this time a double called Rate:

public class Bond : Security
{
    public double Rate;
    public Bond(int id, double rate) : super(id)
    {
        Rate = rate;
    }
}

What do we get by making Bond a subclass of Security? The most obvious thing we get is that Bond will get a copy of the implementation inside Security. So in this example, as Security has a public field called Id, Bond has one as well. We can think of Security as a user defined type that is built up out of primitive types. By using inheritance Bond extends this type.

In Python we can do something similar. First we define our two classes:

class Security:
    def __init__(self, id):
        self.id = id

and:

class Bond(Security):
    def __init__(self, id, rate):
        self.rate = rate
        Security.__init__(self, id)

Now, this time when we define Bond as a subclass of Security what do we get? Python objects are not composite types like in our object oriented language. In Python objects are really just dictionaries of attributes and these attributes are only distinguished by their names. So our Bond class could have a numeric value in its rate attribute, but it could also have a string value, or a list or any other type. When we subclass in Python we are not extending a user defined type, we are just re-using some attribute names.
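A minimal sketch of this point, repeating the two classes so that the snippet runs by itself. The particular values are only illustrative:

```python
class Security:
    def __init__(self, id):
        self.id = id


class Bond(Security):
    def __init__(self, id, rate):
        self.rate = rate
        Security.__init__(self, id)


# Nothing ties the rate attribute to a numeric type.
numeric_bond = Bond(1, 0.05)
string_bond = Bond(2, "five percent")
list_bond = Bond(3, [0.05, 0.06])

# Each instance is backed by a plain dictionary keyed by attribute name.
print(numeric_bond.__dict__)
print(string_bond.__dict__)
print(list_bond.__dict__)
```

All three objects are Bonds as far as Python is concerned; the only thing they share is the pair of attribute names.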

In our Arbitrary Object Oriented language, there is another advantage to making Bond a subclass of Security: the ability to treat Bond as a Security. To understand what this means, suppose we have a function that prints the ids of a list of Securities:

static void printPortfolio(Security[] securities)
{
    string ids = "";
    foreach(Security security in securities)
    {
        ids += (" " + security.Id);
    }
    Console.WriteLine(ids);
}

Now, this function specifies that its single parameter must be an array of Securities. However, by the magic of polymorphism, we can actually pass in an array of types that inherit from Security. When they are passed in they are automatically treated as type Security. This can be pretty useful; in particular it makes it a lot easier to reason about our code.

Now let’s define the same function in Python:

def print_portfolio(securities):
    ids = ""
    for security in securities:
        ids += (" " + str(security.id))
    return ids

On the face of it, this is very similar. However we are not really using polymorphism in the same way as we are in our object oriented language. We could actually pass a list of any objects into the print_portfolio function, and, as long as they had an id attribute, this would execute happily. We could, for example, define a completely unrelated class like so:

class Book:
    def __init__(self, id):
        self.id = id

and pass a list of these into our print_portfolio function without any problems. Indeed in Python we can dynamically add attributes to an object, so we could even create an empty class:

class Empty:
    def __init__(self):
        pass

and assign an id attribute to it at runtime, via:

e = Empty()
e.id = "Hello"

and then enclose it in a list [e] and pass it into the print_portfolio function.
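Putting these pieces together, here is a small runnable sketch. It repeats the definitions from above and mixes a Security, a Book and an Empty instance in one list; the final check with a bare object() is an addition to show what happens when the id attribute is missing:

```python
def print_portfolio(securities):
    ids = ""
    for security in securities:
        ids += (" " + str(security.id))
    return ids


class Security:
    def __init__(self, id):
        self.id = id


class Book:
    def __init__(self, id):
        self.id = id


class Empty:
    def __init__(self):
        pass


e = Empty()
e.id = "Hello"

# Three unrelated types, one function call.
result = print_portfolio([Security(1), Book(2), e])
print(result)

# The only requirement is the attribute name: without it the call fails at runtime.
try:
    print_portfolio([object()])
    failed = False
except AttributeError:
    failed = True
print(failed)
```

The function never asks what type it was given. It only fails, at runtime, when the id lookup itself fails.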

There’s one more thing we get with inheritance in an object oriented language: access to protected members. When we mark a method or field as protected it will only be accessible from within objects of that type or types that inherit from it. However in Python there are no protected methods or fields, everything is public.
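As a rough illustration, the sketch below uses a hypothetical Account class. The leading underscores are naming conventions only: a single underscore is a polite request, and a double underscore just stores the attribute under a mangled name that is still reachable from outside:

```python
class Account:  # hypothetical example class
    def __init__(self, balance):
        self._balance = balance  # single underscore: "internal" by convention only
        self.__pin = 1234        # double underscore: stored as _Account__pin


account = Account(100)

# Code outside the class, and outside any subclass, can read and write both.
account._balance = -5
mangled_pin = account._Account__pin
print(account._balance, mangled_pin)
```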

So there are three reasons why I don’t think inheritance makes sense in Python:

  • Python classes aren’t really composite types, so it doesn’t make sense to extend them
  • Inheritance doesn’t give us extra access to protected methods, as everything in Python is public anyway
  • Inheritance in Python doesn’t give us the benefit of polymorphism because in Python there are really no restrictions on what objects we pass around

So, that is why there really is no extra benefit to using an object in Python rather than a dictionary.

Don’t Use Objects In Python – Part 1

Everyone knows Python is object oriented. It’s right there on page 13 of Introducing Python, and it says as much on Wikipedia. You might have even seen classes, objects, inheritance and polymorphism in Python yourself. But is Python really object oriented?

Let’s first ask, what actually is an object in Python? We’ll see for ourselves, by creating a simple Python class:

class Record:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def is_over_eighteen(self):
        return self.age > 18

This is just a simple class that has two member fields, name and age, and a single method, is_over_eighteen. Let’s also create an instance of this class:

my_record = Record("Stubborn", 207)

In Python member fields and methods are known as attributes. We can look up the attributes of our record just like in any other standard object oriented language, by my_record.name, my_record.age and my_record.is_over_eighteen.

Our object also has other secret attributes that are created automatically. Their names start with a double underscore. The attribute we are interested in is my_record.__dict__. If we evaluate this we will see that it is a dictionary of the instance attributes:

{'name': 'Stubborn', 'age': 207}

What’s interesting is that this isn’t just a representation of the object as a dictionary. In Python an object is backed by an actual dictionary, and this is how we access it. When we look up an attribute of an object with the normal notation, for example my_record.age, the Python interpreter, in the simplest case, converts it to a dictionary lookup.

The same is true of methods, the only difference is that methods are attributes of the class. So if we evaluate: Record.__dict__ we get something like:

mappingproxy({'__module__': '__main__', '__init__': <function Record.__init__ at 0x7f3b7feec710>, 'is_over_eighteen': <function Record.is_over_eighteen at 0x7f3b7feec7a0>, '__dict__': <attribute '__dict__' of 'Record' objects>, '__weakref__': <attribute '__weakref__' of 'Record' objects>, '__doc__': None})

We can access our method from this dictionary via:

Record.__dict__["is_over_eighteen"](my_record)
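To see that the dictionaries really are the backing store, here is a short sketch that repeats the Record class and goes through both dictionaries by hand. The variable names are only there for the demonstration:

```python
class Record:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def is_over_eighteen(self):
        return self.age > 18


my_record = Record("Stubborn", 207)

# Attribute access and the instance dictionary see the same data.
same_value = my_record.age == my_record.__dict__["age"]

# Writing to the dictionary changes the attribute.
my_record.__dict__["age"] = 17

# The method lives in the class dictionary as a plain function,
# so we can call it by passing the instance explicitly.
via_class_dict = Record.__dict__["is_over_eighteen"](my_record)
via_attribute = my_record.is_over_eighteen()

print(same_value, my_record.age, via_class_dict, via_attribute)
```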

So a Python object is really just a wrapper for two dictionaries. Which raises the question: why use an object at all, rather than a dictionary? The only functionality objects add on top of dictionaries is inheritance (more on why that is bad in a later post) and a little syntactic sugar for calling methods. In fact there is one way in which Python objects are strictly worse than dictionaries: they don’t pretty print. If we evaluate print(my_record) we will see:

<__main__.Record object at 0x7f3b8094dd10>

Not very helpful. Now there is a way to make objects pretty print. We do this by implementing the __repr__ method in our Record class. Strictly speaking, print calls __str__, but unless we define that as well it falls back to __repr__. However, dictionaries already have __repr__ defined so they will pretty print automatically.
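For completeness, a minimal sketch of what that looks like for the Record class. The exact output format chosen here is just one reasonable option:

```python
class Record:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        # !r applies repr() to each field, so strings keep their quotes.
        return f"Record(name={self.name!r}, age={self.age!r})"


my_record = Record("Stubborn", 207)
text = repr(my_record)
print(my_record)
```

This works, but it is extra code that a plain dictionary never needs.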

Fundamentally, if you think you want to define a class in Python, consider just using a dictionary instead.

Assembly Tutorial – I/O Bringing it all Together

We’ve seen in previous posts how to handle errors when writing to files and how to read and write arbitrary numbers of bytes to files. It’s time we put this all together! We are going to write a program that will read an arbitrary number of bytes from the command line and write them to a file. If our program encounters any errors it will gracefully exit with code 1. Here’s the code:

.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

.section .data
filename:
.string "outputfile.txt"

.section .bss
.lcomm buffer_data, BUFFER_SIZE

.section .text

.globl _start
_start:

movq $2, %rax
movq $filename, %rdi
movq $0x441, %rsi
movq $0666, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %r9

read_from_buffer:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %rbx

movq $1, %rax
movq %r9, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

jmp read_from_buffer

exit:
movq $60, %rax
movq $0, %rdi
syscall

exit_with_error:
movq $60, %rax
movq $1, %rdi
syscall

Nothing we have done here is really new. We start by defining some constants and a 20 byte buffer. Then, in the text section, we open the file named “outputfile.txt”.

movq $2, %rax
movq $filename, %rdi
movq $0x441, %rsi
movq $0666, %rdx
syscall

When we open the file we use the flag value 0x441. This flag tells the kernel three things: that we want to open the file in write mode, that we want to create a file if it doesn’t exist and that we want to append to the end of a file if it does exist.
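The value 0x441 can be checked by OR-ing the three flags together. The sketch below uses the flag values from the Linux x86-64 headers, written in octal; the Python names are just local constants defined for this illustration:

```python
# open(2) flag values on Linux x86-64, in octal as they appear in the headers.
O_WRONLY = 0o1     # open for writing only
O_CREAT = 0o100    # create the file if it does not exist
O_APPEND = 0o2000  # every write goes to the end of the file

flags = O_WRONLY | O_CREAT | O_APPEND
print(hex(flags))

# The fourth argument, 0666, is an octal permission mask:
# read and write for owner, group and others (before the umask is applied).
mode = 0o666
print(oct(mode))
```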

After the open system call, we check if the return value of this system call is negative:

cmpq $0, %rax
jl exit_with_error

movq %rax, %r9

If so, we jump straight to exit_with_error, otherwise we stash the return value in r9. You have to be careful to stash values you want to keep somewhere they won’t get overwritten during the next system call. We don’t use rsi, rdi or rdx as we are using them to pass values to the kernel. The registers rcx and r11 will, in general, have their values overwritten during a system call and rax will contain the return value. So we choose rbx and r9 as our two stash registers.

Now, once we’ve opened our file, we enter a loop. Our loop starts with the label read_from_buffer. As before, at the end of each loop we check if the last character we have read is a new line and if so, jump to the exit, otherwise we jump back to the loop start:

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

jmp read_from_buffer

Inside this loop we have our read and write system calls:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

movq %rax, %rbx

movq $1, %rax
movq %r9, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

cmpq $0, %rax
jl exit_with_error

We now have a conditional jump statement after each of these calls. If control returns from either with a negative number in rax we jump straight to exit_with_error.

That’s it, we’ve covered everything we will need to handle input and output in x86 assembly! It’s been quite a journey. In our next few posts we’ll be trying something slightly different.

What is Single Inheritance, and Why is it Bad?

In a previous post we talked about how there are two different notions of inheritance, Interface Inheritance and Implementation Inheritance. In this post we’ll be talking primarily about the latter. That means, we’ll be talking about using inheritance to share actual logic among classes.

Let’s say we defined two base classes. The first is DatabaseAccessor:

class DatabaseAccessor {
   protected:
   bool WriteToDatabase(string query) {
   // Implementation here
   }

   string ReadFromDatabase(string query) {
   // Implementation here
   }
};

which encapsulates our logic for reading from and writing to some database. The second is Logger:

class Logger {
   protected:
   void LogWarning(string message) {
   // Implementation here
   }

   void LogError(string message) {
   // Implementation here
   }
};

This encapsulates our logic for logging errors and warnings. Now, suppose also that we are using a language that supports multiple inheritance, such as C++. If we want to define a class that shares functionality with both of these classes, we can subclass both of them at once:

class SomeUsefulClass : private DatabaseAccessor, private Logger {
   // Some useful code goes here
};

So, we have a subclass named SomeUsefulClass that can use both the DatabaseAccessor and Logger functionality, great!

What if we wanted to do the same thing in a language like C# or Java? Well, then we’re out of luck: in C# and Java you cannot inherit from more than one base class. This is called single inheritance. How do we achieve the same effect if we are using either of these languages? Well, one solution would be to chain our subclasses together. We would choose which of DatabaseAccessor and Logger is more fundamental and make it the base class of the other. Suppose we decided that everything has to log, then the Logger class remains the same and DatabaseAccessor becomes:

class DatabaseAccessor : private Logger {
   protected:
   bool WriteToDatabase(string query) {
   // Implementation here
   }

   string ReadFromDatabase(string query) {
   // Implementation here
   }
}

So now we can subclass DatabaseAccessor and get the Logger functionality as well. Although, what if we really did just want the DatabaseAccessor logic, and we didn’t actually need the Logger implementation? Well tough luck. Now everything that inherits from DatabaseAccessor inherits from Logger as well.

This might not seem like much of a problem with something as trivial as Logger, but in big enterprise applications, it can spiral out of control. You end up in a situation where all the basic re-usable code you might need is locked inside a lengthy chain of subclasses. What if you only need something from the bottom link of this chain? Unfortunately, you have to pick up all of its base classes as well. This makes our code unnecessarily complicated and harder to read and understand. If one of those unwanted base classes does all kinds of heavyweight initialisation on construction, then it will have performance implications as well.

One consolation that is often offered is that both languages allow multiple inheritance of interfaces. This doesn’t really help us though. Implementation inheritance and Interface Inheritance are two completely different things. We can’t convert one of DatabaseAccessor or Logger into an interface. The entire reason we want to inherit from them is to get their implementation!

We could also use composition instead of inheritance. In this case we would inherit from one of the classes, and keep a reference to the other. But, in that case, why even use single inheritance at all? Why not just let our classes keep references to a Logger and a DatabaseAccessor? The language designers have struck a bizarre compromise here, we can use inheritance, but only a little bit. If C# and Java are Object Oriented, then they should allow us to use the full features of object orientation, rather than just a flavour.
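To make the "just keep references" alternative concrete, here is a minimal sketch in Python. The class names follow the ones used above, but the method bodies, the in-memory lists standing in for a log and a database, and the save method are all assumptions made up for this example:

```python
class Logger:
    def __init__(self):
        self.messages = []  # stand-in for a real log sink

    def log_warning(self, message):
        self.messages.append("WARNING: " + message)

    def log_error(self, message):
        self.messages.append("ERROR: " + message)


class DatabaseAccessor:
    def __init__(self):
        self.rows = []  # stand-in for a real database

    def write_to_database(self, query):
        self.rows.append(query)
        return True

    def read_from_database(self, query):
        return [row for row in self.rows if row == query]


class SomeUsefulClass:
    """Holds references to its collaborators instead of inheriting from them."""

    def __init__(self, logger, database):
        self.logger = logger
        self.database = database

    def save(self, query):
        if not query:
            self.logger.log_error("empty query")
            return False
        return self.database.write_to_database(query)


useful = SomeUsefulClass(Logger(), DatabaseAccessor())
saved = useful.save("INSERT 1")
rejected = useful.save("")
print(saved, rejected, useful.logger.messages)
```

A class that only needs the database takes only a DatabaseAccessor, so nothing unwanted comes along for the ride.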

The good news is that the people behind C#, Microsoft, have realised the error of their ways. They have released two features that ameliorate the problem of single inheritance, Extension Methods and Default Implementations in Interfaces. More on these in future blog posts.

The Two Different Types of Inheritance

If, like me, you started life as a C++ developer, you will be surprised to learn that there are actually two completely different notions of Inheritance. In C++, these two different notions get squashed together, but they’re still there.

Imagine we are defining a base class to represent a financial security. In C++ it might look something like this:

class Security {
   public:
   virtual double GetPrice(double interestRate) = 0;
};

Any class that inherits from this base class has to implement the GetPrice method. Apart from that, they may have nothing else in common.

Now, suppose we are writing a base class that encapsulates our database access logic. Again, in C++, it might look a little like this:

class DatabaseAccessor {
   protected:
   bool WriteToDatabase(string query) {
   // Implementation here
   }

   string ReadFromDatabase(string query) {
   // Implementation here
   }
};

These two examples illustrate the two different notions of inheritance, interface inheritance and implementation inheritance.

Interface inheritance, also known as type inheritance, is when we use inheritance to specify a common API across different classes. In our first example, every class that inherits from the Security class will have a public GetPrice method that takes a double argument and returns a double. There can be lots of different Security subclasses that all calculate their price differently. But they all share the same interface for this functionality. This is interface inheritance.

Now, let’s look at our second example. Any class that inherits from DatabaseAccessor will have a shared implementation for reading from and writing to databases. We are using inheritance here to make our code easier to maintain and re-use. This is implementation inheritance.

When we write C++ we don’t normally distinguish between these two things. Indeed we can mix them freely. For example, suppose we had another class:

class RateCurve {
public:
   // Pure virtual: subclasses must supply the implementation
   virtual double GetRateAt(time_t date) = 0;
   double Discount(double value, time_t date) {
      return value * GetRateAt(date);
   }
};

The method GetRateAt defines an interface as it’s a virtual function with no implementation. The method Discount comes with an implementation that will be shared between subclasses. So whenever we inherit from RateCurve we will be using interface and implementation inheritance at the same time!
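The same mix can be sketched in Python with the standard abc module. This is only an analogue of the C++ class, not a translation of any real library; FlatCurve is a hypothetical subclass invented for the example:

```python
from abc import ABC, abstractmethod


class RateCurve(ABC):
    @abstractmethod
    def get_rate_at(self, date):
        """Interface only: every subclass must provide this."""

    def discount(self, value, date):
        # Shared implementation, written in terms of the interface method.
        return value * self.get_rate_at(date)


class FlatCurve(RateCurve):  # hypothetical subclass
    def __init__(self, rate):
        self.rate = rate

    def get_rate_at(self, date):
        return self.rate


discounted = FlatCurve(0.5).discount(100.0, date=0)
print(discounted)

# The base class cannot be instantiated, because its interface is incomplete.
try:
    RateCurve()
    instantiated = True
except TypeError:
    instantiated = False
print(instantiated)
```

FlatCurve inherits the discount implementation and fulfils the get_rate_at interface in a single step, just as a C++ subclass of RateCurve would.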

In a language like C# things are quite different. Interface and Implementation inheritance are handled explicitly and separately. If you would like to share implementation you inherit from a class. If you would like to use a common API you implement an interface. Interfaces are defined like so:

public interface ISecurity {
   double GetPrice(double rate);
}

An interface cannot contain any fields, and (at least before C# 8.0) interface methods cannot have implementations. This all means that it is a little bit easier to reason about our code. The disadvantage is that we cannot use both interface and implementation inheritance at the same time.

Assembly Tutorial – Input and Output the Right way

Up until now, whenever we’ve read from or written to a file, we’ve just put an upper bound on the number of bytes we were reading or writing. For example in our original simple echo program we used a buffer of 500 bytes, and we put the value 500 into the rdx register when making the read and write system calls. If we carry on this way, we’ll always have to put a maximum size on input and output. Let’s learn how to do this properly!

We are going to write another simple echo program. However, this time, we’ll use a loop to read and write the input. We’ll also use a register to store a memory address like a pointer.

Ok, let’s look at some code:

.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

.section .data
.section .bss
.lcomm buffer_data, BUFFER_SIZE

.section .text

.globl _start
_start:

read_from_buffer:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

movq %rax, %rbx

movq $1, %rax
movq $1, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

jmp read_from_buffer

exit:
movq $60, %rax
movq $0, %rdi
syscall

The first two lines of this program introduce some new syntax. The equ keyword allows us to define a constant that will be substituted by the assembler. This is just like the #define pre-processor directive in C or C++. Here we define two constants:

.equ BUFFER_SIZE, 20
.equ NEW_LINE, 10

The first, BUFFER_SIZE, is the size of the buffer we will be using, in this case 20 bytes. The second, NEW_LINE, is just the ASCII character code for a newline. Defining constants like this makes our code more readable and maintainable. Next, in the bss section we define a buffer named buffer_data of length BUFFER_SIZE.

Now we have the meat of our program: a loop that starts with the label read_from_buffer. Inside this loop we have a read system call, a write system call, and a conditional jump.

The read system call:

movq $0, %rax
movq $0, %rdi
movq $buffer_data, %rsi
movq $BUFFER_SIZE, %rdx
syscall

reads BUFFER_SIZE worth of data from stdin to our buffer buffer_data. When control returns from the read system call the kernel will leave a return value in the register rax. This value will either be the number of bytes that the kernel read or a negative number indicating an error. For now, we ignore the error case. So, we move the value in rax into rbx to save it. Then we perform a write system call:

movq $1, %rax
movq $1, %rdi
movq $buffer_data, %rsi
movq %rbx, %rdx
syscall

The only new point here is that we move the value in rbx into rdx. This means we only ask the kernel to write the number of bytes that were actually read. Now, we do a conditional jump:

decq %rbx
cmpb $NEW_LINE, buffer_data(,%rbx,1)
je exit

We are checking to see if the final character we have read from stdin is a newline. The register rbx contains the number of bytes we have read. So, to get the index of the last byte that we read, we decrement it. Then we use indexed addressing mode, buffer_data(,%rbx,1), to get the value of the last byte that we have read. This tells the CPU to read the value in rbx, count that many bytes past the start of buffer_data and load the value it finds. We compare this value with the ASCII value for a newline. If the final character was a newline, we jump to the usual exit with code 0. Otherwise, the next instruction is the unconditional jump jmp read_from_buffer which brings us back to the start of the loop.

When we assemble, link and run this code, once it hits the first read system call, the program will wait for the user to type input on the command line. Suppose the user enters some text and hits enter. The kernel stores this text in the stdin file. In our system call we only asked for 20 bytes, so the kernel copies (at most) 20 bytes into our buffer, and discards them from stdin. The rest of the text that was input persists in stdin. Once we’ve written these bytes to stdout we can go back and read the next chunk from stdin. However, the user only gets prompted once, even though our code reads from the stdin file multiple times.
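The same loop can be sketched in Python using os.read and os.write, which are thin wrappers around the read and write system calls. The function name echo_until_newline and the pipes standing in for stdin and stdout are assumptions made so the example can run without a terminal; unlike the assembly version it also stops at end of input:

```python
import os

BUFFER_SIZE = 20
NEW_LINE = 10  # ASCII code for a newline


def echo_until_newline(in_fd, out_fd):
    """Copy in_fd to out_fd in BUFFER_SIZE chunks until a chunk ends in a newline."""
    chunks = 0
    while True:                              # read_from_buffer:
        data = os.read(in_fd, BUFFER_SIZE)   # read system call
        if not data:                         # end of input (not handled in the assembly)
            break
        os.write(out_fd, data)               # write only the bytes actually read
        chunks += 1
        if data[-1] == NEW_LINE:             # cmpb $NEW_LINE, buffer_data(,%rbx,1)
            break                            # je exit
    return chunks


# Pipes stand in for stdin and stdout so the sketch is self-contained.
in_read, in_write = os.pipe()
out_read, out_write = os.pipe()

message = b"a line that is longer than twenty bytes\n"  # 40 bytes
os.write(in_write, message)
os.close(in_write)

chunk_count = echo_until_newline(in_read, out_write)
os.close(out_write)
echoed = os.read(out_read, 1024)
os.close(in_read)
os.close(out_read)

print(chunk_count, echoed)
```

The 40 bytes are written once but consumed in two reads of 20 bytes each, which is the behaviour described above: one prompt, several read calls.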

So now we know how to read and write input the proper way!

Book Review – Functional Programming in C#

I read Enrico Buonanno’s Functional Programming in C# directly after Jon Skeet’s C# in Depth. Unlike Skeet’s book, this is not a book about C#; it is a book about functional programming which uses C# as the medium of instruction.

What makes this particularly interesting is that C# isn’t really a functional language. Usually functional programming is discussed in terms of niche explicitly functional languages like Haskell. By using C#, Buonanno makes the principles and practices of functional programming a lot more accessible to the average programmer. He does advocate mixing the more traditional object oriented style of C# with functional programming, but this is not something that really comes through in the code samples.

This book covers the basic concepts of functional programming really well. He explains how and why to avoid state mutation, the concept of functions as first class citizens and higher order functions, function purity and side effects, partial application and currying, and lazy computation. One thing that interested me particularly is the good case the author makes that pure functions should not throw exceptions.

He does not spend a long time justifying functional programming. The two main benefits he repeatedly highlights are easier testing and better support for concurrency. It is clear that a functional approach is more suited to concurrent programming. However the claim that functional code is easier to test seems somewhat dubious to me, and is not really backed up with credible examples. He also claims that functional programming leads to cleaner code. Again, I am somewhat skeptical.

I really enjoyed the discussion of user defined types. In particular the pattern of creating new types that wrap low level types and add some semantic meaning. A great example he uses is an age type. This has only one member, an integer. Its constructor will only accept valid values for human ages. This means that we have added an extra layer of static type checking to this type: when we want an age, we use the age class, and not just an integer. It also means we don’t have to perform extra checks when using an age type, as we know its value was checked when it was initialised.
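A rough sketch of the pattern, written in Python rather than the book’s C#. This is not Buonanno’s code: the MAX_AGE bound and the helper function are assumptions made for the example, and Python only enforces the check at run time, so it mirrors the validation half of the idea rather than the static typing half:

```python
class Age:
    """Wraps an int so that only plausible human ages can exist."""

    MAX_AGE = 130  # assumed upper bound, chosen only for this example

    def __init__(self, value):
        if not isinstance(value, int):
            raise TypeError("age must be an integer")
        if not 0 <= value <= self.MAX_AGE:
            raise ValueError(f"{value} is not a valid human age")
        self.value = value


def is_over_eighteen(age):
    # No range check needed here: an Age can only hold a validated value.
    return age.value > 18


adult = is_over_eighteen(Age(42))

try:
    Age(207)
    rejected = False
except ValueError:
    rejected = True

print(adult, rejected)
```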

The material on LINQ was also quite good. LINQ is strongly functional in style, it emphasizes data flow over mutation, composability of functions and pureness. This is all well explained, the author even shows how to integrate user defined monads into LINQ. However LINQ is not dealt with in a single place in a systematic way, which is disappointing. Another gripe I have is that the author uses LINQ query syntax. I do not like query syntax. Not only is it ugly, it is usually incomprehensible.

Buonanno uses Map, Bind and Apply to introduce functors, monads and applicatives in a very practical way. He also goes into detail explaining how and why to use the classic monads Option and Either. We even see variations on the Either monad that can be used for error handling and validation. Both the applicative and monadic versions of Traverse appear near the end of the book as well. In my opinion, Monad stacking is one of the worst anti-patterns of functional programming. So it is disappointing that it only gets very limited coverage. I would also have liked if he had used his excellent examples as a jumping off point to dig deeper into category theory. Perhaps however, that would have been too abstract for what is quite a practical book.

Handling state in a functional way is covered well, but it feels a bit academic. It is hard to imagine applying the patterns he covers in a real world code base. Sometimes you just have to use state! The final few chapters cover IObservables, the agent model and the actor model. These sections were quite interesting but felt a little out of place. They really merit a much deeper dive.

When I finished this book I had a much greater appreciation for functional programming. In particular it gave me lots of ideas of how I could practically apply it in my real life work. The code samples were all very good, occasionally though they were a little convoluted. Indeed they sometimes seemed like evidence against a functional style. But, as someone relatively new to functional programming I appreciated how grounded it was in real world examples. Overall, this book is an excellent resource for C# programmers who want to add a little functional flourish to their code. I highly recommend it.

Assembly Tutorial – Looping

We’re going to write a simple program that demonstrates how to loop in assembly. We won’t be using a direct loop construct like in a higher level language. Instead, we’ll be using the jump and comparison instructions we covered in a previous post.

We can loop infinitely over a block of code in assembly using a label and an unconditional jump:

loop_start:
### code that gets looped over
jmp loop_start

Usually we don’t want an infinite loop in our code. So we put a conditional jump inside the loop that jumps to a label after the loop ends. Let’s have a look at an example. We’re going to write some code that uses a loop to print 10 asterisks to the terminal and a new line and then exits.

.section .data
asterick: .byte 0x2A
newline: .byte 0xA

.section .text

.globl _start
_start:

movq $0, %rbx

loop_start:

movq $1, %rax
movq $1, %rdi
movq $asterick, %rsi
movq $1, %rdx
syscall

incq %rbx

cmpq $10, %rbx
jge exit

jmp loop_start

exit:

movq $1, %rax
movq $1, %rdi
movq $newline, %rsi
movq $1, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

In the data section of this code we declare two separate bytes in memory. The first byte is labelled ‘asterick’ and has hex value 2A (the ASCII code of an asterisk). The second is labelled ‘newline’ and has hex value A (the ASCII code for a new line).

Then we have our loop:

movq $0, %rbx

loop_start:

movq $1, %rax
movq $1, %rdi
movq $asterick, %rsi
movq $1, %rdx
syscall

incq %rbx

cmpq $10, %rbx
jge exit

jmp loop_start

In this loop we are using the register rbx as our loop counter, so we begin by setting it to 0. Then we have the usual system call to write to stdout. We give the kernel the memory address of the byte in memory that contains the hex code for an asterisk. There is an important point here. The write system call takes a memory address not a value. If we want to print an asterisk, we cannot just pass it the hex value for an asterisk, we have to pass it the memory address of a byte containing an asterisk.

Once we have performed this system call, we must increment our counter. We do this with the instruction:

incq %rbx

This instruction does a 64-bit increment of the value in the register rbx. incq is one of the special instructions we can use to increment and decrement register values. They come in the usual instruction size variations. The instructions incq, incl, incw and incb increment 8 bytes, 4 bytes, 2 bytes and 1 byte respectively. Similarly the instructions decq, decl, decw and decb decrement 8 bytes, 4 bytes, 2 bytes and 1 byte respectively.

Once we have incremented our counter, we check if the value is greater than or equal to 10. If the value in the register rbx is less than ten we move straight to the next instruction:

jmp loop_start

which jumps back to the start of the loop. Notice that we jump back to the next instruction after we set up our loop counter in rbx. If we had put the loop start label one instruction earlier, our loop would run indefinitely, because the counter would have reset to 0 on every iteration.

If, however, our counter in rbx is greater than or equal to 10 we jump straight to the labelled exit section:

exit:

movq $1, %rax
movq $1, %rdi
movq $newline, %rsi
movq $1, %rdx
syscall

movq $60, %rax
movq $0, %rdi
syscall

This section prints a new line and then exits with exit code 0 as usual. We now know how to do conditional branching and looping in assembly!
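For readers more used to high level languages, the control flow above corresponds roughly to the Python sketch below. The function name asterisks is hypothetical, and it builds a string instead of making write system calls, but the counter, comparison and jumps line up with the assembly as shown in the comments:

```python
def asterisks(count=10):
    output = ""
    counter = 0               # movq $0, %rbx
    while True:               # loop_start:
        output += "*"         # write system call for one asterisk
        counter += 1          # incq %rbx
        if counter >= count:  # cmpq $10, %rbx / jge exit
            break
        # falling through here plays the role of jmp loop_start
    return output + "\n"      # exit: write the newline


line = asterisks()
print(line, end="")
```

Note that, like the assembly, the body runs before the first comparison, so the loop always prints at least one asterisk.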

C# In Depth – Book Review

Jon Skeet is a bit of a legend. He has the highest reputation score on Stack Overflow. He got there because of his consistently patient, helpful and correct answers. He is probably the most prominent C# developer there is. So, when I started a new job as a C# developer, I decided to read Skeet’s book, C# in Depth.

It’s a very good book. There is one big problem however: the structure. This book is divided into five parts, each dealing with a successive major numbered release of C#. This chronological structure is quite strange. The overriding assumption of the author is that the reader is familiar with C# 1. Given that C# 2 was released 13 years ago, this is a pretty strange angle. It’s hard to imagine there are many programmers today who are familiar with C# 1 but need a detailed walk through of the new additions to the language in C# versions 2 to 5. The book ends up a sort of mix between a history of C# and an intermediate user’s guide.

One example of the problem with this structure is how delegates are covered. They are first introduced briefly in chapter 1. Improvements to the delegate syntax in C# 2 are then covered in detail in chapter 5. In neither of these chapters is there a clear explanation of what delegates actually are or why they are part of C#. Indeed when we reach chapter 10 Skeet covers lambda expressions, which, in reality, make delegates redundant for most use cases.

Another victim of the unorthodox structure is the coverage of class properties. Modern C# syntax allows us to define properties in a very quick, intuitive manner. In this book, we first learn about properties as they originally appeared in C# 1. Then, in chapter 7, we see how C# 2 allowed a mix of public getters with private setters. Finally, in chapter 8, we see how properties are actually implemented in modern C#.

There is of course a benefit to covering older versions of the language in detail. C# is a language designed for enterprise development. So, if you code in it, you are likely to be working with a large legacy code base. This means that understanding what the language looked like in its various iterations is useful. However, these topics would be a lot better served if they were covered all at once, rather than being split over multiple chapters.

Skeet spends a lot of time covering LINQ, which is great. LINQ is a really cool feature of C#, and he covers cool details, like how to use extension methods and iterators to integrate your own code into LINQ. He also covers the LINQ query expression syntax. This is the syntax that lets you write a LINQ expression in the style of a SQL query. Frankly I think query expression syntax is a monstrosity and should never be used, but it is probably useful to cover it and explain how it works (it’s really just syntactic sugar for the normal LINQ syntax). There is also a section on async code that gets into a lot of really useful detail.

Overall, Skeet has an ability to make some quite obscure topics interesting and accessible. He always presents new ideas with realistic and useful code snippets. Most important of all, he writes in a conversational style that makes reading his book a lot more fun than a typical intermediate language guide.

Assembly Tutorial – Register Access and Pointers

So far we have used registers in two distinct ways. The first way is when we load values into registers and compare them to other values, like in the following code:

movq $4, %rax
movq $3, %rbx
cmpq %rbx, %rax

We’ve also used registers to store memory addresses when we used system calls. For example if we wanted to write the 50 bytes from the buffer named data_buffer to stdout we would use code like the following:

movq $1, %rax
movq $1, %rdi
movq $data_buffer, %rsi
movq $50, %rdx
syscall

In the third line, $data_buffer is the address of the buffer, so when we make the system call, the register rsi contains the address of the data we are interested in rather than the data itself.

We can use registers like this more generally. Indeed, to access the value stored at the memory location contained in a register we wrap the register name in brackets, as below.

cmpq $0, (%rax)

In the above code, if rax holds an accessible memory address and the eight-byte value stored there is zero, the comparison finds the two operands equal. If the value stored there is anything other than zero, the comparison finds them unequal. If rax holds an address in a region of memory we cannot access, for example the region before the instructions, then we will get a segmentation fault when our program runs.

We can also offset the address in a register by placing a constant value in front of the brackets like so:

cmpq $0, 8(%rax)

This value can be positive or negative and is specified in bytes.
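
To make these two forms concrete, here is a small Python model (not assembly) of what the processor does with (%rax) and 8(%rax). The memory bytearray, the load_quad helper and the addresses are all made up for illustration. A real process has a far larger address space and gets a segmentation fault on an inaccessible address rather than a Python exception.

```python
import struct

# Hypothetical 32-byte "memory"; address 0 is the first byte.
memory = bytearray(32)

# Store the 8-byte (quad word) value 0 at address 8 and 42 at address 16.
struct.pack_into("<q", memory, 8, 0)
struct.pack_into("<q", memory, 16, 42)


def load_quad(address, displacement=0):
    """Model of disp(%reg): read 8 little-endian bytes at address + displacement."""
    effective = address + displacement
    if effective < 0 or effective + 8 > len(memory):
        # Stand-in for the segmentation fault a real program would get.
        raise MemoryError(f"address {effective} is not accessible")
    return struct.unpack_from("<q", memory, effective)[0]


rax = 8  # the register holds an address, not the data itself

# cmpq $0, (%rax)  -> compares 0 with the quad word at address 8
indirect_is_zero = load_quad(rax) == 0

# cmpq $0, 8(%rax) -> compares 0 with the quad word at address 8 + 8 = 16
offset_is_zero = load_quad(rax, 8) == 0

print(indirect_is_zero, offset_is_zero)  # True False
```

The same register value gives two different answers because the constant in front of the brackets moves the access eight bytes further along.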

Often, however, we want to calculate addresses dynamically in our code. We do that with indexed addressing mode. This allows us to provide a constant base address, a register holding an offset, a second register holding an index, and a constant multiplier that is applied to the index. Specifically,

data_buffer(%rax, %rbx, 2)

refers to the memory address found when you start at the address of data_buffer, then add the value contained in rax and 2 times the value in rbx (with all numeric values specified in bytes). The multiplier must be 1, 2, 4 or 8, so unfortunately you cannot use a negative multiple here. This addressing mode is particularly useful when we are iterating through strings or arrays of contiguous memory.
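
As a rough illustration of the arithmetic, the sketch below models indexed addressing in Python rather than assembly. The helper name effective_address, the starting address 1000 and the element size are invented for the example. Only the formula, constant address plus first register plus second register times multiplier, comes from the addressing mode itself.

```python
ALLOWED_SCALES = (1, 2, 4, 8)  # the only multipliers x86-64 can encode


def effective_address(displacement, base, index, scale):
    """Model of displacement(base, index, scale)."""
    if scale not in ALLOWED_SCALES:
        raise ValueError(f"scale must be one of {ALLOWED_SCALES}")
    return displacement + base + index * scale


# Hypothetical layout: data_buffer starts at address 1000 and holds
# 2-byte elements, so element i lives at data_buffer + 2 * i.
data_buffer = 1000
rax = 0  # extra byte offset into the buffer

# Walk the first four elements the way a loop incrementing rbx would.
addresses = [effective_address(data_buffer, rax, rbx, 2) for rbx in range(4)]
print(addresses)  # [1000, 1002, 1004, 1006]
```

Incrementing rbx by one moves the address on by one whole element, which is why this mode suits loops over arrays whose elements are 1, 2, 4 or 8 bytes wide.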