{ inercia }

Sep 26

Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming -

As of Linux 3.17, the kernel includes an implementation of “relativistic hash tables” that can be resized while lookups proceed concurrently.

"Relativistic" refers to the fact that the relative timing of two events (hash table insertions, say) that are not causally related may appear different to independent observers. In other words, one CPU may see two items inserted into the table in one order, while those insertions appear to have happened in the opposite order on another CPU. Despite some interesting performance results, this technique did not find its way into the kernel until the 3.17 merge window opened

There is a nice introduction at LWN (Linux Weekly News).
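To get a feel for what the resizable table looks like from the caller's side, here is a rough kernel-side sketch using the rhashtable API. It is only illustrative: the entry struct, key, and module scaffolding are made up, and the rhashtable_*_fast() names come from kernels newer than the original 3.17 merge, where the interface was still settling.

/*
 * Illustrative sketch of the kernel's resizable hash table (rhashtable).
 * The demo_entry type and the key value are invented for this example.
 */
#include <linux/module.h>
#include <linux/rhashtable.h>
#include <linux/slab.h>

struct demo_entry {
	u32 key;			/* lookup key */
	struct rhash_head node;		/* linkage owned by the hash table */
};

static struct rhashtable demo_ht;

static const struct rhashtable_params demo_params = {
	.key_len     = sizeof(u32),
	.key_offset  = offsetof(struct demo_entry, key),
	.head_offset = offsetof(struct demo_entry, node),
};

static void demo_free_entry(void *ptr, void *arg)
{
	kfree(ptr);
}

static int __init demo_init(void)
{
	struct demo_entry *e;
	u32 key = 42;
	int err;

	err = rhashtable_init(&demo_ht, &demo_params);
	if (err)
		return err;

	e = kzalloc(sizeof(*e), GFP_KERNEL);
	if (!e) {
		rhashtable_destroy(&demo_ht);
		return -ENOMEM;
	}
	e->key = key;

	err = rhashtable_insert_fast(&demo_ht, &e->node, demo_params);
	if (err) {
		kfree(e);
		rhashtable_destroy(&demo_ht);
		return err;
	}

	/* Lookups run inside an RCU read-side critical section and keep
	 * working even while the table is being resized underneath them. */
	rcu_read_lock();
	e = rhashtable_lookup_fast(&demo_ht, &key, demo_params);
	pr_info("rhashtable demo: lookup %s\n", e ? "hit" : "miss");
	rcu_read_unlock();

	return 0;
}

static void __exit demo_exit(void)
{
	/* Free the remaining entries along with the table itself. */
	rhashtable_free_and_destroy(&demo_ht, demo_free_entry, NULL);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

The point of the design is visible here: readers only bracket their lookups with rcu_read_lock()/rcu_read_unlock(), and the resize happens concurrently without blocking them.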

Sep 25

Rosetta Code Analysis -

Our statistical analysis reveals, most notably, that: functional and scripting languages are more concise than procedural and object-oriented languages; C is hard to beat when it comes to raw speed on large inputs, but performance differences over inputs of moderate size are less pronounced and allow even interpreted languages to be competitive; compiled strongly-typed languages, where more defects can be caught at compile time, are less prone to runtime failures than interpreted or weakly-typed languages.

The full PDF article can be found here.

Aug 14

Knot DNS -

Knot DNS is a high-performance authoritative-only DNS server which supports all key features of the domain name system including zone transfers and DNSSEC.

This LWN review of the Knot DNS server shows the impressive list of features found in this server and highlights its performance advantage over other popular servers such as BIND and PowerDNS.

Panamax: Docker Management for Humans

Aug 13

Presentation: "The Disruptor: High Performance Inter-Thread Messaging" by Michael Barker

Aug 12

liburcu: user space RCU -

liburcu is a library that implements RCU synchronization in user space. RCU is a synchronization technique made popular by its widespread use in the Linux kernel, where it provides a very simple API.

liburcu “provides read-side access which scales linearly with the number of cores. It does so by allowing multiple copies of a given data structure to live at the same time, and by monitoring the data structure accesses to detect grace periods after which memory reclamation is possible.” liburcu also provides some lock-free data structures like queues, hash tables and doubly-linked lists…
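To make the grace-period idea concrete, here is a minimal userspace sketch of the classic RCU pattern with liburcu. The config struct and thread layout are invented for illustration; it assumes the default (memb) flavour, built with something like: gcc demo.c -lurcu -lpthread

/*
 * Minimal RCU sketch: readers dereference a shared pointer without locks,
 * the writer publishes a new copy and waits for a grace period before
 * freeing the old one.
 */
#define _LGPL_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <urcu.h>		/* classic liburcu API (memb flavour) */

struct config {
	int value;
};

static struct config *current_cfg;	/* shared pointer, RCU-protected */

static void *reader(void *arg)
{
	rcu_register_thread();		/* every RCU-using thread registers */

	for (int i = 0; i < 1000000; i++) {
		rcu_read_lock();	/* read-side critical section: cheap */
		struct config *cfg = rcu_dereference(current_cfg);
		if (cfg)
			(void)cfg->value;	/* safe until rcu_read_unlock() */
		rcu_read_unlock();
	}

	rcu_unregister_thread();
	return NULL;
}

static void update_config(int value)
{
	struct config *new_cfg = malloc(sizeof(*new_cfg));
	struct config *old_cfg;

	new_cfg->value = value;
	old_cfg = current_cfg;
	rcu_assign_pointer(current_cfg, new_cfg);	/* publish the new copy */

	synchronize_rcu();	/* wait for a grace period: no reader can still
				 * hold a reference to the old copy after this */
	free(old_cfg);
}

int main(void)
{
	pthread_t threads[4];

	rcu_register_thread();
	update_config(1);

	for (int i = 0; i < 4; i++)
		pthread_create(&threads[i], NULL, reader, NULL);

	for (int v = 2; v < 100; v++)
		update_config(v);	/* writers pay the cost; readers never block */

	for (int i = 0; i < 4; i++)
		pthread_join(threads[i], NULL);

	rcu_unregister_thread();
	free(current_cfg);
	return 0;
}

Readers never take a lock; the writer pays the full cost by waiting in synchronize_rcu() before reclaiming the old copy, which is exactly the trade-off that makes read-side access scale with the number of cores.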

Efficient Maximum Flow Algorithms -

For the maximum flow example, suppose we have a graph that represents an oil pipeline network from an oil well to an oil depot. Each arc has a capacity, or maximum number of liters per second that can flow through the corresponding pipe. The goal is to find the maximum number of liters per second (maximum flow) that can be shipped from well to depot. For the minimum cut problem, we want to find the set of pipes of the smallest total capacity such that removing the pipes disconnects the oil well from the oil depot (minimum cut).

Maximum flow algorithms play an important role in computer networks, where bandwidth can be a scarce resource and this kind of optimization can be used to schedule data transfers. This article shows the state of the art in algorithms for solving maximum flow problems.
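As a baseline for what these algorithms compute, here is a small textbook Edmonds-Karp implementation (BFS augmenting paths on a residual capacity matrix). It is only a sketch of the classical method, not the newer algorithms the article surveys, and the pipeline capacities below are invented.

/*
 * Edmonds-Karp max flow on a small capacity matrix.
 * cap[u][v] is the capacity (say, litres per second) of the pipe u -> v.
 */
#include <limits.h>
#include <stdio.h>

#define N 6	/* number of nodes: 0 = well, 5 = depot */

/* Find an augmenting path from s to t with BFS, recording parents. */
static int bfs(int cap[N][N], int s, int t, int parent[N])
{
	int visited[N] = {0}, queue[N], head = 0, tail = 0;

	visited[s] = 1;
	queue[tail++] = s;
	while (head < tail) {
		int u = queue[head++];
		for (int v = 0; v < N; v++) {
			if (!visited[v] && cap[u][v] > 0) {
				visited[v] = 1;
				parent[v] = u;
				if (v == t)
					return 1;
				queue[tail++] = v;
			}
		}
	}
	return 0;
}

static int max_flow(int cap[N][N], int s, int t)
{
	int parent[N], flow = 0;

	while (bfs(cap, s, t, parent)) {
		/* Bottleneck capacity along the augmenting path. */
		int path = INT_MAX;
		for (int v = t; v != s; v = parent[v]) {
			int u = parent[v];
			if (cap[u][v] < path)
				path = cap[u][v];
		}
		/* Push the flow: reduce forward capacity, add residual. */
		for (int v = t; v != s; v = parent[v]) {
			int u = parent[v];
			cap[u][v] -= path;
			cap[v][u] += path;
		}
		flow += path;
	}
	/* By max-flow/min-cut duality this is also the minimum cut capacity. */
	return flow;
}

int main(void)
{
	int cap[N][N] = {
		{0, 16, 13,  0,  0,  0},
		{0,  0, 10, 12,  0,  0},
		{0,  4,  0,  0, 14,  0},
		{0,  0,  9,  0,  0, 20},
		{0,  0,  0,  7,  0,  4},
		{0,  0,  0,  0,  0,  0},
	};

	printf("max flow well -> depot: %d\n", max_flow(cap, 0, 5));	/* 23 */
	return 0;
}

Edmonds-Karp runs in O(V·E²), which is precisely why the faster algorithms surveyed in the article matter for large networks.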

Aug 11

An overview of the different Linux performance tools and the subsystems they work on.

Aug 08

Go for Pythonistas -

A great introduction to Golang for Python programmers…

Broken by Design: MongoDB Fault Tolerance -

A devastating article about Mongo’s fault tolerance and reliability. See this other article from the same author.

I agree with the main point of the article: Mongo is not designed for fault tolerance at all. Mongo’s replication mechanism is very rudimentary, but it is designed that way to achieve higher performance. However, the article is based on the assumption that “all data is critical”, and I completely disagree with this. Many software designs are built on top of data that can be lost with minimal effects.

Imagine a user notification system that sends emails to users and stores delivery reports in a Mongo database. The worst thing that could happen if a delivery report is not properly saved is that the email is sent twice. Is that really important? Not if what you want is speed and you care more about sending millions of notifications per minute.