Showing posts from September, 2015

Time on multi-core, multi-socket servers

In distributed computing, the notion of "when-ness" is fundamental; Lamport's "Time, Clocks, and the Ordering of Events in a Distributed System" is considered one of the foundational pieces of work. But what about locally? The conclusion is that, except for the special case of using nanoTime() in micro benchmarks, you may as well stick with currentTimeMillis(), knowing that it may sporadically jump forwards or backwards. If you switch to nanoTime(), you don't get any monotonicity guarantees, it no longer relates to human time, and it may be more likely to lead you into writing code which assumes a fast call with consistent, monotonic results. Details: http://steveloughran.blogspot.ca/2015/09/time-on-multi-core-multi-socket-servers.html
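A rough sketch of the distinction (the class name and loop workload below are made up for illustration): currentTimeMillis() gives a wall-clock timestamp that can step when the system clock is adjusted, while nanoTime() is only meaningful as the difference between two readings, which is why it suits micro benchmarks and little else.

    public class TimerSketch {
        public static void main(String[] args) {
            // Wall-clock timestamp: meaningful as a date, but may jump
            // forwards or backwards when NTP or an admin adjusts the clock.
            long wallClockMs = System.currentTimeMillis();

            // Elapsed-time measurement: nanoTime() is only meaningful as a
            // difference between two readings in the same JVM, and its
            // behaviour across cores/sockets depends on the OS and hardware.
            long start = System.nanoTime();
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;                     // stand-in workload
            }
            long elapsedNanos = System.nanoTime() - start;

            System.out.println("timestamp (ms since epoch): " + wallClockMs);
            System.out.println("elapsed (ns): " + elapsedNanos + ", sum=" + sum);
        }
    }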

HDFS Erasure Coding in Apache Hadoop

HDFS by default replicates each block three times. Replication provides a simple and robust form of redundancy to shield against most failure scenarios. It also eases scheduling compute tasks on locally stored data blocks by providing multiple replicas of each block to choose from. However, replication is expensive: the default 3x replication scheme incurs a 200% overhead in storage space and other resources (e.g., network bandwidth when writing the data). For datasets with relatively low I/O activity, the additional block replicas are rarely accessed during normal operations, but still consume the same amount of storage space. Therefore, a natural improvement is to use erasure coding (EC) in place of replication, which uses far less storage space while still providing the same level of fault tolerance. Under typical configurations, EC reduces the storage cost by ~50% compared with 3x replication. Motivated by this substantial cost-saving opportunity, engineers from Cloudera and Intel began adding erasure coding support to HDFS.
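A small back-of-the-envelope sketch, assuming a common Reed-Solomon (6,3) layout, shows where the ~50% saving comes from: one stripe of 6 data blocks needs 18 blocks of raw storage under 3x replication but only 9 (6 data + 3 parity) under EC.

    // Back-of-the-envelope comparison of raw storage for one RS(6,3) stripe
    // under 3x replication versus erasure coding. Numbers are illustrative.
    public class StorageOverheadSketch {
        public static void main(String[] args) {
            double dataBlocks = 6.0;                 // one RS(6,3) stripe of data
            double replicated = dataBlocks * 3;      // 18 blocks stored
            double erasureCoded = dataBlocks + 3;    // 6 data + 3 parity = 9 blocks
            System.out.printf("replication overhead:    %.0f%%%n",
                    (replicated - dataBlocks) / dataBlocks * 100);    // 200%
            System.out.printf("erasure coding overhead: %.0f%%%n",
                    (erasureCoded - dataBlocks) / dataBlocks * 100);  // 50%
            System.out.printf("storage saved by EC:     %.0f%%%n",
                    (1 - erasureCoded / replicated) * 100);           // 50%
        }
    }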

3D Computer Chips

A new method of designing and building computer chips could lead to blisteringly quick processing, at least 1,000 times faster than the best existing chips, researchers say. The method relies on materials called carbon nanotubes and allows scientists to build the chip in three dimensions. The 3D design enables scientists to interweave memory, which stores data, with the number-crunching processors in the same tiny space, said Max Shulaker, one of the chip's designers and a doctoral candidate in electrical engineering at Stanford University in California. Details: https://www.livescience.com/52207-faster-3d-computer-chip.html

Debunking Myths About the VoltDB In-Memory Database

Myth #1: “VoltDB requires stored procedures.” This was true for 1.0, but no one seems to notice it’s been false since we shipped 1.1 in 2010. VoltDB supports unforeseen SQL without any stored procedure use. We have users in production who have never used a single stored procedure.

Myth #2: “VoltDB doesn’t support ad-hoc SQL.” This is just a rephrasing of Myth #1 and is still false (see the sketch below).

Myth #3: “VoltDB is slow unless I use stored procedures.” Well, no. VoltDB can run faster with stored procedures, but it’s still fast if they are not used. In our internal benchmarks on pretty cheap single-socket hardware, we can run about 50k write statements per second, per host, with full durability.

Myth #4: “I have to know Java to use VoltDB.” As of VoltDB 3.0, released over a year ago (we’re on V4.2 today), a user can build VoltDB apps and run the server without ever directly interacting with the Java CLI tools or any Java code.

Myth #5: “VoltDB has garbage collection problems because it is written in Java.”
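On the ad-hoc SQL point: such statements go through VoltDB's built-in @AdHoc system procedure. Here is a minimal sketch using the standard Java client (the "votes" table is hypothetical, and client API details vary by version); no user-defined stored procedure is involved.

    import org.voltdb.VoltTable;
    import org.voltdb.client.Client;
    import org.voltdb.client.ClientFactory;
    import org.voltdb.client.ClientResponse;

    public class AdHocSketch {
        public static void main(String[] args) throws Exception {
            Client client = ClientFactory.createClient();
            client.createConnection("localhost");   // default client port 21212
            // @AdHoc compiles and runs the SQL on the fly.
            ClientResponse response =
                    client.callProcedure("@AdHoc", "SELECT COUNT(*) FROM votes;");
            VoltTable result = response.getResults()[0];
            System.out.println("rows in votes: " + result.asScalarLong());
            client.close();
        }
    }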