Merge with elegance

It’s perhaps the last time I asked this interview question last week. Write a function to return a sorted array containing all elements from two given int arrays. The input arrays are sorted but omission is intentional. Bonus points for:

  1. asking if input array is already sorted?
  2. identifying that it’s the merge step of merge-sort algorithm
  3. or naming the function merge

I have seen correct, clever, dumb, outrageous, uncomfortable and painful responses. For those who provide good answers, I ask them to identify Queue operations and think of an elegant solution instead of the fastest solution. Here is a stunningly elegant implementation of merge operation which treats input and output arrays as Queues.


#ignore mutability of a, b
#first is peek operation and shift is dequeue operation 
def merge(a, b)
  c = []
  c << (a.first < b.first ? a.shift : b.shift) until 
      a.empty? || b.empty? 
  c += a
  c += b
end

Java solution in 7 lines of code:

public static int[] merge(int[] a, int[] b) {
  Q cq = new Q(new int[a.length + b.length]);
  Q aq = new Q(a);
  Q bq = new Q(b);

  while(aq.notEmpty() && bq.notEmpty()) 
      cq.enqueue(aq.peek() < bq.peek() ? aq.dequeue() : bq.dequeue());
  while(aq.notEmpty()) cq.enqueue(aq.dequeue());
  while(bq.notEmpty()) cq.enqueue(bq.dequeue());
  
  return cq.elms;
}

//You will need a helper Queue class called Q to go with it. 
class Q {
    int current;
    int[] elms;

    Q(int[] elms) {
        this.elms = elms;
    }

    boolean notEmpty() {
        return current = elms.length;
    }

    int peek() {
        return elms[current];
    }

    int dequeue() {
        return elms[current++];
    }

    void enqueue(int elm) {
        elms[current++] = elm;
    }
}

Near-Far Cache Architecture

Near-far cache architecture is a robust architecture for deploying cache tier for web applications with following features.

  1. Improve Performance
  2. Protect data tier and do it reliably
  3. Cache consistency

Deploying memcached on few nodes and keeping round trip time (RTT) from web/app servers to memcached servers Θ(sub millisecond) can easily improve the performance of web apps. So, without much work, we can easily improve the performance and meet our first goal. These nodes make the Far Cache.

Will far cache reliably protect data tier? Perhaps not. A digged, slashdotted, techcrunched blog post may create hot-spots in certain memcached nodes. Memcached response slowdown may cause requests to backlog and some memcached nodes are likely to fall and network switches may saturate. Hot memcached node failures may expose data tier and there is never one cockroach.

To solve hot-spot problem, we introduce near caches. A near cache is a memcached node running on the web server or app server itself. We also cache a key in near cache and if we get a near cache miss, we fetch it from far cache and put into near cache. It prevents hot-spots from forming in the far cache. But wait, we just introduced another problem. The moment we start putting data into near cache, we run into cache consistency issues and user may observe the inconsistency. With near cache, inconsistency can’t be eliminated without sacrificing scale out, but we can provide user observed consistency. This can be done by near caching only for few seconds — it works just the way alternating current works. If near cache timeout is smaller than user page round trip time plus user thinking time, then a user will not observe inconsistency introduced by near cache. 3 seconds near cache timeout is a sweet spot for us and guarantees that far cache will get 20 hits per minutes per cache key per web or app server and only user visiting next link within 3 seconds may see inconsistent data in some cases when far cache is being refreshed. We may not have impressive cache hit ratio for near cache but it protects far cache and network and provides soft user observed consistency. With some domain specific tweaking of cache keys and dynamic near cache timeout, we get 85% near cache hits.

What is the timeout for far cache? Obviously, it should be grater than near cache timeout and if the data tier can’t handle troff traffic, then it must be greater than wavelength of the traffic cycle which is typically 24 hours. If you don’t want to charge users for far cache miss as much as possible, then far cache data for a long time (a week or so) and have an application level timeout by wrapping cached values.


class CacheValue {
Date last_updated;
Object value;
}

If the far cache value expires according to application level timeout, then we still serve the data from far cache but issue a background update. Keep application level timeout Θ(minutes) and Far cache timeout Θ(weeks).

New Relic RPM to monitor memcached

New Relic Infrastructure Agent now can be used to monitor memcached farm. It runs stats on configured memcached nodes periodically and ships data to New Relic RPM. The stats are reported  for each node and also aggregated stats are reported for all nodes. Check out the source at github.

Bulletproof debugging method to debug Java remote debugging

JVM remote debugging can be sometime tricky and elusive — especially when the process is running as a service. How can you separate JVM issues from other factors such as process running as service as a different user, firewall blocking debug port. One way to separate these is to know a bulletproof configuration for JVM remote debugging. Without further ado, here it is:

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

Note that this only applies to Java 5 and above. If you are still in stone ages, hopefully you are not reading this blog either. This is great but how can I test it quickly? I still can’t connect. My server runs headless Linux environment, and I don’t have access to my pretty IDE. One method to use telnet.

telnet silly.example.com 5005

It works but there is something better. Command line jdb and its little known use to connect to remote process. You can also use it do debugging if you like.

/usr/java/current/bin/jdb -connect com.sun.jdi.SocketAttach:hostname=silly.example.com,port=5005
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb …
> exit

Oracle Hierarchical Query in Hibernate

When you need them, there is no replacement of Oracle hierarchical queries. They scale linearly with each level while a join would scale exponentially. At the same time, we would like Hibernate to do the hard work of mapping result set to entities. Here is an example of Hibernate Native Query which would fetch the organization chart from famous scott.emp table.

Session s = (Session) em.getDelegate();
String orgChartQuery = "select {emp.*} from EMP {emp} \n" +
"start with emp.mgr is null \n" +
"connect by prior emp.empno = emp.mgr\n" +
"order siblings by ename";
@SuppressWarnings("unchecked")
List list = s.createSQLQuery(orgChartQuery)
.addEntity("emp", Employee.class)
.list();

Now suppose we want to render this in a tree control with breadcurms. We can use the following query.


Session s = (Session) em.getDelegate();
String orgChartQuery = "select sys_connect_by_path(emp.empno, '/') breadcrum,\n"+
"level, {emp.*} from EMP {emp} \n" +
"start with emp.mgr is null \n" +
"connect by prior emp.empno = emp.mgr\n" +
"order siblings by ename";
@SuppressWarnings("unchecked")
List list = s.createSQLQuery(orgChartQuery)
.addScalar("breadcrum", Hibernate.STRING)
.addScalar("level", Hibernate.INTEGER)
.addEntity("emp", Employee.class)
.list();
for (Object[] objects : list) {
String breadcrum = (String) objects[0];
Integer level = (Integer) objects[1];
Employee emp = (Employee) objects[2];
System.out.println(breadcrum +" "+ level + " "+emp.getName()
+ " "+emp.getDepartment().getName());
}

Breadcrums willl give you path to the object from the root and level gives the 1 based level in the tree.

Handling Date with Google Gears

Suppose you execute the following DML statement in Google Gears:
db.execute(
"insert or replace into mytable "+
"(id, modified_on) values (?, ?)",
[null, new Date()]
);

This seemingly works but you are in for a surprise because Date is stored as Date.toString(). SQLite ducktying brain can’t infer that it’s a date datum. If you ran select query you will get string back.
Sat Jan 12 2008 02:34:58 GMT-0800 »
(Pacific Standard Time)

The downside of storing date as text that you can’t use any SQLite date time functions in your queries. Due to manifest typing, SQLite happily stores text in a date column. We would like to do better. We would like to store JavaScript Date as native SQLite Date without any loss of information. SQLite itself stores date in Julian Day representation while JavaScript cannonical date representation is number of milliseconds since 1 January 1970 00:00:00 UTC. So, we need to convert JavaScript Date to either Julian Day or text as understood by SQLite. Considering the ease of implementation, efficiency and lossless representation, YYYY-MM-DDTHH:MM:SS.FFF date format seems to fit the bill. This is ISO8601 format. However after some digging in SQLite code, it turns out that if it SQLite also works even if we don’t zero pad fields. So this simple JavaScript method with minor departure from ISO8601 format does the job well.
dateToSQLiteFormat = function(date) {
return date.getUTCFullYear()+
"-"+date.getUTCMonth() + 1+
"-"+date.getUTCDate()+
"T"+date.getUTCHours()+
":"+date.getUTCMinutes()+
":"+date.getUTCSeconds()+"."+
date.getUTCMilliseconds();
};

So far so good. How about converting SQLite dates to JavaScript Date object. This function does that trick.
dateSqlFragment = function(column, alias) {
return " ((strftime('%s', "+column+") -
strftime('%S', "+column+"))*1000 +
strftime('%f', "+column+")*1000) "+
(alias "");
};

var rs = db.execute(
“select id, “+dateSqlFragment(“modified_on”) +
” from mytable”);

Date date = new Date(rs.field(1));

Smart Copy Plugin for IntelliJ

IntelliJ always seems to do The Right Thing. It comes with code-aware features such as Smart Line Joins, and Smart Line Split. However, there is not Smart Copy to complement Smart Paste. Besides symmetry, there are good reasons to have Smart Copy. Supppose, you have a SQL query in Java Code and you want to run it in SQLPlus to see an execution plan. Another example could be embedded XML as compile time constant. When you copy a query embedded in a Java class, perhaps you are interested in its value without without the quotes and plus characters clinging to the copied text.
If you are a surprised user, you are not alone. Smart Copy is a plugin to copy the literal values. Smart Copy action maps to Ctrl+Alt+Shift+C by default. It copies the value of compile time constants and String literals to the system clipboard. It also copies the selected literals. Entire expression is copied if noting is selected. If it cannot do smart copy, it will fallback to regular copy action. Certainly a candidate for your Ctrl+C mapping.

Blog at WordPress.com.