Merge with elegance

It’s perhaps the last time I asked this interview question last week. Write a function to return a sorted array containing all elements from two given int arrays. The input arrays are sorted but omission is intentional. Bonus points for:

  1. asking if input array is already sorted?
  2. identifying that it’s the merge step of merge-sort algorithm
  3. or naming the function merge

I have seen correct, clever, dumb, outrageous, uncomfortable and painful responses. For those who provide good answers, I ask them to identify Queue operations and think of an elegant solution instead of the fastest solution. Here is a stunningly elegant implementation of merge operation which treats input and output arrays as Queues.


#ignore mutability of a, b
#first is peek operation and shift is dequeue operation 
def merge(a, b)
  c = []
  c << (a.first < b.first ? a.shift : b.shift) until 
      a.empty? || b.empty? 
  c += a
  c += b
end

Java solution in 7 lines of code:

public static int[] merge(int[] a, int[] b) {
  Q cq = new Q(new int[a.length + b.length]);
  Q aq = new Q(a);
  Q bq = new Q(b);

  while(aq.notEmpty() && bq.notEmpty()) 
      cq.enqueue(aq.peek() < bq.peek() ? aq.dequeue() : bq.dequeue());
  while(aq.notEmpty()) cq.enqueue(aq.dequeue());
  while(bq.notEmpty()) cq.enqueue(bq.dequeue());
  
  return cq.elms;
}

//You will need a helper Queue class called Q to go with it. 
class Q {
    int current;
    int[] elms;

    Q(int[] elms) {
        this.elms = elms;
    }

    boolean notEmpty() {
        return current = elms.length;
    }

    int peek() {
        return elms[current];
    }

    int dequeue() {
        return elms[current++];
    }

    void enqueue(int elm) {
        elms[current++] = elm;
    }
}

Object Notations beyond JavaScript

JavaScript Object Notation is a data interchange format based on a subset of JavaScript. The beauty of JSON is in identifying a subset of language constructs which are safe for data interchange. For instance the following snippet represents a valid JavaScript but it’s unsafe for data interchange and is invalid JSON.


{"body" : "Chunky " + "Bacon"}

On the the other hand, the example below is using String literals is safe for data interchange and is valid JSON.


{"body" : "Chunky Bacon"}

This simple idea goes a long way. Instead of defining our own format for data interchange, we are using subset of a language for exchanging objects by computer programs and since it’s represented in JavaScript, it’s easy for programmers to understand. Once you think about safety, it’s intuitive to see the subset of language constructs useful for data interchange.

The idea behind JSON employing JavaScript is analogous to REST employing HTTP. This makes JSON natural format for RESTful services and typically RESTful services provide data in JSON and XML formats. However, often working with Ruby and PHP, I find neither JSON nor XML, particularly enjoyable for hacking services. I would much prefer REST services data in the native format of language I am working with. From service provider point of view, it’s worthwhile to target other dynamic languages like Ruby, PHP, Python, etc. This leads to the generalization of Object Notations which involves identifying a safe set of language constructs and using them for data interchange. For instance the Ruby and PHP Object Notations for above snippet can be represented as:

Ruby Object Notation:

{:body => "Chunky Bacon"}

PHP Object Notation:

array("body" => "Chunky Bacon")

Service provider who provide formats like XML and JSON,

should also consider other object notations like:

Object notation with padding is the generalization of JSONP where the Object Notation is passed as argument to a specified function which allows programming REST easier than just returning them as values. For RESTful services, JSON and other object notations are better fit than XML. For an example in action, check out http://github.com/rubyorchard/geoip-rest for GeoIP City and Country REST APIs which offer object notation in Ruby, PHP, JSON and XML formats.

Near-Far Cache Architecture

Near-far cache architecture is a robust architecture for deploying cache tier for web applications with following features.

  1. Improve Performance
  2. Protect data tier and do it reliably
  3. Cache consistency

Deploying memcached on few nodes and keeping round trip time (RTT) from web/app servers to memcached servers Θ(sub millisecond) can easily improve the performance of web apps. So, without much work, we can easily improve the performance and meet our first goal. These nodes make the Far Cache.

Will far cache reliably protect data tier? Perhaps not. A digged, slashdotted, techcrunched blog post may create hot-spots in certain memcached nodes. Memcached response slowdown may cause requests to backlog and some memcached nodes are likely to fall and network switches may saturate. Hot memcached node failures may expose data tier and there is never one cockroach.

To solve hot-spot problem, we introduce near caches. A near cache is a memcached node running on the web server or app server itself. We also cache a key in near cache and if we get a near cache miss, we fetch it from far cache and put into near cache. It prevents hot-spots from forming in the far cache. But wait, we just introduced another problem. The moment we start putting data into near cache, we run into cache consistency issues and user may observe the inconsistency. With near cache, inconsistency can’t be eliminated without sacrificing scale out, but we can provide user observed consistency. This can be done by near caching only for few seconds — it works just the way alternating current works. If near cache timeout is smaller than user page round trip time plus user thinking time, then a user will not observe inconsistency introduced by near cache. 3 seconds near cache timeout is a sweet spot for us and guarantees that far cache will get 20 hits per minutes per cache key per web or app server and only user visiting next link within 3 seconds may see inconsistent data in some cases when far cache is being refreshed. We may not have impressive cache hit ratio for near cache but it protects far cache and network and provides soft user observed consistency. With some domain specific tweaking of cache keys and dynamic near cache timeout, we get 85% near cache hits.

What is the timeout for far cache? Obviously, it should be grater than near cache timeout and if the data tier can’t handle troff traffic, then it must be greater than wavelength of the traffic cycle which is typically 24 hours. If you don’t want to charge users for far cache miss as much as possible, then far cache data for a long time (a week or so) and have an application level timeout by wrapping cached values.


class CacheValue {
Date last_updated;
Object value;
}

If the far cache value expires according to application level timeout, then we still serve the data from far cache but issue a background update. Keep application level timeout Θ(minutes) and Far cache timeout Θ(weeks).

New Relic RPM to monitor memcached

New Relic Infrastructure Agent now can be used to monitor memcached farm. It runs stats on configured memcached nodes periodically and ships data to New Relic RPM. The stats are reported  for each node and also aggregated stats are reported for all nodes. Check out the source at github.

List Accumulators

Suppose we want to separate  a container type object into its component parts. Accumulators provide an elegant solution to this problem. We create a list of accumulators and traverse through the original list and collect each element into the appropriate component. In Ruby, we can use inject method to implement accumulators. Suppose we want to split a list of integers into two lists of even and odd numbers. We create two empty accumulator lists for even and odd numbers.


(5..30).inject([[], []]) do |evenOdds, n|
evenOdds[n%2].push(n)
evenOdds
end

=> [[6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30], [5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29]]

Accumulators fit naturally into functional programming language. Here is an Erlang code to do the same.


module(acc).
-export([even_odds/1]).
-import(lists, [reverse/1]).
even_odds(List) ->
even_odds(List, [], []).
even_odds([H|T], Evens, Odds) ->
case (H rem 2) of
0 -> even_odds(T, [H|Evens], Odds);
1 -> even_odds(T, Evens, [H|Odds])
end;
even_odds([],Evens, Odds) ->
[lists:reverse(Evens), lists:reverse(Odds)].


acc:even_odds(lists:seq(5, 30)).

[[6,8,10,12,14,16,18,20,22,24,26,28,30],
[5,7,9,11,13,15,17,19,21,23,25,27,29]]

Bulletproof debugging method to debug Java remote debugging

JVM remote debugging can be sometime tricky and elusive — especially when the process is running as a service. How can you separate JVM issues from other factors such as process running as service as a different user, firewall blocking debug port. One way to separate these is to know a bulletproof configuration for JVM remote debugging. Without further ado, here it is:

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

Note that this only applies to Java 5 and above. If you are still in stone ages, hopefully you are not reading this blog either. This is great but how can I test it quickly? I still can’t connect. My server runs headless Linux environment, and I don’t have access to my pretty IDE. One method to use telnet.

telnet silly.example.com 5005

It works but there is something better. Command line jdb and its little known use to connect to remote process. You can also use it do debugging if you like.

/usr/java/current/bin/jdb -connect com.sun.jdi.SocketAttach:hostname=silly.example.com,port=5005
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb …
> exit

Oracle Hierarchical Query in Hibernate

When you need them, there is no replacement of Oracle hierarchical queries. They scale linearly with each level while a join would scale exponentially. At the same time, we would like Hibernate to do the hard work of mapping result set to entities. Here is an example of Hibernate Native Query which would fetch the organization chart from famous scott.emp table.

Session s = (Session) em.getDelegate();
String orgChartQuery = "select {emp.*} from EMP {emp} \n" +
"start with emp.mgr is null \n" +
"connect by prior emp.empno = emp.mgr\n" +
"order siblings by ename";
@SuppressWarnings("unchecked")
List list = s.createSQLQuery(orgChartQuery)
.addEntity("emp", Employee.class)
.list();

Now suppose we want to render this in a tree control with breadcurms. We can use the following query.


Session s = (Session) em.getDelegate();
String orgChartQuery = "select sys_connect_by_path(emp.empno, '/') breadcrum,\n"+
"level, {emp.*} from EMP {emp} \n" +
"start with emp.mgr is null \n" +
"connect by prior emp.empno = emp.mgr\n" +
"order siblings by ename";
@SuppressWarnings("unchecked")
List list = s.createSQLQuery(orgChartQuery)
.addScalar("breadcrum", Hibernate.STRING)
.addScalar("level", Hibernate.INTEGER)
.addEntity("emp", Employee.class)
.list();
for (Object[] objects : list) {
String breadcrum = (String) objects[0];
Integer level = (Integer) objects[1];
Employee emp = (Employee) objects[2];
System.out.println(breadcrum +" "+ level + " "+emp.getName()
+ " "+emp.getDepartment().getName());
}

Breadcrums willl give you path to the object from the root and level gives the 1 based level in the tree.

Handling Date with Google Gears

Suppose you execute the following DML statement in Google Gears:
db.execute(
"insert or replace into mytable "+
"(id, modified_on) values (?, ?)",
[null, new Date()]
);

This seemingly works but you are in for a surprise because Date is stored as Date.toString(). SQLite ducktying brain can’t infer that it’s a date datum. If you ran select query you will get string back.
Sat Jan 12 2008 02:34:58 GMT-0800 »
(Pacific Standard Time)

The downside of storing date as text that you can’t use any SQLite date time functions in your queries. Due to manifest typing, SQLite happily stores text in a date column. We would like to do better. We would like to store JavaScript Date as native SQLite Date without any loss of information. SQLite itself stores date in Julian Day representation while JavaScript cannonical date representation is number of milliseconds since 1 January 1970 00:00:00 UTC. So, we need to convert JavaScript Date to either Julian Day or text as understood by SQLite. Considering the ease of implementation, efficiency and lossless representation, YYYY-MM-DDTHH:MM:SS.FFF date format seems to fit the bill. This is ISO8601 format. However after some digging in SQLite code, it turns out that if it SQLite also works even if we don’t zero pad fields. So this simple JavaScript method with minor departure from ISO8601 format does the job well.
dateToSQLiteFormat = function(date) {
return date.getUTCFullYear()+
"-"+date.getUTCMonth() + 1+
"-"+date.getUTCDate()+
"T"+date.getUTCHours()+
":"+date.getUTCMinutes()+
":"+date.getUTCSeconds()+"."+
date.getUTCMilliseconds();
};

So far so good. How about converting SQLite dates to JavaScript Date object. This function does that trick.
dateSqlFragment = function(column, alias) {
return " ((strftime('%s', "+column+") -
strftime('%S', "+column+"))*1000 +
strftime('%f', "+column+")*1000) "+
(alias "");
};

var rs = db.execute(
“select id, “+dateSqlFragment(“modified_on”) +
” from mytable”);

Date date = new Date(rs.field(1));

Smart Copy Plugin for IntelliJ

IntelliJ always seems to do The Right Thing. It comes with code-aware features such as Smart Line Joins, and Smart Line Split. However, there is not Smart Copy to complement Smart Paste. Besides symmetry, there are good reasons to have Smart Copy. Supppose, you have a SQL query in Java Code and you want to run it in SQLPlus to see an execution plan. Another example could be embedded XML as compile time constant. When you copy a query embedded in a Java class, perhaps you are interested in its value without without the quotes and plus characters clinging to the copied text.
If you are a surprised user, you are not alone. Smart Copy is a plugin to copy the literal values. Smart Copy action maps to Ctrl+Alt+Shift+C by default. It copies the value of compile time constants and String literals to the system clipboard. It also copies the selected literals. Entire expression is copied if noting is selected. If it cannot do smart copy, it will fallback to regular copy action. Certainly a candidate for your Ctrl+C mapping.

Maven debug output

Imagine you are frustrated by one of the maven plugin and you want to turn the debug output on. Google debugging a maven plugin or maven debug output and you may still be out of luck. Here is the brute force way of doing it.
Open MAVEN_HOME/core/plexus-container-default-1.0-alpha-9.jar and edit org/codehaus/plexus/plexus-bootstrap.xml to set threshold to debug.

<component>
<role>org.codehaus.plexus.logging.LoggerManager</role>
<implementation>
org.codehaus.plexus.logging.console.ConsoleLoggerManager
</implementation>
<lifecycle-handler>basic</LIFECYCLE-HANDLER>
<configuration>
<threshold>debug</threshold>
</configuration>
</component>

Blog at WordPress.com.