Merge with elegance

It’s perhaps the last time I asked this interview question last week. Write a function to return a sorted array containing all elements from two given int arrays. The input arrays are sorted but omission is intentional. Bonus points for:

  1. asking if input array is already sorted?
  2. identifying that it’s the merge step of merge-sort algorithm
  3. or naming the function merge

I have seen correct, clever, dumb, outrageous, uncomfortable and painful responses. For those who provide good answers, I ask them to identify Queue operations and think of an elegant solution instead of the fastest solution. Here is a stunningly elegant implementation of merge operation which treats input and output arrays as Queues.


#ignore mutability of a, b
#first is peek operation and shift is dequeue operation 
def merge(a, b)
  c = []
  c << (a.first < b.first ? a.shift : b.shift) until 
      a.empty? || b.empty? 
  c += a
  c += b
end

Java solution in 7 lines of code:

public static int[] merge(int[] a, int[] b) {
  Q cq = new Q(new int[a.length + b.length]);
  Q aq = new Q(a);
  Q bq = new Q(b);

  while(aq.notEmpty() && bq.notEmpty()) 
      cq.enqueue(aq.peek() < bq.peek() ? aq.dequeue() : bq.dequeue());
  while(aq.notEmpty()) cq.enqueue(aq.dequeue());
  while(bq.notEmpty()) cq.enqueue(bq.dequeue());
  
  return cq.elms;
}

//You will need a helper Queue class called Q to go with it. 
class Q {
    int current;
    int[] elms;

    Q(int[] elms) {
        this.elms = elms;
    }

    boolean notEmpty() {
        return current = elms.length;
    }

    int peek() {
        return elms[current];
    }

    int dequeue() {
        return elms[current++];
    }

    void enqueue(int elm) {
        elms[current++] = elm;
    }
}

Object Notations beyond JavaScript

JavaScript Object Notation is a data interchange format based on a subset of JavaScript. The beauty of JSON is in identifying a subset of language constructs which are safe for data interchange. For instance the following snippet represents a valid JavaScript but it’s unsafe for data interchange and is invalid JSON.


{"body" : "Chunky " + "Bacon"}

On the the other hand, the example below is using String literals is safe for data interchange and is valid JSON.


{"body" : "Chunky Bacon"}

This simple idea goes a long way. Instead of defining our own format for data interchange, we are using subset of a language for exchanging objects by computer programs and since it’s represented in JavaScript, it’s easy for programmers to understand. Once you think about safety, it’s intuitive to see the subset of language constructs useful for data interchange.

The idea behind JSON employing JavaScript is analogous to REST employing HTTP. This makes JSON natural format for RESTful services and typically RESTful services provide data in JSON and XML formats. However, often working with Ruby and PHP, I find neither JSON nor XML, particularly enjoyable for hacking services. I would much prefer REST services data in the native format of language I am working with. From service provider point of view, it’s worthwhile to target other dynamic languages like Ruby, PHP, Python, etc. This leads to the generalization of Object Notations which involves identifying a safe set of language constructs and using them for data interchange. For instance the Ruby and PHP Object Notations for above snippet can be represented as:

Ruby Object Notation:

{:body => "Chunky Bacon"}

PHP Object Notation:

array("body" => "Chunky Bacon")

Service provider who provide formats like XML and JSON,

should also consider other object notations like:

Object notation with padding is the generalization of JSONP where the Object Notation is passed as argument to a specified function which allows programming REST easier than just returning them as values. For RESTful services, JSON and other object notations are better fit than XML. For an example in action, check out http://github.com/rubyorchard/geoip-rest for GeoIP City and Country REST APIs which offer object notation in Ruby, PHP, JSON and XML formats.

Near-Far Cache Architecture

Near-far cache architecture is a robust architecture for deploying cache tier for web applications with following features.

  1. Improve Performance
  2. Protect data tier and do it reliably
  3. Cache consistency

Deploying memcached on few nodes and keeping round trip time (RTT) from web/app servers to memcached servers Θ(sub millisecond) can easily improve the performance of web apps. So, without much work, we can easily improve the performance and meet our first goal. These nodes make the Far Cache.

Will far cache reliably protect data tier? Perhaps not. A digged, slashdotted, techcrunched blog post may create hot-spots in certain memcached nodes. Memcached response slowdown may cause requests to backlog and some memcached nodes are likely to fall and network switches may saturate. Hot memcached node failures may expose data tier and there is never one cockroach.

To solve hot-spot problem, we introduce near caches. A near cache is a memcached node running on the web server or app server itself. We also cache a key in near cache and if we get a near cache miss, we fetch it from far cache and put into near cache. It prevents hot-spots from forming in the far cache. But wait, we just introduced another problem. The moment we start putting data into near cache, we run into cache consistency issues and user may observe the inconsistency. With near cache, inconsistency can’t be eliminated without sacrificing scale out, but we can provide user observed consistency. This can be done by near caching only for few seconds — it works just the way alternating current works. If near cache timeout is smaller than user page round trip time plus user thinking time, then a user will not observe inconsistency introduced by near cache. 3 seconds near cache timeout is a sweet spot for us and guarantees that far cache will get 20 hits per minutes per cache key per web or app server and only user visiting next link within 3 seconds may see inconsistent data in some cases when far cache is being refreshed. We may not have impressive cache hit ratio for near cache but it protects far cache and network and provides soft user observed consistency. With some domain specific tweaking of cache keys and dynamic near cache timeout, we get 85% near cache hits.

What is the timeout for far cache? Obviously, it should be grater than near cache timeout and if the data tier can’t handle troff traffic, then it must be greater than wavelength of the traffic cycle which is typically 24 hours. If you don’t want to charge users for far cache miss as much as possible, then far cache data for a long time (a week or so) and have an application level timeout by wrapping cached values.


class CacheValue {
Date last_updated;
Object value;
}

If the far cache value expires according to application level timeout, then we still serve the data from far cache but issue a background update. Keep application level timeout Θ(minutes) and Far cache timeout Θ(weeks).

New Relic RPM to monitor memcached

New Relic Infrastructure Agent now can be used to monitor memcached farm. It runs stats on configured memcached nodes periodically and ships data to New Relic RPM. The stats are reported  for each node and also aggregated stats are reported for all nodes. Check out the source at github.

List Accumulators

Suppose we want to separate  a container type object into its component parts. Accumulators provide an elegant solution to this problem. We create a list of accumulators and traverse through the original list and collect each element into the appropriate component. In Ruby, we can use inject method to implement accumulators. Suppose we want to split a list of integers into two lists of even and odd numbers. We create two empty accumulator lists for even and odd numbers.


(5..30).inject([[], []]) do |evenOdds, n|
evenOdds[n%2].push(n)
evenOdds
end

=> [[6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30], [5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29]]

Accumulators fit naturally into functional programming language. Here is an Erlang code to do the same.


module(acc).
-export([even_odds/1]).
-import(lists, [reverse/1]).
even_odds(List) ->
even_odds(List, [], []).
even_odds([H|T], Evens, Odds) ->
case (H rem 2) of
0 -> even_odds(T, [H|Evens], Odds);
1 -> even_odds(T, Evens, [H|Odds])
end;
even_odds([],Evens, Odds) ->
[lists:reverse(Evens), lists:reverse(Odds)].


acc:even_odds(lists:seq(5, 30)).

[[6,8,10,12,14,16,18,20,22,24,26,28,30],
[5,7,9,11,13,15,17,19,21,23,25,27,29]]

Bulletproof debugging method to debug Java remote debugging

JVM remote debugging can be sometime tricky and elusive — especially when the process is running as a service. How can you separate JVM issues from other factors such as process running as service as a different user, firewall blocking debug port. One way to separate these is to know a bulletproof configuration for JVM remote debugging. Without further ado, here it is:

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005

Note that this only applies to Java 5 and above. If you are still in stone ages, hopefully you are not reading this blog either. This is great but how can I test it quickly? I still can’t connect. My server runs headless Linux environment, and I don’t have access to my pretty IDE. One method to use telnet.

telnet silly.example.com 5005

It works but there is something better. Command line jdb and its little known use to connect to remote process. You can also use it do debugging if you like.

/usr/java/current/bin/jdb -connect com.sun.jdi.SocketAttach:hostname=silly.example.com,port=5005
Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
Initializing jdb …
> exit

Oracle Hierarchical Query in Hibernate

When you need them, there is no replacement of Oracle hierarchical queries. They scale linearly with each level while a join would scale exponentially. At the same time, we would like Hibernate to do the hard work of mapping result set to entities. Here is an example of Hibernate Native Query which would fetch the organization chart from famous scott.emp table.

Session s = (Session) em.getDelegate();
String orgChartQuery = "select {emp.*} from EMP {emp} \n" +
"start with emp.mgr is null \n" +
"connect by prior emp.empno = emp.mgr\n" +
"order siblings by ename";
@SuppressWarnings("unchecked")
List list = s.createSQLQuery(orgChartQuery)
.addEntity("emp", Employee.class)
.list();

Now suppose we want to render this in a tree control with breadcurms. We can use the following query.


Session s = (Session) em.getDelegate();
String orgChartQuery = "select sys_connect_by_path(emp.empno, '/') breadcrum,\n"+
"level, {emp.*} from EMP {emp} \n" +
"start with emp.mgr is null \n" +
"connect by prior emp.empno = emp.mgr\n" +
"order siblings by ename";
@SuppressWarnings("unchecked")
List list = s.createSQLQuery(orgChartQuery)
.addScalar("breadcrum", Hibernate.STRING)
.addScalar("level", Hibernate.INTEGER)
.addEntity("emp", Employee.class)
.list();
for (Object[] objects : list) {
String breadcrum = (String) objects[0];
Integer level = (Integer) objects[1];
Employee emp = (Employee) objects[2];
System.out.println(breadcrum +" "+ level + " "+emp.getName()
+ " "+emp.getDepartment().getName());
}

Breadcrums willl give you path to the object from the root and level gives the 1 based level in the tree.

Create a free website or blog at WordPress.com.
The Esquire Theme.

Follow

Get every new post delivered to your Inbox.