Literature Review
Peer to Peer Journal Article
Knowledge Based
Introduction
Scala was born from the mind of Martin Odersky, a man who had helped introduce generics into the Java programming language. Scala was an offshoot of the Funnel language, an attempt to combine functional programming and Petri nets. Scala was developed on the premise that you could mix together object orientation, functional programming, and a powerful type system and still keep elegant, succinct code. It was hoped that this blending of concepts would create something that real developers could use and that could be studied for new programming idioms. It was such a large success that industry has started adopting Scala as a viable and competitive language. Understanding Scala requires understanding this mixture of concepts. Scala attempts to blend three dichotomies of thought into one language. These are:
• Functional programming and object-oriented programming
• Expressive syntax and static typing
• Advanced language features and rich Java integration
Literature Review:
Chapter 1
1.1
Functional programming is programming through the definition and composition of functions. Object-oriented programming is programming through the definition and composition of objects. In Scala, functions are objects. Programs can be constructed through both the definition and composition of objects or functions. This gives Scala the ability to focus on “nouns” or “verbs” in a program, depending on what is the most prominent. Scala also blends expressive syntax with static typing. Mainstream statically typed languages tend to suffer from verbose type annotations and boilerplate syntax. Scala takes a few lessons from the ML programming language and offers static typing with a nice expressive syntax. Code written in Scala can look as expressive as dynamically typed languages, like Ruby, while retaining type safety. Finally, Scala offers a lot of advanced language features that are not available in Java. But Scala runs on the Java virtual machine (JVM) and has tight integration with the Java language. This means that developers can make direct use of existing Java libraries and integrate Scala into their Java applications while also gaining the additional power of Scala. This integration makes Scala a practical choice for any JVM-based project. Let’s take a deeper look at the blending of paradigms in Scala.
Functional programming and object-oriented programming are two different ways of looking at a problem. Functional programming puts special emphasis on the “verbs” of a program and ways to combine and manipulate them. Object-oriented programming puts special emphasis on “nouns” and attaches verbs to them. The two approaches are almost inverses of each other, with one being “top down” and the other “bottom up.” Object-oriented programming is a top-down approach to code design. It approaches software by dividing code into nouns or objects. Each object has some form of identity (self/this), behavior (methods), and state (members).
After identifying nouns and defining their behaviors, interactions between nouns are defined. The problem with implementing interactions is that the interactions need to live inside an object. Modern object-oriented designs tend to have service classes, which are a collection of methods that operate across several domain objects. Service classes, although objects, usually don’t have a notion of state or behavior independent of the objects on which they operate. A good example is a program that implements the following story: “A cat catches a bird and eats it.” An object-oriented programmer would look at this sentence and see two nouns: cat and bird. The cat has two verbs associated with it: catch and eat.
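The object-oriented reading of the cat-and-bird story can be sketched in Scala as follows. The class and method names here (`Bird`, `Cat`, `capture`, `eat`, `isHoldingPrey`) are illustrative, not from the original text; `capture` stands in for the verb "catch" because `catch` is a reserved word in Scala.

```scala
// Nouns become classes; verbs become methods attached to a noun.
class Bird(val name: String)

class Cat {
  // state (members): the bird this cat is currently holding, if any
  private var caught: Option[Bird] = None

  // behavior (methods): the verbs live on the cat
  def capture(bird: Bird): Unit = { caught = Some(bird) }
  def eat(): Unit = { caught = None }

  // identity/state can be inspected through the object itself
  def isHoldingPrey: Boolean = caught.isDefined
}
```

A caller then drives the story through the noun: `val cat = new Cat; cat.capture(new Bird("sparrow")); cat.eat()`.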
1.2 Functional Data Structures
Introduction
To do functional programming, we construct programs from collections of pure functions. Given the same arguments, a pure function always returns the same result. The function application is thus referentially transparent. By referentially transparent we mean that a name or symbol always denotes the same value in some well-defined context in the program. Such a pure function does not have side effects. It does not modify a variable or a data structure in place. It does not throw an exception or perform input/output. It does nothing that can be seen from outside the function except return its value. Thus the data structures in pure functional programs must be immutable, not subject to change as the program executes. (If mutable data structures are used, no changes to the structures must be detectable outside the function.) For example, the Scala empty list—written as Nil or List()—represents a value as immutable as the numbers 2 and 7.
Just as evaluating the expression 2 + 7 yields a new number 9, the concatenation of list c and list d yields a new list (written c ++ d) with the elements of c followed by the elements of d. It does not change the values of the original input lists c and d. Perhaps surprisingly, list concatenation does not require both lists to be copied, as we see below.
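The immutability of list concatenation can be sketched directly, using the `c` and `d` names from the text:

```scala
val c = List(1, 2)
val d = List(3, 4)

// Concatenation yields a NEW list; c and d are untouched.
val e = c ++ d   // List(1, 2, 3, 4)

// Because d can never change, the result may share d's cells:
// only c's cells need to be copied, not both lists.
```

Afterwards `c` is still `List(1, 2)` and `d` is still `List(3, 4)`, just as `2 + 7` leaves 2 and 7 unchanged.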
1.3 A List Algebraic Data Type To explore how to build immutable data structures in Scala, we examine a simplified, singly linked list structure implemented as an algebraic data type. This list data type is similar to the built-in Scala List data type. What do we mean by algebraic data type?
1.4 Algebraic data types An algebraic data type is a type formed by combining other types, that is, it is a composite data type. The data type is created by an algebra of operations of two primary kinds:
• a sum operation that constructs values to have one variant among several possible variants.
These sum types are also called tagged, disjoint union, or variant types. The combining operation is the alternation operator, which denotes the choice of one but not both between two alternatives.
• a product operation that combines several values (i.e., fields) together to construct a single value. These are tuple and record types. The combining operation is the Cartesian product from set theory.
We can combine sums and products recursively into arbitrarily large structures. An enumerated type is a sum type in which the constructors take no arguments; each constructor corresponds to a single value.
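The simplified list described above can be sketched as a Scala algebraic data type. The names `MyList`, `Empty`, and `Cons` are illustrative; the built-in `List` is constructed the same way with `Nil` and `::`.

```scala
// The sum: a list is EITHER Empty OR a Cons (one variant among several).
sealed trait MyList[+A]

// Enumerated-type constructor: takes no arguments, denotes a single value.
case object Empty extends MyList[Nothing]

// The product: combines two fields (head, tail) into one value.
case class Cons[+A](head: A, tail: MyList[A]) extends MyList[A]

// Sums and products nest recursively into larger structures:
val xs: MyList[Int] = Cons(1, Cons(2, Empty))
```

Because every constructor builds a new immutable value, `xs` can never be modified in place, only used to build new lists.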
Functional Programming
Functional programming has not seen wide use in the programming industry. There are several reasons for this. Functional programming languages place a strong emphasis on minimization of side effects and on immutable state. Functional programming is an expansion of lambda calculus. The idea of functional programming is that the program executes as a series of mathematical function evaluations
[2]. From its mathematical nature, every function call that has the same arguments should produce the same result. The emphasis on reduction of side effects has hurt the adoption of functional languages in the industry, as many programs require some form of side effect. Side effects include any modification of a state, such as modifying a variable, printing to the console, and writing to files. As well, for many programmers, it is easier to conceptualize problems by modifying states. While there are drawbacks to functional programming, there are two important advantages. First, functional code tends to be easier to read and maintain between programmers. By its nature, functional programming has few mutable variables. It is not required for a programmer to track how a variable is changing through the execution of the code. Second, with its emphasis on immutability, functional programming tends to be much more parallelizable.
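The contrast between a pure function and a side-effecting one can be sketched in a few lines. The names `square` and `addToTotal` are illustrative:

```scala
// Pure: the result depends only on the argument,
// so square(3) is always 9, no matter when it is called.
def square(x: Int): Int = x * x

// Impure: reads and writes state outside itself,
// so the SAME argument produces DIFFERENT results over time.
var total = 0
def addToTotal(x: Int): Int = { total += x; total }
```

Calling `square(3)` twice gives 9 both times; calling `addToTotal(1)` twice gives 1 and then 2, which is exactly why mutable state makes code harder to reason about and to parallelize.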
Chapter 2
2.1 Actors
Scala/Akka. Scala is a modern multi-paradigm language and, combined with the Akka libraries, is commonly used to engineer servers. Scala provides multiple computation paradigms, supporting both the object-oriented and functional paradigms. It executes on Java Virtual Machines (JVMs). For reliability, Scala provides a sophisticated strong static type system that provides features such as algebraic data types and anonymous and higher-order types. Akka is an open-source toolkit and run-time used for developing reliable concurrent and distributed applications on the JVM [14]. While Akka implements multiple concurrency models, a key element is Erlang-inspired actor-based concurrency. Akka Classic dynamically type-checks messages between actors. More recently, Typed Akka [15] statically types messages using Scala typing. The Scala SCP implementation uses Typed Akka.
The Akka Library
Earlier versions of Scala had natively implemented actors as part of the Scala library, and they could be defined without any additional libraries. Newer versions (2.9 and above) have removed the built-in Actor, and now there is the Akka library. Akka is developed and maintained by Typesafe [9] and, when included in an application, concurrency can be achieved. Actors are defined as classes that include or extend the Actor trait. This trait enforces the definition of at least a receive function. Receive is defined to return a partial function from messages to Unit (a void-like value). The partial function it returns is the behaviour that the developer needs to program into the actor. This is essentially defined as a pattern-matching sequence of actions to be taken when a message is received that matches a given pattern.
import akka.actor.Actor

class MyActor extends Actor {
  def receive = {
    case Message1 =>
      // some action
    case Message2(x: Int) =>
      // another action using x as an Int
    // ... more cases ...
    case MessageN =>
      // other actions
  }
}
At the heart of the Akka Actor implementation is the Java concurrency library java.util.concurrent [10]. This library provides the (multi)threading that Akka Actors use for concurrency. Users of the library do not need to worry about scheduling, forking and/or joining. This is dealt with by the library’s interaction with the executor service and context.
The Akka library offers options to select which executor service to use. It currently defaults to the ForkJoinPool, which is usually sufficient for most tasks. This is referred to as the Dispatcher [9][11] and is equipped with the functionality to determine the execution strategy for a given program, such as which thread to use, how many to make available in a pool for actors to run on, etc. All these options are exposed in a malleable configuration factory. Developers are able to fine-tune the actor system’s behaviour with relative ease.
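As a sketch of that configuration factory, Akka reads dispatcher settings from its application.conf file in HOCON syntax. The dispatcher name my-dispatcher below is illustrative; the keys shown are the standard fork-join-executor settings:

```hocon
# Illustrative dispatcher tuning; actors are assigned to it by name.
my-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2       # minimum threads in the pool
    parallelism-factor = 2.0  # threads = factor * available processors
    parallelism-max = 10      # ceiling on pool size
  }
  throughput = 100            # messages an actor processes before yielding its thread
}
```

This is how the "which thread to use, how many to make available" decisions mentioned above are exposed without touching application code.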
Actors are an abstraction over asynchronous processes. They communicate with the external world by sending and receiving messages. An actor will process received messages sequentially in the order they’re received, but will handle only one message at a time. This is critical, because it means that actors can maintain state without explicit locks. Actors can also be asynchronous or synchronous. Most actors won’t block a thread when waiting for messages, although this can be done if desired. The default behavior for actors is to share threads among each other when handling messages. This means a small set of threads could support a large number of actors, given the right behavior.
In fact, actors are great state machines. They accept a limited number of input messages and update their internal state. All communication is done through messages and each actor stands alone. But actors won’t solve all issues your system faces. You have to know how to use them.
9.1 Know when to use actors
Actors aren’t parallelization factories; they process their messages in single-threaded fashion. They work best when work is conceptually split and each actor can handle a portion of the work. If the application needs to farm many similar tasks out for processing, this requires a large pool of actors to see any concurrency benefits. Actors and I/O should be interleaved carefully. Asynchronous I/O and actors are a natural pairing, as the execution models for these are similar. Using an actor to perform blocking I/O is asking for trouble. That actor can starve other actors during this processing. This can be mitigated, as we’ll discuss in section 9.4. Although many problems can be successfully modeled in actors, some will benefit more. The architecture of a system designed to use actors will also change fundamentally. Rather than relying on classic Model-View-Controller and client-based parallelism, an actors system parallelizes pieces of the architecture and performs all communication asynchronously. Let’s look at a canonical example of a good system design using actors. This example uses several tools found in the old Message Passing Interface (MPI) specification used in supercomputing. MPI is worth a look, as it holds a lot of concepts that have naturally translated into actor-based systems.
Dynamic actor topology
One of the huge benefits of using actors is that the topology of your program can change drastically at runtime to handle load or data size. For example, let’s redesign the scatter-gather search tree so that it can accept new documents on the fly and add them to the index.
The tree should be able to grow in the event that a specific node gets to be too large. To accomplish this, we can treat an actor as a state machine.
Rule 23 Just use Akka Akka is the most performant actors framework available on the JVM. It’s designed with actor best practices baked into the API.
Writing efficient, robust actors systems is simplest in the Akka framework. The entire scatter-gather tree is composed of two node types: search (leaves) and head (branches). A search node holds an index, like the previous topic nodes. It’s responsible for adding new documents to the index and for returning results to queries. A head node holds a number of children. It’s responsible for delegating queries to all children and setting up a gatherer to aggregate the results.
After updating the index, the document is added to the list of stored documents. Finally, if the number of documents in this node has gone above the maximum desired per node, the split method is called. The split method should split this leaf node into several leaf nodes and replace itself with a branch node. Let’s defer defining the split method until after the parent node is defined. If the index doesn’t need to be split, the index is updated. To update the index, the document string is split into words. These words are grouped together such that the key refers to a single word in a document and the value refers to a sequence of all the same words in the document. This sequence is later used to calculate the score of a given word in the document. The current index for a word is extracted into the term list. The index for the given word is then updated to include the new document and the score for that word in the document. The branch node functionality is defined first, before the split method.
Actors provide a simpler parallelization model than traditional locking and threading. A well-behaved actors system can be fault-tolerant and resistant to total system slowdown. Actors provide an excellent abstraction for designing high-performance servers, where throughput and uptime are of the utmost importance. For these systems, designing failure zones and failure handling behaviors can help keep a system running even in the event of critical failures. Splitting actors into scheduling zones can ensure that input overload to any one portion of the system won’t bring the rest of the system down. Finally, when designing with actors, you should use the Akka library for production systems. The Akka library differs from the standard library in a few key areas:
• Clients of an actor can never obtain a direct reference to that actor. This drastically simplifies scaling an Akka system to multiple servers because there’s no chance an actor requires the direct reference to another.
• Messages are handled in the order received. If the current message handling routine can’t handle an input message, it’s dropped (or handled by the unknown message handler). This prevents out-of-memory errors due to message buffers filling up.
• All core actors library code is designed to allow user code to handle failures without causing more. For example, Akka goes to great lengths to avoid causing out-of-memory exceptions within the core library. This allows user code, your code, to handle failures as needed.
• Akka provides most of the basic supervisor behaviors that can be used as building blocks for complex supervision strategies.
• Akka provides several means of persisting state “out of the box.”
So, while the Scala actors library is an excellent resource for creating actors applications, the Akka library provides the features and performance needed to make a production application. Akka also supports common features out of the box. Actors and actor-related system design is a rich subject. This chapter lightly covered a few of the key aspects of actor-related design. These should be enough to create a fault-tolerant, high-performance actors system. Next let’s look into a topic of great interest: Java interoperability with Scala.
One of the biggest dangers in the Scala standard actors library is giving actors references to each other. This can lead to accidentally calling a method defined on another actor instead of sending a message to that actor. Although that may seem innocuous to some, this behavior can break an actors system, especially if you use locking. Actors are optimized by minimizing locking to a few minor locations, such as when scheduling and working with a message buffer. Introducing more locking can easily lead to deadlocks and frustration. Another disadvantage to passing direct references to actors is the loss of location transparency: the location of an actor becomes tied to another actor. This locks the actors in place where they are. The actors can no longer migrate to other locations, either in memory or on the network, severely limiting the system’s ability to handle failure.
Another downside to sending actors directly in the Scala standard library is that actors are untyped. This means that all the handy type system utilities you could leverage are thrown out the window when using raw actors. Specifically, the compiler loses its ability to find non-exhaustive pattern matches using sealed traits.
USING SEALED TRAITS FOR MESSAGE APIS
It’s a best practice in Scala to define message APIs for actors within a sealed trait hierarchy. This has the benefit of defining every message that an actor can handle and keeping them in a central location for easy lookup. With a bit of machinery, the compiler can be coerced to warn when an actor doesn’t handle its complete messaging API. The Scala standard library provides two mechanisms for enforcing type safety and decoupling references from directly using an actor: the InputChannel and OutputChannel traits.
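The sealed-trait best practice can be sketched as follows. The message names (`Query`, `AddDocument`, `Shutdown`) and the `describe` function are illustrative, not from the original text:

```scala
// The complete message API of one actor, in one central location.
// Because the trait is sealed, no message can be added outside this file.
sealed trait SearchMessage
case class Query(term: String)       extends SearchMessage
case class AddDocument(text: String) extends SearchMessage
case object Shutdown                 extends SearchMessage

// A match on a sealed trait lets the compiler warn when a case is missing:
// delete any case below and compilation emits a non-exhaustive-match warning.
def describe(msg: SearchMessage): String = msg match {
  case Query(t)       => s"query for $t"
  case AddDocument(_) => "new document"
  case Shutdown       => "shutdown"
}
```

An actor's receive block handling `SearchMessage` gets the same exhaustiveness help, which is exactly the type-system leverage lost when messages are untyped `Any` values.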