Chapter 3

The Spark framework is written in Scala, so Scala is a natural choice for developing Spark applications. Spark also provides APIs for Python, Java, and R, but Scala remains the most native and concise language for Spark development.

Data Types in Scala

| Sr. No. | Data Type | Description |
|---|---|---|
| 1 | Byte | 8-bit signed value. Range: -128 to 127 |
| 2 | Short | 16-bit signed value. Range: -32768 to 32767 |
| 3 | Int | 32-bit signed value. Range: -2147483648 to 2147483647 |
| 4 | Long | 64-bit signed value. Range: -9223372036854775808 to 9223372036854775807 |
| 5 | Float | 32-bit IEEE 754 single-precision floating-point number (e.g., val f: Float = 10.5f) |
| 6 | Double | 64-bit IEEE 754 double-precision floating-point number |
| 7 | Char | 16-bit unsigned Unicode character. Range: U+0000 to U+FFFF (e.g., val c: Char = 'A') |
| 8 | String | A sequence of Chars (e.g., val str: String = "Hello, Scala!") |
| 9 | Boolean | Either the literal true or the literal false |
| 10 | Unit | Represents the absence of a meaningful value, similar to void in Java |
| 11 | Null | The type of the null (empty) reference (e.g., val name: String = null) |
| 12 | Nothing | The subtype of every other type; it has no values |
| 13 | Any | The supertype of every type; the top of Scala's type hierarchy |
| 14 | AnyRef | The supertype of every reference type (equivalent to Object in Java) |
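The ranges listed in the table can be checked from the MinValue and MaxValue constants that each numeric type provides. The following is a minimal sketch; the object name DataTypeRanges and the helper nextInt are made up for this illustration.

```scala
object DataTypeRanges {
  // Hypothetical helper: adds 1 to an Int. Int arithmetic wraps around
  // silently when the result leaves the 32-bit range.
  def nextInt(i: Int): Int = i + 1

  def main(args: Array[String]): Unit = {
    // Each numeric type exposes its bounds as MinValue and MaxValue
    println(s"Byte:  ${Byte.MinValue} to ${Byte.MaxValue}")
    println(s"Short: ${Short.MinValue} to ${Short.MaxValue}")
    println(s"Int:   ${Int.MinValue} to ${Int.MaxValue}")
    println(s"Long:  ${Long.MinValue} to ${Long.MaxValue}")

    // Overflow: Int.MaxValue + 1 wraps around to Int.MinValue
    println(nextInt(Int.MaxValue) == Int.MinValue) // true

    // Widening to Long before adding avoids the overflow
    println(Int.MaxValue.toLong + 1) // 2147483648

    // Any is the common supertype, so a List[Any] can mix values
    val mixed: List[Any] = List(1, "two", 3.0, 'c', true)
    mixed.foreach(v => println(v.getClass.getSimpleName))
  }
}
```

The wrap-around on overflow is the reason to pick Long when a value may exceed the Int range.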

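Variables

Scala declares variables with one of two keywords, var or val. The type is written after the variable name and can be omitted when the compiler can infer it from the initial value.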

Mutable Variables (var)

var declares a variable whose value can be changed after it is initialized. Such a variable is called a mutable variable.

Syntax: var VariableName: DataType = [Initial Value]

Examples:

var myVar: String = "niit" // explicit type annotation
var myVar2 = "HNU"         // type String is inferred from the initial value
myVar = "HNU"              // allowed: a var can be reassigned

Notes:

  • var is the keyword that declares a mutable variable
  • myVar is the variable name
  • String is the data type

Immutable Variables (val)

val declares a variable whose value cannot be changed after it is initialized. Such a variable is called an immutable variable.

Syntax: val VariableName: DataType = [Initial Value]

Examples:

val myVal: String = "Foo"    // explicit type annotation
val myVal2 = "Hello, Scala!" // type String is inferred from the initial value

Multiple Assignments

Scala supports multiple assignments. If a code block or method returns a tuple (a fixed-size group of values that may have different types), its elements can be bound to several val names in a single declaration.

val (myVar1, myVar2) = (40, "Foo")

Print the data type:

println(myVar1.getClass) // int
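The example above binds a tuple literal. The sketch below shows the case the text describes, where the tuple comes from a method or from a code block. The object TupleAssignment and the helper minMax are hypothetical names chosen for this illustration.

```scala
object TupleAssignment {
  // Hypothetical helper: returns the smallest and largest element as a tuple.
  // Assumes xs is non-empty.
  def minMax(xs: List[Int]): (Int, Int) = (xs.min, xs.max)

  def main(args: Array[String]): Unit = {
    // Both names are bound in a single val declaration
    val (smallest, largest) = minMax(List(7, 2, 9, 4))
    println(s"smallest = $smallest, largest = $largest") // smallest = 2, largest = 9

    // A code block can also produce a tuple as its last expression
    val (label, count) = {
      val words = List("spark", "scala")
      ("words", words.length)
    }
    println(s"$label: $count") // words: 2
  }
}
```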

Classes and Objects

Create Class

A class is a template (blueprint) that describes the state and behavior shared by its instances.

class Person {
  // Class body
}

// A class whose primary constructor takes two parameters
class Point(xc: Int, yc: Int) {
  // Class body
}

Example Class:

class Person(val name: String = "zhangsan", val age: Int = 18) {
  def sayName(): String = {
    "my name is " + name
  }
}

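Create Method

A method is defined with def inside a class or object. The method below belongs in the body of the Person class, because it uses that class's name and age fields.

def greet(): String = {
  s"Hello, my name is $name and I am $age years old."
}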

Object with Methods:

object addNumber {
  def add(a: Int, b: Int): Int = a + b

  def main(args: Array[String]): Unit = {
    println(add(3, 5)) // 8
  }
}

Create Object

Objects have state and behavior. An object is an instance of a class and is created with the new keyword.

object Lesson {
  def main(args: Array[String]): Unit = {
    val person = new Person()  // uses the default name and age
    println(person.age)        // 18
    println(person.sayName())  // my name is zhangsan
  }
}

Object with Parameters (apply method)

Calling an object as if it were a function invokes its apply method. When apply is overloaded, the arguments decide which version runs.

object Lesson_ObjectWithParam {
  def apply(s: String): Unit = {
    println("name is " + s)
  }

  def apply(s: String, age: Int): Unit = {
    println("name is " + s + ", age = " + age)
  }

  def main(args: Array[String]): Unit = {
    Lesson_ObjectWithParam("zhangsan")  // calls apply(s: String)
    Lesson_ObjectWithParam("lisi", 18)  // calls apply(s: String, age: Int)
  }
}

Main Method Decoding

Breakdown of the main method signature:

def main(args: Array[String]): Unit =

  • def = keyword used to define a method in Scala
  • main = name of the method
  • args = name of the method's single parameter
  • Array[String] = type of args: an array of String elements, typically the command-line arguments passed to the program when it is run
  • Unit = return type of the method, similar to void in Java
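To make the role of args concrete, here is a minimal sketch that reads the command-line arguments. The object MainArgsDemo and the helper describe are hypothetical names used only for this illustration.

```scala
object MainArgsDemo {
  // Hypothetical helper: builds one line of text per command-line argument.
  def describe(args: Array[String]): List[String] =
    args.zipWithIndex.map { case (arg, i) => s"args($i) = $arg" }.toList

  def main(args: Array[String]): Unit = {
    if (args.isEmpty) println("No arguments were passed")
    else describe(args).foreach(println)
  }
}
```

If the program is started with the two arguments hello and world, args(0) is "hello" and args(1) is "world".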

Control Structures

If Statement

An if statement consists of a Boolean expression followed by one or more statements, which run only when the expression is true. Optional else if and else branches handle the remaining cases.

if (Boolean expression) {
  // Executes when the Boolean expression is true
}

if (Boolean expression 1) {
  // Executes when Boolean expression 1 is true
} else if (Boolean expression 2) {
  // Executes when Boolean expression 2 is true
} else if (Boolean expression 3) {
  // Executes when Boolean expression 3 is true
} else {
  // Executes when none of the above conditions is true
}

Program Example:

object DrivingLicence {
  def main(args: Array[String]): Unit = {
    val age = 21

    if (age < 18) {
      println("Not Eligible")
    } else if (18 <= age && age <= 20) {
      println("Learning Licence Eligible")
    } else {
      println("Eligible")
    }
  }
}

The use of ‘to’ and ‘until’

to builds a range that includes its end value, and until builds a range that excludes it. Both accept an optional step size. Printing a range directly may show a summary such as Range 1 to 10 instead of the individual elements, depending on the Scala version; the comments below list the elements each range contains.

println(1.to(10))        // 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
println(1 to 10)         // same range in infix notation
println(1 to (10))       // same range again
println(1 to (10, 2))    // step size 2: 1, 3, 5, 7, 9
println(1 until 10)      // excludes the end value: 1, 2, 3, 4, 5, 6, 7, 8, 9
println(1 until (10, 3)) // step size 3: 1, 4, 7
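To see the elements of a range regardless of how a Range prints, it can be converted to a List first. The sketch below uses the equivalent by syntax for the step; the object RangeDemo and its two helpers are hypothetical names for this illustration.

```scala
object RangeDemo {
  // Hypothetical helpers: materialize a range as a List so its elements are visible.
  def inclusive(end: Int, step: Int): List[Int] = (1 to end by step).toList
  def exclusive(end: Int, step: Int): List[Int] = (1 until end by step).toList

  def main(args: Array[String]): Unit = {
    println(inclusive(10, 1)) // List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    println(exclusive(10, 1)) // List(1, 2, 3, 4, 5, 6, 7, 8, 9)
    println(inclusive(10, 2)) // List(1, 3, 5, 7, 9)
    println(exclusive(10, 3)) // List(1, 4, 7)

    // A negative step counts down
    println((10 to 1 by -3).mkString(", ")) // 10, 7, 4, 1
  }
}
```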

Loops

While Loop

A while loop repeats a statement or group of statements as long as a given condition is true. It tests the condition before executing the loop body.

object Loop {
  def main(args: Array[String]): Unit = {
    var index = 0
    while (index < 100) {
      // println takes a single argument, so the label and value are joined into one string
      println("number count: " + index)
      index += 1
    }
  }
}

Example: Print a Multiplication Table

package org.example6

object LoopTry {
  def main(args: Array[String]): Unit = {
    val number = 10
    var index = 1
    while (index < 11) {
      println(s"$number * $index = ${number * index}")
      index += 1
    }
  }
}

Do-while Loop

A do-while loop is like a while loop, except that it tests the condition at the end of the loop body, so the body always runs at least once. The do-while syntax belongs to Scala 2; Scala 3 no longer supports it.

object loop {
  def main(args: Array[String]): Unit = {
    var index = 0
    do {
      index += 1
      // println takes a single argument, so the label and value are joined into one string
      println("loop count: " + index)
    } while (index < 100)
  }
}

Example: Multiplication Table with Do-while

package org.example6

// Renamed from LoopTry so it does not clash with the while-loop example in the same package
object LoopTryDoWhile {
  def main(args: Array[String]): Unit = {
    val number = 10
    var index = 0
    do {
      index += 1
      println(s"$number * $index = ${number * index}")
    } while (index < 10) // stops after the tenth row
  }
}

For Loop

A for loop executes a sequence of statements once for each element of a range or collection, and it manages the loop variable for you.

object Loop {
  def main(args: Array[String]): Unit = {
    for (i <- 1 to 10) {
      println(i)
    }
  }
}

Nested Loop Example (Multiplication Table):

object MultiplicationTable {
  def main(args: Array[String]): Unit = {
    // Example: print the 9 x 9 multiplication table
    for (i <- 1 until 10; j <- 1 until 10) {
      if (i >= j) {
        print(s"$i * $j = ${i * j} ")
      }
      if (i == j) {
        println() // end the row once the diagonal is reached
      }
    }
  }
}

Functional Programming

Functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids changing state or mutable data. It emphasizes pure functions, immutability, and a declarative style, where the focus is on what to do rather than how to do it (as in imperative programming).

Imperative (Normal) Programming

This is the traditional way of programming, where you write step-by-step instructions that the computer must follow to achieve a result. It often involves mutating (changing) variables and using loops or control structures such as if, for, and while.

object normalPro {
  def main(args: Array[String]): Unit = {
    // Imperative approach
    var total = 0 // A mutable variable that we will change
    val numbers = List(1, 2, 3, 4)

    for (n <- numbers) {
      total = total + n // Mutating total
    }
    println(total) // Output: 10
  }
}

You are changing (mutating) the variable total in each iteration: it starts at 0 and ends at 10.

This is how typical programming works: you give a set of instructions, and the state of the program changes over time.

Functional Programming (FP)

In FP, the focus is on expressing what to do, not on how to do it step by step. Instead of changing variables, you create new values without modifying old ones. Pure functions and immutability are the key ideas: you avoid changing state (for example, by not modifying variables) and avoid side effects (for example, by not printing or changing external values inside functions).

Key Aspects of Functional Programming:

  1. First-Class Functions: Functions are treated as first-class citizens, meaning they can be passed as arguments, returned from other functions, or assigned to variables.

    val add = (x: Int, y: Int) => x + y
    println(add(3, 5)) // 8

  2. Immutability: Functional programming emphasizes immutability, meaning values are not changed; new values are created when needed.

    val x = 10
    val y = x + 5 // creates a new value, x is not changed

  3. Declarative Style: Functional programming focuses on "what to do" rather than "how to do it". This contrasts with imperative programming, which uses explicit commands to control the state of the program.

    val numbers = List(1, 2, 3, 4)
    val doubled = numbers.map(_ * 2) // What to do: double every number
    println(doubled) // List(2, 4, 6, 8)

  4. Pure Functions: A function is pure when its output depends only on its input parameters and it has no side effects (such as changing state or modifying data outside the function).
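Pure functions are the one aspect in the list without an example, so here is a minimal sketch contrasting a pure function with an impure one. The object PureVsImpure and both functions are hypothetical names for this illustration.

```scala
object PureVsImpure {
  // Pure: the result depends only on the arguments, and nothing outside is touched.
  def add(a: Int, b: Int): Int = a + b

  // Impure: reads and modifies state that lives outside the function body.
  private var counter = 0
  def nextId(): Int = {
    counter += 1
    counter
  }

  def main(args: Array[String]): Unit = {
    println(add(2, 3) == add(2, 3)) // true: same input, same output
    println(nextId() == nextId())   // false: each call returns a different value
  }
}
```

Because add always gives the same answer for the same arguments, a call to it can be replaced by its result without changing the program. That is not true for nextId.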

Functional Approach Example:

object FunctionalPrograming {
  def main(args: Array[String]): Unit = {
    // Functional approach
    val numbers = List(1, 2, 3, 4)
    val total = numbers.foldLeft(0)(_ + _) // Uses a function to compute the total
    println(total) // Output: 10
  }
}

No variable is being mutated. Instead of changing total in every iteration, we use a function called foldLeft to reduce the list to a single value (the sum).

The call foldLeft(0)(_ + _) says: "start with 0 as the running result, then go through the list from left to right and combine the running result with each element using the function." Here _ + _ is the function, and it adds its two arguments.
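The intermediate steps of the fold can be made visible with scanLeft, which applies the same function but keeps every running result. This is a small sketch; the object FoldLeftTrace and the helper runningTotals are hypothetical names.

```scala
object FoldLeftTrace {
  // Hypothetical helper: every intermediate value that foldLeft(0)(_ + _) passes through.
  def runningTotals(xs: List[Int]): List[Int] = xs.scanLeft(0)(_ + _)

  def main(args: Array[String]): Unit = {
    val numbers = List(1, 2, 3, 4)

    // 0, then 0+1, then 1+2, then 3+3, then 6+4
    println(runningTotals(numbers)) // List(0, 1, 3, 6, 10)

    // foldLeft returns only the last of those values
    println(numbers.foldLeft(0)(_ + _)) // 10

    // The same fold with the function written out in full
    println(numbers.foldLeft(0)((acc, n) => acc + n)) // 10
  }
}
```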

Creating a program to add two numbers:

object FunctionalProgramming1 {
  // A program that adds two numbers.
  // What we want to do: add values.
  def main(args: Array[String]): Unit = {
    // A function can be assigned to a variable and then called with arguments.
    val add = (a: Int, b: Int) => a + b
    println(add(3, 5)) // 8
  }
}

Higher-order Functions & Nested Functions

Higher-order Functions

Scala allows the definition of higher-order functions. These are functions that take other functions as parameters, or whose result is a function.

package FunctionalPrograming

object FunctionalProg {
  def main(args: Array[String]): Unit = {
    // Passing the function intToString and an integer value to the apply function
    println(apply(intToString, 10)) // Output: The number is 10
  }

  // Higher-order function: takes a function (Int => String) and an Int
  def apply(f: Int => String, v: Int): String = f(v)

  // A simple function that converts an Int to a String
  def intToString(n: Int): String = s"The number is $n"
}
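The example above covers a function taken as a parameter. The other case named in the definition, a function whose result is a function, can be sketched as follows. The object ReturnFunctionDemo and the function multiplier are hypothetical names for this illustration.

```scala
object ReturnFunctionDemo {
  // Hypothetical example: multiplier returns a new function of type Int => Int.
  def multiplier(factor: Int): Int => Int = (n: Int) => n * factor

  def main(args: Array[String]): Unit = {
    val double = multiplier(2)
    val triple = multiplier(3)
    println(double(10)) // 20
    println(triple(10)) // 30

    // The returned function can itself be passed to another higher-order function
    println(List(1, 2, 3).map(triple)) // List(3, 6, 9)
  }
}
```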

Nested Function

A nested function is a function that is defined inside another function. Scala allows nested functions, and they are useful for encapsulating logic that is only relevant inside the outer function.

package FunctionalPrograming

object NestedFunctionExample {
  def main(args: Array[String]): Unit = {
    println(addAndMultiply(2, 3)) // Output: 15
  }

  // Outer function
  def addAndMultiply(a: Int, b: Int): Int = {
    // Nested function, visible only inside addAndMultiply
    def multiply(x: Int, y: Int): Int = {
      x * y
    }
    val sum = a + b
    multiply(sum, 3) // Calls the nested function: (2 + 3) * 3
  }
}

Collections

Scala's collections library provides many types of collections, such as lists, arrays, sets, and maps. These collections support operations such as filtering, mapping, reducing, and folding, which makes it easier to work with data in a functional programming style.

A Scala collection is a way to group multiple values together in one place so that you can work with them more easily. Think of a collection as a container that holds several items, such as numbers, words, or objects. Scala collections help you store, manage, and process data in different ways.

Categories of Scala Collections:

  1. Immutable Collections: These collections cannot be changed after they are created. Any operation that modifies an immutable collection returns a new collection with the modification applied.
    • Located in scala.collection.immutable.
    • Examples: List, Set, Map, Vector.
  2. Mutable Collections: These collections can be updated in place. Operations that modify a mutable collection change the existing collection instead of returning a new one.
    • Located in scala.collection.mutable.
    • Examples: ArrayBuffer, ListBuffer, HashMap, HashSet.
  3. Parallel Collections: These collections allow operations on their elements to run in parallel, which can speed up the processing of large data sets.
    • Located in scala.collection.parallel. Since Scala 2.13 this package is distributed as the separate scala-parallel-collections module rather than with the standard library.
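The difference between the first two categories can be sketched with a List and a ListBuffer. The object ImmutableVsMutable and the helper appended are hypothetical names for this illustration.

```scala
import scala.collection.mutable.ListBuffer

object ImmutableVsMutable {
  // Hypothetical helper: "adds" to an immutable List by building a new List.
  def appended(xs: List[Int], x: Int): List[Int] = xs :+ x

  def main(args: Array[String]): Unit = {
    // Immutable: the original list is left untouched
    val original = List(1, 2, 3)
    val extended = appended(original, 4)
    println(original) // List(1, 2, 3)
    println(extended) // List(1, 2, 3, 4)

    // Mutable: the same buffer is updated in place
    val buffer = ListBuffer(1, 2, 3)
    buffer += 4
    println(buffer) // ListBuffer(1, 2, 3, 4)
  }
}
```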

Array

Scala provides the array, a data structure that stores a fixed-size sequential collection of elements of the same type.

To use an array in a program, declare a variable that refers to the array. The element type is either written explicitly or inferred from the initial values.

The elements of an Array can be updated after it has been created. The size of the array is fixed, but the value at any index can be changed.

package org.example8

object ArrayExample {
  def main(args: Array[String]): Unit = {
    val myList = Array(1.9, 2.9, 3.4, 3.5)

    // Print the first element (index 0)
    println(myList(0)) // Output: 1.9

    // Print the second element (index 1)
    println(myList(1)) // Output: 2.9

    // Print every element in order
    for (n <- myList) {
      println(n)
    }
  }
}
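The example above only reads from the array. Updating an element in place, as described in the text, can be sketched like this. The object ArrayUpdateDemo and the helper doubleInPlace are hypothetical names for this illustration.

```scala
object ArrayUpdateDemo {
  // Hypothetical helper: overwrites every element with twice its value.
  def doubleInPlace(xs: Array[Double]): Unit = {
    for (i <- xs.indices) {
      xs(i) = xs(i) * 2
    }
  }

  def main(args: Array[String]): Unit = {
    val myList = Array(1.5, 2.5, 4.0)

    // Assigning to an index replaces that element; the length stays the same
    myList(0) = 10.5
    println(myList.mkString(", ")) // 10.5, 2.5, 4.0
    println(myList.length)         // 3

    doubleInPlace(myList)
    println(myList.mkString(", ")) // 21.0, 5.0, 8.0
  }
}
```

Note that myList is declared with val: the variable always refers to the same array, while the array's contents remain changeable.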

Array Concatenation:

package Collection

object CollectionArray {
  def main(args: Array[String]): Unit = {
    val myList1 = Array(1.9, 2.9, 3.4, 3.5)
    val myList2 = Array(8.9, 7.9, 0.4, 1.5)
    val myList3 = Array.concat(myList1, myList2)

    // Print all the array elements
    for (x <- myList3) {
      println(x)
    }
  }
}

1. List

A List in Scala is an immutable linked list, and it is one of the most commonly used collections.

package Collection

object CollectionList {
  def main(args: Array[String]): Unit = {
    // Mixing Int and String elements makes this a List[Any]
    val list = List(1, 2, 3, "NI")
    println(list)
    println(list.getClass)
    println(list.head) // first element: 1
    println(list.tail) // everything after the first element: List(2, 3, NI)

    for (n <- list) {
      println("List items :" + n)
    }

    println("Functional ")
    list.foreach(x => println(x))
  }
}
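Because a List is immutable, adding an element produces a new list and leaves the old one as it was. The sketch below shows this with the :: (prepend) operator and reuses head and tail from the example above. The object ListOps and the helper sum are hypothetical names.

```scala
object ListOps {
  // Hypothetical helper: sums a list by taking the head and recursing on the tail.
  def sum(xs: List[Int]): Int =
    if (xs.isEmpty) 0 else xs.head + sum(xs.tail)

  def main(args: Array[String]): Unit = {
    val list = List(2, 3)

    // :: builds a new list with 1 in front; list itself is unchanged
    val longer = 1 :: list
    println(longer) // List(1, 2, 3)
    println(list)   // List(2, 3)

    println(sum(longer)) // 6
  }
}
```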

2. Set

A Set is a collection that contains no duplicate elements. The default Set is immutable, and it has no guaranteed order. A mutable version is available in scala.collection.mutable.

import scala.collection.mutable

object CollectionSet {
  def main(args: Array[String]): Unit = {
    // Immutable Set: the duplicate 1 is dropped
    val set = Set(1, 2, 3, 1)
    println(set)
    println("DataType:" + set.getClass)

    // Mutable Set: elements can be added in place
    val data = mutable.Set(1, 2, 4, 5)
    data.add(6)
    println(data)
  }
}

3. Map

A Map is a collection of key-value pairs. Keys are unique in a map.

package Collection
import scala.collection.mutable

object CollectionMap {
  def main(args: Array[String]): Unit = {
    // Immutable Map
    val map = Map(1 -> "Scala", 2 -> "Java")
    println(map(1)) // Scala

    // Mutable Map
    val mutableMap = mutable.Map(1 -> "Scala", 2 -> "Java")
    mutableMap(1) = "Python" // replaces the value stored under key 1
    println(mutableMap)
  }
}

4. Tuples

A Tuple can hold elements of different types. Like lists, tuples are immutable; unlike lists, they have a fixed size and keep the type of each element. In Scala 2 a tuple can hold at most 22 elements.

package Collection

object CollectionTuple {
  def main(args: Array[String]): Unit = {
    val tuple = (1, "Scala", true)
    println(tuple._1) // first element: 1
    println(tuple._3) // third element: true
  }
}

Traits

Traits are the basic unit of code reuse in Scala. A trait encapsulates method and field definitions, which can then be reused by mixing the trait into classes. Scala traits are like Java interfaces with concrete methods, but they can do a lot more. A trait is defined in a similar way to a class, but with the keyword trait.

package org.Examplearray

trait Animal {
  def sound(): String // Abstract method
}

class Dog extends Animal {
  def sound(): String = "Woof" // Implementing the method
}

class Cat extends Animal {
  def sound(): String = "Meow"
}

object Master {
  def main(args: Array[String]): Unit = {
    val dog = new Dog()
    val cat = new Cat()
    println(dog.sound()) // Output: Woof
    println(cat.sound()) // Output: Meow
  }
}
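The Animal trait contains only an abstract method. The sketch below illustrates the two other points made in the text: a trait can carry concrete methods, and a class can mix in more than one trait. Greeter, Logger, Employee, and TraitMixinDemo are hypothetical names for this illustration.

```scala
trait Greeter {
  def name: String                      // abstract member, supplied by the class
  def greet(): String = s"Hello, $name" // concrete method, reused as is
}

trait Logger {
  def log(msg: String): String = s"[LOG] $msg"
}

// The class mixes in both traits; val name implements the abstract member
class Employee(val name: String) extends Greeter with Logger

object TraitMixinDemo {
  def main(args: Array[String]): Unit = {
    val e = new Employee("zhangsan")
    println(e.greet())        // Hello, zhangsan
    println(e.log(e.greet())) // [LOG] Hello, zhangsan
  }
}
```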

Pattern Matching

Pattern matching is one of the most widely used features of Scala, alongside function values and closures. Scala has strong built-in support for it, which is useful, for example, when processing messages.

It is similar to a switch statement in languages like Java or C++, but much more flexible and powerful.

package org.example7

object PatternMatching {
  def main(args: Array[String]): Unit = {
    println(matchTest(2)) // two
  }

  def matchTest(x: Int): String = x match {
    case 1 => "one"
    case 2 => "two"
    case _ => "many" // default case, matches any other value
  }
}
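To illustrate how pattern matching goes beyond a switch statement, the sketch below matches on literal values, on types, with a guard condition, and on the shape of a tuple. The object PatternMatchingMore and the function describe are hypothetical names for this illustration.

```scala
object PatternMatchingMore {
  // Cases are tried from top to bottom; the first one that matches wins.
  def describe(x: Any): String = x match {
    case 0               => "zero"                            // literal value
    case n: Int if n < 0 => "negative int"                    // type test plus guard
    case n: Int          => s"int $n"                         // type test
    case s: String       => s"string of length ${s.length}"
    case (a, b)          => s"pair of $a and $b"              // tuple structure
    case _               => "something else"                  // default case
  }

  def main(args: Array[String]): Unit = {
    println(describe(0))          // zero
    println(describe(-5))         // negative int
    println(describe(42))         // int 42
    println(describe("spark"))    // string of length 5
    println(describe((1, "a")))   // pair of 1 and a
    println(describe(3.14))       // something else
  }
}
```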

Activity 1: Write a program that calculates the factorial of a number.