📊 MongoDB Aggregation Framework
- Purpose: Enables complex data analysis and transformation, comparable to SQL's GROUP BY and JOIN.
- Pipeline Approach: Data flows through multiple stages, each performing a specific operation.
Common Stages:
- $match: Filters documents.
- $group: Groups documents by a specific key.
- $project: Reshapes documents.
- $sort: Orders results.
🔄 Transactions in MongoDB
Transaction: A logical group of operations that either all succeed or all fail, which preserves data integrity.
ACID Compliance: Ensures Atomicity, Consistency, Isolation, and Durability.
Use Case: In e-commerce, when a customer places an order:
- Deduct inventory.
- Record order details.
- Update order history.
Rollback Mechanism: If any operation fails, the transaction is aborted and none of its changes are applied.
⚙️ Replication
Replication: Keeping identical copies of data on multiple servers for high availability and data safety.
- Recommended: For all production deployments.
- Replica Set: A configuration (e.g., ecommerceReplicaSet) with multiple members to ensure data redundancy and failover support.
📈 Aggregation Pipeline Details
- Concept: A sequence of processing stages that documents pass through in order.
- Tunable Stages: Each stage can be parameterized with operators to modify fields or perform arithmetic operations.
Example of Aggregation Pipeline:
- Initial Filter: Use $match to filter documents.
- Further Processing: Apply additional filters or transformations later in the pipeline.
Sample Query:
To find companies founded in 2004:
db.companies.aggregate([
  { $match: { founded_year: 2004 } }
])
📋 Example Output Transformation:
Adding a $project stage to limit the output fields:
db.companies.aggregate([
  { $match: { founded_year: 2004 } },
  { $project: { _id: 0, name: 1, founded_year: 1 } }
])
📚 Key Takeaways:
- Aggregation Framework: Essential for complex data processing and analytics.
- Transactions: Crucial for maintaining data integrity in multi-document operations.
- Replication: Vital for ensuring data availability and fault tolerance in production environments.
🛠️ Aggregation Framework Overview
- Aggregation: The process of transforming data into a summary format.
- Pipeline: A sequence of data processing stages.
📋 Aggregation Pipeline Stages
- Match Stage
  - Filters documents based on criteria.
  - Example: {$match: {founded_year: 2004}}
- Project Stage
  - Reshapes documents and selects fields.
  - Example: {$project: {_id: 0, name: 1}}
- Limit Stage
  - Restricts the number of documents returned.
  - Example: {$limit: 5}
- Sort Stage
  - Orders documents based on specified fields.
  - Example: {$sort: {name: 1}} (ascending order)
- Skip Stage
  - Skips a specified number of documents.
  - Example: {$skip: 10}
🔍 Aggregating Data Effectively
Order of Stages Matters:
- Place the limit stage before the project stage so that only the documents that will actually be returned get reshaped.
- Sort before limiting when the order of results matters; otherwise the limit keeps an arbitrary subset.
Example Pipeline to Retrieve Company Names:
db.companies.aggregate([ ...
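One way such a pipeline could look is sketched below, ordered as recommended above: match, sort, limit, then project. The sort key, the limit of 5, and the sample documents are assumptions made for illustration. The plain-JavaScript chain underneath only mimics in memory what each stage does; it is not how MongoDB executes the pipeline.

```javascript
// Pipeline as it could be passed to db.companies.aggregate(pipeline) in the shell.
const pipeline = [
  { $match: { founded_year: 2004 } },  // keep only companies founded in 2004
  { $sort: { name: 1 } },              // sort first so "the first 5" is well defined
  { $limit: 5 },                       // cut the stream down early
  { $project: { _id: 0, name: 1 } }    // reshape only the documents that survive
];

// Hypothetical sample data, used only to mimic the stages in memory.
const companies = [
  { _id: 1, name: "Digg", founded_year: 2004 },
  { _id: 2, name: "Acme", founded_year: 2004 },
  { _id: 3, name: "Zeta", founded_year: 1999 }
];

const names = companies
  .filter((c) => c.founded_year === 2004)                            // $match
  .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))  // $sort
  .slice(0, 5)                                                       // $limit
  .map((c) => ({ name: c.name }));                                   // $project

console.log(names);
```

Moving $limit ahead of $sort would change the result, not just the cost: the limit would then keep whichever five documents happened to arrive first.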
📊 Types of Expressions in Aggregation
- Boolean Expressions: Combine conditions with AND, OR, and NOT ($and, $or, $not).
- Set Expressions: Treat arrays as sets, for example to compute an intersection or union ($setIntersection, $setUnion).
- Comparison Expressions: Compare values, for example to build range filters ($gt, $gte, $lt, $lte).
- Arithmetic Expressions: Basic math operations.
- String Expressions: Text manipulation.
- Array Expressions: Handle and manipulate array data.
- Variable Expressions: Work with literals and conditionals.
- Accumulators: Calculate sums, averages, and other statistics.
🔧 Deep Dive: Project Stage Operations
- Can promote nested fields to the top level using dot notation:
db.companies.aggregate([ ...
Example Document Structure:
Company Document:
- Fields: _id, name, category_code, founded_year, ipo, funding_rounds
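Promoting a nested field with dot notation, as described above, might look like the following sketch. The subfield name ipo.pub_year and the output name ipo_year are assumptions (the notes only list ipo as a top-level field), and the sample document is hypothetical.

```javascript
// Pipeline as it could be passed to db.companies.aggregate(pipeline).
// "ipo.pub_year" is an assumed subfield; check it against your own documents.
const pipeline = [
  { $match: { founded_year: 2004 } },
  { $project: { _id: 0, name: 1, ipo_year: "$ipo.pub_year" } }
];

// Hypothetical document, used only to show the reshaping in memory.
const company = {
  _id: 1,
  name: "ExampleCo",
  founded_year: 2004,
  ipo: { pub_year: 2012 }
};

// What the $project stage would emit for that document:
// the nested value becomes a top-level field.
const promoted = { name: company.name, ipo_year: company.ipo.pub_year };

console.log(promoted);
```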
📈 Key Takeaways
- Optimize Aggregation Pipelines: Place limiting stages early to reduce the work done by later stages.
- Understand Each Stage: Know what match, project, limit, sort, and skip do in order to construct queries effectively.
- Use Expressions Wisely: Combine the different expression types to extend what a query can filter and compute.
📊 Aggregation Framework
Introduction to Aggregation
- Aggregation is a framework used to process data and return computed results.
- It allows operations such as filtering, transforming, and combining data.
Key Components of Aggregation
- Match Stage: Filters documents based on specified criteria.
  - Example: {$match: {"funding_rounds.investments.financial_org.permalink": "greylock"}}
- Project Stage: Reshapes documents to include only specified fields.
  - Example: {$project: {name: 1, amount: "$funding_rounds.raised_amount"}}
- Unwind Stage: Deconstructs an array field into separate documents, allowing each element to be processed individually.
  - Example: {$unwind: "$funding_rounds"}
Using the Unwind Stage
The Unwind Stage creates one output document for each element in the specified array field.
Example Aggregation Pipeline:
db.companies.aggregate([
  {$match: {"funding_rounds.investments.financial_org.permalink": "greylock"}},
  {$unwind: "$funding_rounds"},
  {$project: {name: 1, amount: "$funding_rounds.raised_amount", year: "$funding_rounds.funded_year"}}
])
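To make the effect of $unwind followed by $project concrete, here is a small in-memory imitation in plain JavaScript. The company document is hypothetical, and the code only mirrors the shape of the result; it is not MongoDB's implementation.

```javascript
// Hypothetical company document with two funding rounds.
const company = {
  name: "ExampleCo",
  funding_rounds: [
    { raised_amount: 5000000, funded_year: 2005 },
    { raised_amount: 20000000, funded_year: 2007 }
  ]
};

// $unwind: one output document per array element; the other fields are copied along
// and the array field is replaced by the single element.
const unwound = company.funding_rounds.map((round) => ({
  ...company,
  funding_rounds: round
}));

// $project: pull the nested values up to top-level fields.
const projected = unwound.map((doc) => ({
  name: doc.name,
  amount: doc.funding_rounds.raised_amount,
  year: doc.funding_rounds.funded_year
}));

console.log(projected);
```

Because $match runs before $unwind in the pipeline above, every round of a matching company is emitted, including rounds the matched investor did not take part in. A second $match placed after $unwind would narrow the output to those rounds.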
Array Expressions
Filter Expression: Selects the subset of an array's elements that satisfy a condition.
- Example of usage:
{ $filter: {
    input: "$funding_rounds",
    as: "round",
    cond: { $gte: ["$$round.raised_amount", 100000000] }
} }
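A $filter expression does nothing on its own; it has to be assigned to a field inside a stage such as $project. The sketch below shows one way to embed it, with the output field name rounds chosen purely for illustration, followed by the equivalent selection on a hypothetical in-memory array.

```javascript
// The $filter expression placed inside a $project stage.
// The output field name "rounds" is an assumption made for this example.
const stage = {
  $project: {
    _id: 0,
    name: 1,
    rounds: {
      $filter: {
        input: "$funding_rounds",   // array to scan
        as: "round",                // variable name for each element ($$round)
        cond: { $gte: ["$$round.raised_amount", 100000000] }
      }
    }
  }
};

// Hypothetical data: the same condition applied in plain JavaScript.
const fundingRounds = [
  { round_code: "a", raised_amount: 25000000 },
  { round_code: "b", raised_amount: 150000000 }
];
const bigRounds = fundingRounds.filter((round) => round.raised_amount >= 100000000);

console.log(bigRounds);
```

Unlike $unwind, $filter keeps one document per company; only the array inside it shrinks.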
Understanding the Output
- Output documents can have fields like name, amount, and year.
- Each funding round yields its own output document.
🚀 Key Terms
- Aggregation: Process of computing results from data.
- Match: Filters documents.
- Project: Reshapes output documents.
- Unwind: Breaks down arrays into individual documents.
- Filter: Selects specific elements from an array.
📊 Aggregation Framework
Overview of Aggregation
- Aggregation is a way to process data and return computed results.
- It is similar to SQL's GROUP BY clause: multiple documents are combined so that aggregate operations can be performed on them.
Key Operators
$match
- Filters documents based on specified criteria.
- Example: { $match: { "founded_year": 2010 } } selects companies founded in 2010.
$group
Groups documents by the specified field(s) and computes an aggregate for each group.
Example:
{
  $group: {
    _id: { founded_year: "$founded_year" },
    average_number_of_employees: { $avg: "$number_of_employees" }
  }
}
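The grouping above can be imitated in plain JavaScript to show what one output document per group looks like. The sample documents are hypothetical, and the sketch assumes every document has a numeric number_of_employees.

```javascript
// Hypothetical input documents.
const companies = [
  { founded_year: 2010, number_of_employees: 10 },
  { founded_year: 2010, number_of_employees: 30 },
  { founded_year: 2011, number_of_employees: 8 }
];

// Accumulate a running sum and count per founding year (the group key).
const totals = new Map();
for (const company of companies) {
  const entry = totals.get(company.founded_year) || { sum: 0, n: 0 };
  entry.sum += company.number_of_employees;
  entry.n += 1;
  totals.set(company.founded_year, entry);
}

// Emit one document per group, shaped like the $group output.
const grouped = [...totals].map(([year, entry]) => ({
  _id: { founded_year: year },
  average_number_of_employees: entry.sum / entry.n
}));

console.log(grouped);
```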
Using Array Operators
$arrayElemAt
Selects the element at a specified index of an array. Index 0 is the first element; negative indexes count from the end, so -1 is the last element.
Example:
{ $project: {
    first_round: { $arrayElemAt: ["$funding_rounds", 0] },
    last_round: { $arrayElemAt: ["$funding_rounds", -1] }
} }
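The indexing rule can be written out as a small helper in plain JavaScript. The helper name arrayElemAt and the sample rounds are illustrative only; the function mirrors the operator's indexing and is not part of any MongoDB API.

```javascript
// Illustrative helper: non-negative indexes count from the start,
// negative indexes count back from the end of the array.
function arrayElemAt(arr, idx) {
  const i = idx < 0 ? arr.length + idx : idx;
  return arr[i]; // undefined when the index is out of range
}

// Hypothetical funding rounds.
const fundingRounds = [
  { round_code: "a" },
  { round_code: "b" },
  { round_code: "c" }
];

const firstRound = arrayElemAt(fundingRounds, 0);
const lastRound = arrayElemAt(fundingRounds, -1);

console.log(firstRound, lastRound);
```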
Example Output
Output from an aggregation might resemble:
{
  "name": "vufind",
  "founded_year": 2010,
  "first_round": { ... },
  "last_round": { ... }
}
🔄 Relationships and Aggregation
Relationship Field
Contains data about the people associated with a company.
Structure:
"relationships": [
  {
    "is_past": false,
    "title": "Founder and CEO",
    "person": { "first_name": "Mark", "last_name": "Zuckerberg" }
  },
  ...
]
Counting Relationships
Example aggregation that counts, for each person, how many company relationships they appear in:
db.companies.aggregate([
  { $match: { "relationships.person": { $ne: null } } },
  { $unwind: "$relationships" },
  { $group: {
      _id: "$relationships.person",
      count: { $sum: 1 }
  } },
  { $sort: { count: -1 } }
])
Sample Output
The output lists each person with the number of their relationships, highest count first:
{
  "_id": { "first_name": "Tim", "last_name": "H" },
  "count": 5
}
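The unwind, group, and sort steps of this pipeline can be traced on a few hypothetical documents in plain JavaScript. This is an in-memory imitation only; it keys each group on the JSON text of the person object, which assumes the fields always appear in the same order.

```javascript
// Hypothetical companies with their relationships.
const companies = [
  {
    name: "A",
    relationships: [
      { person: { first_name: "Tim", last_name: "H" } },
      { person: { first_name: "Ann", last_name: "K" } }
    ]
  },
  {
    name: "B",
    relationships: [{ person: { first_name: "Tim", last_name: "H" } }]
  }
];

const counts = new Map();
for (const company of companies) {
  for (const rel of company.relationships) {        // $unwind: visit each relationship
    const key = JSON.stringify(rel.person);          // $group key: the person object
    counts.set(key, (counts.get(key) || 0) + 1);     // $sum: 1
  }
}

const result = [...counts]
  .map(([key, count]) => ({ _id: JSON.parse(key), count }))
  .sort((a, b) => b.count - a.count);                // $sort: { count: -1 }

console.log(result);
```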
🗂️ Practical Applications
- Aggregation can surface insights such as:
  - Average metrics by group (e.g., average number of employees by founding year).
  - Relationship dynamics (e.g., which people are connected to many companies).
🗂️ MongoDB Aggregation Framework
🧩 Aggregation Basics
- Aggregation: Process of transforming data into a summary form.
- Purpose: Analyze and report on data, such as sales and customer behavior.
🔍 Key Aggregation Stages
- $match: Filters documents based on specified criteria.
- $group: Groups documents by specified fields, allowing calculations per group.
- $sort: Orders documents based on specified fields.
- $project: Reshapes documents by including or excluding fields.
⚙️ Transactions in MongoDB
📜 Definition of a Transaction
A transaction is a logical unit of processing that includes one or more database operations and either completes in full or fails in full.
🔑 ACID Properties
- Atomicity: Either all operations in a transaction are applied or none are.
- Consistency: The database moves from one valid state to another.
- Isolation: Concurrent transactions run without interfering with each other.
- Durability: Once committed, changes persist despite failures.
ACID Compliance:
A database is ACID-compliant when it adheres to these properties, ensuring data integrity.
🛠️ Using Transactions in MongoDB
Transaction APIs:
| Aspect | Core API | Callback API |
|---|---|---|
| Transaction Start | Requires explicit start call | Automatically starts with a callback function |
| Error Handling | Requires manual error handling | Automatically includes error-handling logic |
| Session Handling | Requires explicit session parameter | Requires explicit session parameter |
🛒 Example Usage:
Core API Example:
- Define operations for placing an order and updating inventory.
Callback API Example:
- Pass a function that includes the transaction operations.
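The difference between the two APIs can be sketched with the session methods of the MongoDB Node.js driver. Everything specific here is an assumption made for illustration: client is taken to be an already connected MongoClient, and the database name shop, the collections inventory and orders, and the order fields sku and qty are hypothetical.

```javascript
// The two operations from the order example, both bound to the same session.
async function applyOrder(client, session, order) {
  const db = client.db("shop"); // hypothetical database name
  const res = await db.collection("inventory").updateOne(
    { sku: order.sku, qty: { $gte: order.qty } }, // only if enough stock remains
    { $inc: { qty: -order.qty } },
    { session }
  );
  if (res.matchedCount === 0) throw new Error("insufficient stock");
  await db.collection("orders").insertOne(order, { session });
}

// Core API: start, commit, and abort are all called explicitly.
async function placeOrderCore(client, order) {
  const session = client.startSession();
  try {
    session.startTransaction();
    await applyOrder(client, session, order);
    await session.commitTransaction();
  } catch (err) {
    // Roll back whatever was done so far, then surface the error.
    if (session.inTransaction()) await session.abortTransaction();
    throw err;
  } finally {
    await session.endSession();
  }
}

// Callback API: the driver starts and commits the transaction around the callback.
async function placeOrderCallback(client, order) {
  const session = client.startSession();
  try {
    await session.withTransaction(() => applyOrder(client, session, order));
  } finally {
    await session.endSession();
  }
}
```

In both variants the session is passed explicitly to every operation, which matches the Session Handling row of the table above.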
🔄 Retry Logic in Transactions
Implement retry logic to handle transient errors during transactions.
Key Functions:
- commit_with_retry(session): Handles commit attempts, retrying the commit when it fails with a retryable error.
- run_transaction_with_retry(txn_func, session): Runs the transaction function and retries it on transient errors.
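A JavaScript counterpart of these two helpers might look like the sketch below. It is one possible arrangement, not the original implementation: the camelCase names, the hasLabel helper, and the maxAttempts bound are assumptions, and the session is expected to behave like a Node.js driver ClientSession whose errors expose hasErrorLabel.

```javascript
// True when the error carries the given driver error label.
function hasLabel(err, label) {
  return Boolean(err) && typeof err.hasErrorLabel === "function" && err.hasErrorLabel(label);
}

// Retry the commit while the driver reports that its outcome is unknown.
async function commitWithRetry(session, maxAttempts = 5) {
  for (let attempt = 1; ; attempt++) {
    try {
      await session.commitTransaction();
      return;
    } catch (err) {
      if (attempt < maxAttempts && hasLabel(err, "UnknownTransactionCommitResult")) continue;
      throw err;
    }
  }
}

// Run txnFunc inside a transaction, restarting the whole transaction on transient errors.
async function runTransactionWithRetry(txnFunc, session, maxAttempts = 5) {
  for (let attempt = 1; ; attempt++) {
    try {
      session.startTransaction();
      await txnFunc(session);
      await commitWithRetry(session);
      return;
    } catch (err) {
      if (session.inTransaction()) await session.abortTransaction();
      if (attempt < maxAttempts && hasLabel(err, "TransientTransactionError")) continue;
      throw err;
    }
  }
}
```

The attempt limit is a design choice for the sketch: it keeps a persistently failing transaction from looping forever, at the cost of surfacing an error that one more retry might have resolved.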
🛠️ Transactions in MongoDB
Purpose of Transactions:
Transactions ensure data integrity and atomicity across multiple operations.
Key Features:
- Provide consistency across multiple operations.
- Should be used sparingly: MongoDB's flexible document model often lets related data live in a single document, where updates are already atomic.
🔄 Replication in MongoDB
Definition:
Replication is the process of keeping identical copies of data across multiple servers.
Benefits:
- Enhances data availability and safety.
- Allows continued access to data even if one or more servers fail.
Replica Set:
- A configuration of multiple MongoDB servers, including one primary and several secondaries.
- The primary handles write operations, while secondaries maintain copies of the primary's data.
Setting Up a Replica Set:
Create Data Directories:
- Linux/Mac: mkdir -p ~/data/rs{1,2,3}
- Windows: md c:\data\rs1 c:\data\rs2 c:\data\rs3
Start MongoDB Instances:
Run the following commands in separate terminals. The --smallfiles option applies only to the MMAPv1 storage engine; on MongoDB 4.2 and later, where MMAPv1 was removed, omit it.
Linux/Mac:
mongod --replSet mdbDefGuide --dbpath ~/data/rs1 --port 27017 --smallfiles --oplogSize 200
mongod --replSet mdbDefGuide --dbpath ~/data/rs2 --port 27018 --smallfiles --oplogSize 200
mongod --replSet mdbDefGuide --dbpath ~/data/rs3 --port 27019 --smallfiles --oplogSize 200
Windows:
mongod --replSet mdbDefGuide --dbpath c:\data\rs1 --port 27017 --smallfiles --oplogSize 200
mongod --replSet mdbDefGuide --dbpath c:\data\rs2 --port 27018 --smallfiles --oplogSize 200
mongod --replSet mdbDefGuide --dbpath c:\data\rs3 --port 27019 --smallfiles --oplogSize 200
Initiate the Replica Set:
Connect to one instance (on MongoDB 6.0 and later the shell binary is mongosh rather than mongo):
mongo --port 27017
Create the configuration and initiate the replica set:
rsconf = {
  _id: "mdbDefGuide",
  members: [
    {_id: 0, host: "localhost:27017"},
    {_id: 1, host: "localhost:27018"},
    {_id: 2, host: "localhost:27019"}
  ]
}
rs.initiate(rsconf)
📊 Observing Replication
Check Replica Set Status:
- Use rs.status() to view the status of the replica set, including which members are primary and secondary.
Writing Data:
Connect to the primary and perform write operations to test replication:
use test
for (i = 0; i < 1000; i++) { db.coll.insertOne({count: i}) }
db.coll.countDocuments() // Should return 1000
📊 MongoDB: Aggregation Framework, Transactions, and Replication
🧩 Key Concepts
- Aggregation Framework:
  - Utilizes a pipeline approach for data analysis.
  - Common stages include:
    - $match: Filters documents based on criteria.
    - $group: Groups documents together.
    - $project: Reshapes documents by including/excluding fields.
    - $sort: Orders documents based on specified fields.
    - $limit: Restricts the number of documents passing through the pipeline.
    - $skip: Skips a specified number of documents.
- Transactions:
  - Ensure ACID compliance for operations across multiple documents and collections.
  - Maintain data integrity during multi-document operations.
- Replication:
  - Provides high availability and data redundancy.
  - A replica set consists of multiple servers maintaining identical data copies for failover support.
🔍 Important Commands and Usages
Check Primary Status:
Use db.isMaster() to determine which members of a replica set are primary and secondary. On MongoDB 5.0 and later, db.hello() is the preferred replacement.
Reading from Secondaries:
By default, clients cannot read from secondaries. To allow this on a connection to a secondary, use:
secondaryConn.setSlaveOk()
In current shells this method is deprecated; setting a read preference with secondaryConn.setReadPref("secondary") serves the same purpose.
Error Handling:
Attempting to read from a secondary without permission will return:
{
  "ok": 0,
  "errmsg": "not master and slaveOk=false",
  "code": 13435
}
Writing to Secondaries:
- Clients cannot perform write operations directly on secondaries. Secondaries accept writes only through replication.
Automatic Failover:
- If the primary goes down, one of the secondaries is automatically elected as primary.