📊 MongoDB Aggregation Framework

  • Purpose: Enables complex data analysis and transformation, comparable to SQL’s GROUP BY and JOIN (the $group and $lookup stages play those roles).
  • Pipeline Approach: Documents flow through a sequence of stages; each stage performs one operation and passes its output to the next.

Common Stages:

  • $match: Filters documents.
  • $group: Groups documents by a specific key.
  • $project: Reshapes documents.
  • $sort: Orders results.
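
As a rough illustration of how these four stages chain together, the sketch below builds a pipeline as a plain array. The companies collection and the founded_year and category_code fields come from the examples elsewhere in these notes; grouping by category and the output field names are assumptions made for illustration. In mongosh the array is passed to aggregate(); outside the shell the guarded call is skipped.

```javascript
// Each array element is one stage; documents flow from top to bottom.
const pipeline = [
  { $match: { founded_year: { $gte: 2000 } } },                  // keep companies founded in 2000 or later
  { $group: { _id: "$category_code", companies: { $sum: 1 } } }, // count companies per category
  { $project: { _id: 0, category: "$_id", companies: 1 } },      // expose the group key as "category"
  { $sort: { companies: -1 } }                                   // largest categories first
];

// Runs only inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.companies.aggregate(pipeline).forEach((doc) => printjson(doc));
}
```

Because a pipeline is ordinary data, it can be assembled, inspected, and reused before it is ever sent to the server.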

🔄 Transactions in MongoDB

Transaction: A logical group of operations that either all take effect or none do, which preserves data integrity.

  • ACID Compliance: Ensures Atomicity, Consistency, Isolation, and Durability.

  • Use Case: In e-commerce, when a customer places an order:

    • Deduct inventory.
    • Record order details.
    • Update order history.
  • Rollback Mechanism: If any operation fails, the transaction is aborted and none of its changes are applied.


⚙️ Replication

Replication: Keeping identical copies of data on multiple servers for high availability and data safety.

  • Recommended: For all production deployments.
  • Replica Set: A group of servers (e.g., a set named ecommerceReplicaSet) whose members hold the same data, providing redundancy and failover support.

📈 Aggregation Pipeline Details

  • Concept: A sequence of processing stages; each stage takes a stream of documents as input and emits a stream of documents as output.
  • Tunable Stages: Each stage is parameterized with operators and expressions, for example to modify fields or perform arithmetic.

Example of Aggregation Pipeline:

  1. Initial Filter: Use $match to filter documents.
  2. Further Processing: Apply additional filters or transformations later in the pipeline.

Sample Query:

  • To find companies founded in 2004:

    db.companies.aggregate([
      { $match: { founded_year: 2004 } }
    ])

📋 Example Output Transformation:

  • Adding a $project stage to limit output fields:

    db.companies.aggregate([
      { $match: { founded_year: 2004 } },
      { $project: { _id: 0, name: 1, founded_year: 1 } }
    ])

📚 Key Takeaways:

  • Aggregation Framework: Essential for complex data processing and analytics.
  • Transactions: Crucial for maintaining data integrity in multi-document operations.
  • Replication: Vital for ensuring data availability and fault tolerance in production environments.

🛠️ Aggregation Framework Overview

  • Aggregation: Process of transforming data into a summary format
  • Pipeline: A sequence of data processing stages

📋 Aggregation Pipeline Stages

  1. Match Stage
    • Filters documents based on criteria.
    • Example: {$match: {founded_year: 2004}}
  2. Project Stage
    • Reshapes documents and selects fields.
    • Example: {$project: {_id: 0, name: 1}}
  3. Limit Stage
    • Restricts the number of documents passed to the next stage.
    • Example: {$limit: 5}
  4. Sort Stage
    • Orders documents based on specified fields.
    • Example: {$sort: {name: 1}} (1 = ascending, -1 = descending)
  5. Skip Stage
    • Skips a specified number of documents.
    • Example: {$skip: 10}
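
One common way to combine the sort, skip, and limit stages is pagination. The sketch below is only an illustration: pageStages is a hypothetical helper (not a MongoDB function), and the page size and sort field are arbitrary choices.

```javascript
// Hypothetical helper: returns the stages that select one page of results.
function pageStages(pageNumber, pageSize, sortSpec) {
  if (pageNumber < 1 || pageSize < 1) {
    throw new RangeError("pageNumber and pageSize must be at least 1");
  }
  return [
    { $sort: sortSpec },                    // fix the order first
    { $skip: (pageNumber - 1) * pageSize }, // jump over the earlier pages
    { $limit: pageSize }                    // keep exactly one page
  ];
}

// Page 3 of companies founded in 2004, 10 per page, ordered by name.
const page3 = [
  { $match: { founded_year: 2004 } },
  ...pageStages(3, 10, { name: 1 }),
  { $project: { _id: 0, name: 1 } }
];
```

In mongosh, page3 could be passed to db.companies.aggregate(). Sorting comes before skipping so that every page is cut from the same ordering.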

🔍 Aggregating Data Effectively

  • Order of Stages Matters:

    • Place the limit stage before the project stage so that only the limited documents are reshaped.
    • Sort before limiting when you want the first N documents in sorted order; limiting first sorts only whichever N documents happened to arrive first.

Example Pipeline to Retrieve Company Names:

db.companies.aggregate([
  {$match: {founded_year: 2004}},
  {$limit: 5},
  {$project: {_id: 0, name: 1}}
])
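
To see why the position of the sort stage changes the result, the following sketch mimics the two orderings on a small in-memory array. It uses plain JavaScript array methods as stand-ins for the stages; the company names are made up.

```javascript
const companies = [
  { name: "Delta" }, { name: "Alpha" }, { name: "Charlie" }, { name: "Bravo" }
];
const byName = (a, b) => a.name.localeCompare(b.name);

// Like [{$sort: {name: 1}}, {$limit: 2}]: the two alphabetically first names.
const sortThenLimit = [...companies].sort(byName).slice(0, 2).map((c) => c.name);

// Like [{$limit: 2}, {$sort: {name: 1}}]: the first two documents seen, then sorted.
const limitThenSort = companies.slice(0, 2).sort(byName).map((c) => c.name);

console.log(sortThenLimit); // [ 'Alpha', 'Bravo' ]
console.log(limitThenSort); // [ 'Alpha', 'Delta' ]
```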

📊 Types of Expressions in Aggregation

  • Boolean Expressions: Combine conditions with $and, $or, and $not.
  • Set Expressions: Treat arrays as sets (for example, intersection and union).
  • Comparison Expressions: Compare values, for example to build range filters.
  • Arithmetic Expressions: Basic math operations.
  • String Expressions: Text manipulation.
  • Array Expressions: Handle and manipulate array data.
  • Variable Expressions: Work with variables, literals, and conditionals.
  • Accumulators: Calculate sums, averages, and other statistics.

🔧 Deep Dive: Project Stage Operations

  • Can promote nested fields to the top level using dot notation:

db.companies.aggregate([
  {$match: {"funding_rounds.investments.financial_org.permalink": "greylock"}},
  {$project: {
    _id: 0,
    name: 1,
    ipo: "$ipo.pub_year",
    valuation: "$ipo.valuation_amount",
    funders: "$funding_rounds.investments.financial_org.permalink"
  }}
])
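
A rough mental model of how a dotted field path is resolved, written as plain JavaScript. This is a simplified sketch, not MongoDB's implementation, and edge cases may differ; resolvePath and the sample document are made up for illustration. The point it shows is that a path running through arrays yields arrays (here, an array of arrays for funders).

```javascript
// Simplified model of resolving a "$a.b.c" field path (illustration only).
function resolvePath(value, parts) {
  if (parts.length === 0) return value;
  if (Array.isArray(value)) {
    // Arrays are traversed element by element; elements with no value are dropped.
    return value
      .map((item) => resolvePath(item, parts))
      .filter((item) => item !== undefined);
  }
  if (value !== null && typeof value === "object") {
    return resolvePath(value[parts[0]], parts.slice(1));
  }
  return undefined; // scalar or missing value: the path cannot continue
}

// Made-up company document with the nesting used in the example above.
const company = {
  name: "ExampleCo",
  ipo: { pub_year: 2012, valuation_amount: 1000000 },
  funding_rounds: [
    { investments: [{ financial_org: { permalink: "greylock" } },
                    { financial_org: { permalink: "accel" } }] },
    { investments: [{ financial_org: { permalink: "sequoia" } }] }
  ]
};

const projected = {
  name: company.name,
  ipo: resolvePath(company, "ipo.pub_year".split(".")),
  funders: resolvePath(
    company, "funding_rounds.investments.financial_org.permalink".split("."))
};
console.log(JSON.stringify(projected));
// {"name":"ExampleCo","ipo":2012,"funders":[["greylock","accel"],["sequoia"]]}
```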

Example Document Structure:

  • Company Document:

    • Fields: _id, name, category_code, founded_year, ipo, funding_rounds


📈 Key Takeaways

  • Optimize Aggregation Pipelines: Place limiting stages strategically to reduce processing load.
  • Understand Each Stage: Know the function of match, project, limit, sort, and skip for effective data query construction.
  • Use Expressions Wisely: Leverage various expressions to enhance querying capabilities and data manipulation.

📊 Aggregation Framework

Introduction to Aggregation

  • Aggregation is a framework used to process data and return computed results.
  • It allows operations such as filtering, transforming, and combining data.

Key Components of Aggregation

  1. Match Stage: Filters documents based on specified criteria.
    • Example: {$match: {"funding_rounds.investments.financial_org.permalink": "greylock"}}
  2. Project Stage: Reshapes documents to include only specified fields.
    • Example: {$project: {name: 1, amount: "$funding_rounds.raised_amount"}}
  3. Unwind Stage: Deconstructs an array field, emitting one output document per array element so that each element can be processed individually.
    • Example: {$unwind: "$funding_rounds"}

Using the Unwind Stage

The Unwind Stage outputs one document per element of the specified array field; each output document is a copy of the input document in which the array is replaced by that single element.

  • Example Aggregation Pipeline:

    db.companies.aggregate([
      {$match: {"funding_rounds.investments.financial_org.permalink": "greylock"}},
      {$unwind: "$funding_rounds"},
      {$project: {name: 1, amount: "$funding_rounds.raised_amount", year: "$funding_rounds.funded_year"}}
    ])
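
The effect of the unwind stage can be approximated in plain JavaScript. The sketch below is a simplified model for a top-level array field, not the server's implementation; the unwind function and the sample companies are made up for illustration.

```javascript
// Simplified model of {$unwind: "$<field>"} for a top-level array field.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Like the default $unwind, a missing, null, or empty-array field
    // produces no output documents.
    if (value === undefined || value === null) return [];
    const elements = Array.isArray(value) ? value : [value];
    return elements.map((element) => ({ ...doc, [field]: element }));
  });
}

const companies = [
  { name: "ExampleCo", funding_rounds: [
      { raised_amount: 5000000, funded_year: 2005 },
      { raised_amount: 25000000, funded_year: 2007 } ] },
  { name: "NoRounds Inc", funding_rounds: [] }
];

// Similar in spirit to the $unwind and $project stages above.
const rows = unwind(companies, "funding_rounds").map((doc) => ({
  name: doc.name,
  amount: doc.funding_rounds.raised_amount,
  year: doc.funding_rounds.funded_year
}));
console.log(rows); // two rows, both for ExampleCo
```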

Array Expressions

  • Filter Expression: Selects the subset of an array’s elements that satisfy a specified condition.

    • Example of usage ("as" names a variable for the current element, which "cond" references with the $$ prefix):
    { $filter: {
      input: "$funding_rounds",
      as: "round",
      cond: { $gte: ["$$round.raised_amount", 100000000] }
    }}
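
For intuition, the filter expression behaves much like Array.prototype.filter. The sketch below is a plain-JavaScript analogue under that assumption; largeRounds and the sample data are hypothetical, and unlike $filter it returns an empty array when the field is missing.

```javascript
// Plain-JavaScript analogue of the $filter expression above.
function largeRounds(company, threshold) {
  return (company.funding_rounds || []).filter(
    (round) => round.raised_amount >= threshold // plays the role of `cond`
  );
}

const company = {
  name: "ExampleCo",
  funding_rounds: [
    { round_code: "a", raised_amount: 5000000 },
    { round_code: "b", raised_amount: 150000000 }
  ]
};

const big = largeRounds(company, 100000000);
console.log(big.map((round) => round.round_code)); // [ 'b' ]
```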

Understanding the Output

  • Output documents can have fields like name, amount, and year.
  • Each funding round becomes its own output document, so a company with several rounds appears several times.

🚀 Key Terms

  • Aggregation: Process of computing results from data.
  • Match: Filters documents.
  • Project: Reshapes output documents.
  • Unwind: Breaks down arrays into individual documents.
  • Filter: Selects specific elements from an array.

📊 Aggregation Framework

Overview of Aggregation

  • Aggregation is a way to process data and return computed results.
  • It is similar to SQL’s GROUP BY clause: multiple documents are combined so that aggregate operations can be computed over them.

Key Operators

$match

  • Filters documents based on specified criteria.
  • Example: { $match: { "founded_year": 2010 } } selects companies founded in 2010.

$group

  • Groups documents by the specified field(s) and computes an aggregate value for each group.

  • Example:

    {
      $group: {
        _id: { founded_year: "$founded_year" },
        average_number_of_employees: { $avg: "$number_of_employees" }
      }
    }
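
What the group stage above computes can be sketched in plain JavaScript. This is an in-memory approximation for illustration only; averageEmployeesByYear and the sample figures are made up, and the only $avg behavior modeled is that non-numeric values are ignored.

```javascript
// In-memory approximation of the $group stage with an $avg accumulator.
function averageEmployeesByYear(companies) {
  const groups = new Map(); // founded_year -> { sum, count }
  for (const company of companies) {
    const key = company.founded_year;
    if (!groups.has(key)) groups.set(key, { sum: 0, count: 0 });
    // Like $avg, skip values that are not numbers.
    if (typeof company.number_of_employees === "number") {
      const group = groups.get(key);
      group.sum += company.number_of_employees;
      group.count += 1;
    }
  }
  return [...groups].map(([year, { sum, count }]) => ({
    _id: { founded_year: year },
    average_number_of_employees: count > 0 ? sum / count : null
  }));
}

const averages = averageEmployeesByYear([
  { name: "A", founded_year: 2004, number_of_employees: 10 },
  { name: "B", founded_year: 2004, number_of_employees: 30 },
  { name: "C", founded_year: 2010, number_of_employees: 7 },
  { name: "D", founded_year: 2010 } // no employee count recorded
]);
console.log(JSON.stringify(averages));
```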

Using Array Operators

$arrayElemAt

  • Selects the element at a specified index of an array; index 0 is the first element and -1 is the last.

  • Example:

    { $project: {
        _id: 0,
        name: 1,
        founded_year: 1,
        first_round: { $arrayElemAt: ["$funding_rounds", 0] },
        last_round: { $arrayElemAt: ["$funding_rounds", -1] }
      }
    }

Example Output

  • Output from an aggregation might resemble:

    {
      "name": "vufind",
      "founded_year": 2010,
      "first_round": { ... },
      "last_round": { ... }
    }
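
The index rules of $arrayElemAt can be mirrored in a few lines of JavaScript. The helper below is a hypothetical analogue for illustration, with made-up sample data; it returns undefined for an out-of-range index, which corresponds to the operator producing no value.

```javascript
// Plain-JavaScript analogue of {$arrayElemAt: [array, index]}.
function arrayElemAt(array, index) {
  // Negative indexes count back from the end of the array.
  const position = index < 0 ? array.length + index : index;
  return array[position]; // undefined when the index is out of range
}

const rounds = [
  { round_code: "a" }, { round_code: "b" }, { round_code: "c" }
];
const firstRound = arrayElemAt(rounds, 0);
const lastRound = arrayElemAt(rounds, -1);
console.log(firstRound.round_code, lastRound.round_code); // a c
```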

🔄 Relationships and Aggregation

Relationship Field

  • Contains data about individuals associated with companies.

  • Structure:

    "relationships": [
      {
        "is_past": false,
        "title": "Founder and CEO",
        "person": { "first_name": "Mark", "last_name": "Zuckerberg" }
      },
      ...
    ]

Counting Relationships

  • Example aggregation to count relationships:

    db.companies.aggregate([
      { $match: { "relationships.person": { $ne: null } } },
      { $unwind: "$relationships" },
      { $group: {
          _id: "$relationships.person",
          count: { $sum: 1 }
        }
      },
      { $sort: { count: -1 } }
    ])

Sample Output

  • Each output document pairs a person (the _id) with the number of relationship entries in which that person appears:

    {
      "_id": { "first_name": "Tim", "last_name": "H" },
      "count": 5
    }

🗂️ Practical Applications

  • Aggregation allows for valuable insights such as:
    • Average metrics by group (e.g., average number of employees by founding year).
    • Relationship dynamics (who is connected to many companies).

🗂️ MongoDB Aggregation Framework

🧩 Aggregation Basics

  • Aggregation: Process of transforming data into a summary form.
  • Purpose: Analyze and report on data, such as sales and customer behavior.

🔍 Key Aggregation Stages

  • $match: Filters documents based on specified criteria.
  • $group: Groups documents by specified fields, allowing for calculations.
  • $sort: Orders documents based on specified fields.
  • $project: Reshapes documents by including or excluding fields.

⚙️ Transactions in MongoDB

📜 Definition of a Transaction

A transaction is a logical unit of processing that includes one or more database operations, which either all succeed or all fail.

🔑 ACID Properties

  • Atomicity: Either all operations in a transaction are applied or none are.
  • Consistency: The database moves from one valid state to another.
  • Isolation: Concurrent transactions do not interfere with one another.
  • Durability: Once committed, changes persist despite failures.

ACID Compliance:

A database is ACID-compliant when it adheres to these properties, ensuring data integrity.



🛠️ Using Transactions in MongoDB

Transaction APIs:

  API                Core API                             Callback API
  Transaction Start  Requires explicit start call         Automatically starts with a callback function
  Error Handling     Requires manual error handling       Automatically includes error-handling logic
  Session Handling   Requires explicit session parameter  Requires explicit session parameter

🛒 Example Usage:

  1. Core API Example:

    • Define the operations for placing an order and updating inventory, and start and commit the transaction explicitly.
  2. Callback API Example:

    • Pass a function that contains the transaction operations.

🔄 Retry Logic in Transactions

  • Implement retry logic to handle transient errors during transactions; this is needed with the Core API, which does not retry on its own.

  • Key Functions:

    • commit_with_retry(session): Attempts the commit and retries it while the commit result is unknown (error label UnknownTransactionCommitResult).
    • run_transaction_with_retry(txn_func, session): Runs the transaction function and re-runs it from the start after a transient error (error label TransientTransactionError).
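
A minimal sketch of the two retry helpers, written in JavaScript with camelCase names. It assumes a mongosh-style session whose commitTransaction() is synchronous and errors that expose an errorLabels array; with the Node.js driver the same calls return promises and would need await. The MAX_ATTEMPTS cap and the hasLabel helper are additions made for this illustration.

```javascript
const MAX_ATTEMPTS = 5; // assumed cap; the notes do not specify one

// True when an error carries the given MongoDB error label.
function hasLabel(error, label) {
  return Array.isArray(error.errorLabels) && error.errorLabels.includes(label);
}

// Retry only the commit while its outcome is unknown.
function commitWithRetry(session) {
  for (let attempt = 1; ; attempt++) {
    try {
      session.commitTransaction();
      return attempt; // number of commit attempts used
    } catch (error) {
      const retryable = hasLabel(error, "UnknownTransactionCommitResult");
      if (retryable && attempt < MAX_ATTEMPTS) continue;
      throw error;
    }
  }
}

// Re-run the whole transaction function after a transient error.
function runTransactionWithRetry(txnFunc, session) {
  for (let attempt = 1; ; attempt++) {
    try {
      txnFunc(session); // expected to start, perform, and commit the transaction
      return attempt;   // number of runs used
    } catch (error) {
      const retryable = hasLabel(error, "TransientTransactionError");
      if (retryable && attempt < MAX_ATTEMPTS) continue;
      throw error;
    }
  }
}
```

A transaction function passed to runTransactionWithRetry would typically call session.startTransaction(), perform its writes, abort and rethrow on failure, and finish with commitWithRetry(session). Errors without a retryable label are rethrown immediately so that real failures are not hidden.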

🛠️ Transactions in MongoDB

  • Purpose of Transactions:

    Transactions ensure data integrity and atomicity for multiple operations.

  • Key Features:

    • Provide consistency across multiple operations.
    • Should be used sparingly: MongoDB’s flexible document model often lets related data live in a single document, and a write to a single document is already atomic.

🔄 Replication in MongoDB

  • Definition:

    Replication is the process of keeping identical copies of data across multiple servers.

  • Benefits:

    • Enhances data availability and safety.
    • Allows continued access to data even if one or more servers fail.
  • Replica Set:

    • A configuration of multiple MongoDB servers, including one primary and several secondaries.
    • The primary handles write operations, while secondaries maintain copies of the primary’s data.

Setting Up a Replica Set:

  1. Create Data Directories:

    • Linux/Mac: mkdir -p ~/data/rs{1,2,3}
    • Windows: md c:\data\rs1 c:\data\rs2 c:\data\rs3
  2. Start MongoDB Instances:

    • Run each of the following commands in its own terminal. The --smallfiles option that older guides add to these commands applied only to the MMAPv1 storage engine and was removed in MongoDB 4.2, so it is omitted here.

      • Linux/Mac:

        mongod --replSet mdbDefGuide --dbpath ~/data/rs1 --port 27017 --oplogSize 200
        mongod --replSet mdbDefGuide --dbpath ~/data/rs2 --port 27018 --oplogSize 200
        mongod --replSet mdbDefGuide --dbpath ~/data/rs3 --port 27019 --oplogSize 200
      • Windows:

        mongod --replSet mdbDefGuide --dbpath c:\data\rs1 --port 27017 --oplogSize 200
        mongod --replSet mdbDefGuide --dbpath c:\data\rs2 --port 27018 --oplogSize 200
        mongod --replSet mdbDefGuide --dbpath c:\data\rs3 --port 27019 --oplogSize 200
  3. Initiate the Replica Set:

    • Connect to one instance (on MongoDB 6.0 and later the legacy mongo shell is no longer shipped; use mongosh --port 27017 instead):

      mongo --port 27017
    • Create the configuration document, whose _id must match the --replSet name, and initiate the set:

      rsconf = {
        _id: "mdbDefGuide",
        members: [
          {_id: 0, host: "localhost:27017"},
          {_id: 1, host: "localhost:27018"},
          {_id: 2, host: "localhost:27019"}
        ]
      }
      rs.initiate(rsconf)

📊 Observing Replication

  • Check Replica Set Status:

    • Use rs.status() to view the status of the replica set, including which member is the primary and which are secondaries.
  • Writing Data:

    • Connect to the primary and perform write operations to test replication:

      use test
      for (let i = 0; i < 1000; i++) { db.coll.insertOne({count: i}) }
      db.coll.countDocuments({}) // Should return 1000

📊 MongoDB: Aggregation Framework, Transactions, and Replication

🧩 Key Concepts

  • Aggregation Framework:
    • Utilizes a pipeline approach for data analysis.
    • Common stages include:
      • $match: Filters documents based on criteria.
      • $group: Groups documents together.
      • $project: Reshapes documents by including/excluding fields.
      • $sort: Orders documents based on specified fields.
      • $limit: Restricts the number of documents passing through the pipeline.
      • $skip: Skips a specified number of documents.
  • Transactions:
    • Ensure ACID compliance for operations across multiple documents and collections.
    • Maintain data integrity during multi-document operations.
  • Replication:
    • Provides high availability and data redundancy.
    • A replica set consists of multiple servers maintaining identical data copies for failover support.

🔍 Important Commands and Usages

  • Check Primary Status:

    Use db.isMaster() to see whether the member you are connected to is the primary and which hosts belong to the replica set. Newer releases provide db.hello() as the replacement for the deprecated db.isMaster().

  • Reading from Secondaries:

    • By default, a connection to a secondary rejects reads. To allow them on that connection, use:

      secondaryConn.setSlaveOk()

    • setSlaveOk() belongs to the legacy mongo shell; in mongosh the equivalent is to set a read preference, for example secondaryConn.setReadPref("secondaryPreferred").
  • Error Handling:

    • Attempting to read from a secondary without enabling secondary reads returns:

      {
        "ok": 0,
        "errmsg": "not master and slaveOk=false",
        "code": 13435
      }
  • Writing to Secondaries:

    • Clients cannot write directly to secondaries; a secondary applies only the writes it replicates from the primary.
  • Automatic Failover:

    • If the primary goes down, one of the secondaries is automatically elected as the new primary.
    • 如果主节点宕机,其中一个从节点将自动被选为主节点。