Hipo Blog

MongoDB Schema Design

4/3/2021, 8:50:18 PM

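MongoDB does not require you to design table structures up front the way a relational database does, but you still have to model the relationships between business entities. Because the schema is so flexible, One-to-N relationships in particular deserve extra care.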

One-to-N Basics#

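In MongoDB, a One-to-N relationship can be modeled simply by embedding an array of sub-documents in the parent document, but that does not mean you should always do so.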

In MongoDB you have one more question to consider: what is the cardinality of the relationship?

You need to distinguish more finely between the following cases:

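Cardinality	Implementation	Pros and cons
One-to-Few	`embedding`	Pro: no separate query is needed to fetch the sub-document details
One-to-Few	`embedding`	Con: the embedded details cannot be accessed as stand-alone entities
One-to-Many	`referencing`	Pro 1: both sides of the relationship are independent documents, so each is easy to search and update on its own
One-to-Many	`referencing`	Pro 2: the same approach extends to `N-to-N` without needing a `join table`
One-to-Many	`referencing`	Con: a second query is needed to fetch the details of the "many" side (this can be mitigated by `denormalizing`)
One-to-Squillions	`parent-referencing`	For a very large N, such as log-style workloads, where an array holding nothing but ObjectIDs could still easily exceed MongoDB's 16 MB document size limit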
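To make the 16 MB concern in the One-to-Squillions row concrete, here is a rough back-of-the-envelope sketch in plain JavaScript. It assumes the standard BSON layout (an array is stored as a document keyed by the decimal indexes "0", "1", ..., and an ObjectID takes 12 bytes) and ignores every other field in the parent document, so treat the result as an upper bound rather than a precise figure. The function names are made up for this illustration.

```javascript
// Rough BSON size of an array that contains nothing but n ObjectIDs.
// Each element costs: 1 byte (type tag) + the index written as a decimal
// string + 1 byte (NUL terminator of the key) + 12 bytes (the ObjectID).
const MAX_BSON_BYTES = 16 * 1024 * 1024; // 16 MB document size limit

function objectIdArrayBytes(n) {
  let bytes = 4 + 1; // int32 length prefix + trailing NUL of the array itself
  for (let i = 0; i < n; i++) {
    bytes += 1 + String(i).length + 1 + 12;
  }
  return bytes;
}

// Largest n for which the array alone still fits under the limit.
function maxObjectIdsPerDocument() {
  let bytes = 4 + 1;
  let n = 0;
  for (;;) {
    const next = bytes + 1 + String(n).length + 1 + 12;
    if (next > MAX_BSON_BYTES) return n;
    bytes = next;
    n += 1;
  }
}

console.log("bytes for 1000 references:", objectIdArrayBytes(1000));
console.log("references that fit in one document:", maxObjectIdsPerDocument());
```

A log-style collection can pass that count quickly, which is why the reference is kept on the child side instead.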

Advanced One-to-N#

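With the finer distinctions above you can already design a reasonable One-to-N schema. Depending on the specific business case, though, a few further techniques help optimize the model.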

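Technique	Implementation	Notes
`Two-Way referencing`	1. The One side holds an array of references to the N side. 2. Each N document redundantly stores a reference back to the One	Pro: easy to get from a single N document to its One
`Two-Way referencing`	1. The One side holds an array of references to the N side. 2. Each N document redundantly stores a reference back to the One	Con: both references must be updated together, so the two-way reference cannot be maintained with a single `atomic update`
`denormalizing`	`Many -> One`: the array on the One side stores not only the ObjectID but also redundant copies of other fields from the N documents	Suitable when: 1. the redundant fields are read often 2. the redundant fields are rarely updated
`denormalizing`	`Many -> One`: the array on the One side stores not only the ObjectID but also redundant copies of other fields from the N documents	During an update there is a `sub-second` window in which the denormalized field does not yet hold the latest value
`denormalizing`	`One -> Many`: the N documents redundantly store fields from the One	Same reasoning; what matters is the `read-to-write ratio`
`denormalizing`	`One -> Squillions`, option 1: copy information from the One side into the Squillions documents	--
`denormalizing`	`One -> Squillions`, option 2: the One side redundantly keeps a subset of the Squillions	For example, keep the latest 1000 N entries in the One document, using MongoDB's `$each` / `$slice` modifiers to keep the list sorted and retain only the last 1000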

Design Guidelines#

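Prefer embedding unless there is a compelling reason not to.
Needing to access an object on its own is a compelling reason not to embed it.
Arrays should not grow without bound. If the N side has more than a few hundred documents, do not embed them; if it has more than a few thousand, do not use an array of ObjectID references either. High-cardinality arrays are a compelling reason not to embed.
Do not be afraid of application-level joins: with correct indexing and a projection specifier, they are barely more expensive than server-side joins in a relational database.
Consider the read-to-write ratio carefully when denormalizing. Redundantly storing some fields only pays off when they are read often and updated rarely.
Ultimately, how you model your data depends entirely on your application's data access patterns. Structure your data to match the way you query and update it.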
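The first three guidelines can be read as a small decision procedure. The sketch below is one possible encoding in plain JavaScript. The function name and the two cut-off values (200 and 2000) are placeholders chosen for illustration; the guidelines only say "a few hundred" and "a few thousand", so tune them against your own documents and access patterns.

```javascript
// Hypothetical thresholds, not values prescribed by MongoDB.
const EMBED_LIMIT = 200;            // "more than a few hundred": stop embedding
const REFERENCE_ARRAY_LIMIT = 2000; // "more than a few thousand": stop using ObjectID arrays

// Suggest a One-to-N modeling strategy from the expected cardinality and
// from whether the N-side objects must be accessible on their own.
function chooseOneToNStrategy({ expectedN, needsStandaloneAccess }) {
  if (expectedN > REFERENCE_ARRAY_LIMIT) {
    return "parent-referencing"; // One-to-Squillions: child points at parent
  }
  if (needsStandaloneAccess || expectedN > EMBED_LIMIT) {
    return "referencing";        // One-to-Many: parent holds an array of ObjectIDs
  }
  return "embedding";            // One-to-Few: sub-documents inside the parent
}

console.log(chooseOneToNStrategy({ expectedN: 3, needsStandaloneAccess: false }));
console.log(chooseOneToNStrategy({ expectedN: 3, needsStandaloneAccess: true }));
console.log(chooseOneToNStrategy({ expectedN: 1000000, needsStandaloneAccess: false }));
```

The remaining guidelines (read-to-write ratio, access patterns) are judgment calls that a function like this cannot capture.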

Code Examples#

One-to-Few#

> db.person.findOne()
{
  name: 'Kate Monster',
  ssn: '123-456-7890',
  addresses : [
     { street: '123 Sesame St', city: 'Anytown', cc: 'USA' },
     { street: '123 Avenue Q', city: 'New York', cc: 'USA' }
  ]
}

One-to-Many#

> db.parts.findOne()
{
    _id : ObjectID('AAAA'),
    partno : '123-aff-456',
    name : '#4 grommet',
    qty: 94,
    cost: 0.94,
    price: 3.99
}

> db.products.findOne()
{
    name : 'left-handed smoke shifter',
    manufacturer : 'Acme Corp',
    catalog_number: 1234,
    parts : [     // array of references to Part documents
        ObjectID('AAAA'),    // reference to the #4 grommet above
        ObjectID('F17C'),    // reference to a different Part
        ObjectID('D2AA'),
        // etc
    ]
}

Use an application-level join to fetch the parts of a given product.

  // Fetch the Product document identified by this catalog number
> product = db.products.findOne({catalog_number: 1234});
  // Fetch all the Parts that are linked to this Product
> product_parts = db.parts.find({_id: { $in : product.parts } } ).toArray();
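One detail of this kind of join is worth sketching: a `$in` query does not promise to return documents in the order of the reference array. If the order of `product.parts` matters to the application, the fetched documents can be re-ordered on the client. The helper below is a minimal sketch in plain JavaScript; `orderByReferences` is a made-up name, and the ids are plain strings standing in for ObjectIDs.

```javascript
// Re-order fetched documents so they follow the order of the reference array.
// Ids are compared by their string form, so this also works for id objects
// that have a meaningful toString().
function orderByReferences(referenceIds, fetchedDocs) {
  const byId = new Map(fetchedDocs.map((doc) => [String(doc._id), doc]));
  return referenceIds
    .map((id) => byId.get(String(id)))
    .filter((doc) => doc !== undefined); // drop references with no matching document
}

// Stand-ins for product.parts and for the result of the $in query.
const partRefs = ["AAAA", "F17C", "D2AA"];
const fetched = [
  { _id: "D2AA", name: "power switch" },
  { _id: "AAAA", name: "#4 grommet" },
];

console.log(orderByReferences(partRefs, fetched).map((p) => p.name));
```

The dangling reference `F17C` is skipped here; an application might prefer to log or repair it instead.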

One-to-Squillions#

> db.hosts.findOne()
{
    _id : ObjectID('AAAB'),
    name : 'goofy.example.com',
    ipaddr : '127.66.66.66'
}

> db.logmsg.findOne()
{
    time : ISODate("2014-03-28T09:42:41.382Z"),
    message : 'cpu is on fire!',
    host: ObjectID('AAAB')       // Reference to the Host document
}

Use a slightly different application-level join to fetch the most recent 5000 log messages for a given host.

  // find the parent ‘host’ document
> host = db.hosts.findOne({ipaddr : '127.66.66.66'});  // assumes unique index
   // find the most recent 5000 log message documents linked to that host
> last_5k_msg = db.logmsg.find({host: host._id}).sort({time : -1}).limit(5000).toArray()

Two-Way Referencing#

> db.person.findOne()
{
    _id: ObjectID("AAF1"),
    name: "Kate Monster",
    tasks: [     // array of references to Task documents
        ObjectID("ADF9"), 
        ObjectID("AE02"),
        ObjectID("AE73") 
        // etc
    ]
}

> db.tasks.findOne()
{
    _id: ObjectID("ADF9"), 
    description: "Write lesson plan",
    due_date:  ISODate("2014-04-01"),
    owner: ObjectID("AAF1")     // Reference to Person document
}
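The cost of two-way referencing shows up when a task changes owner. The sketch below models the two collections as in-memory `Map`s in plain JavaScript (string ids stand in for ObjectIDs, and `reassignTask` is a hypothetical helper) to make the number of separate writes visible. Against a real database each numbered step would be an update to a different document.

```javascript
// Move a task to a new owner while keeping both directions of the
// reference consistent. Three documents are touched.
function reassignTask(people, tasks, taskId, newOwnerId) {
  const task = tasks.get(taskId);
  const oldOwner = people.get(task.owner);
  const newOwner = people.get(newOwnerId);

  // Write 1: remove the reference from the old owner's array.
  oldOwner.tasks = oldOwner.tasks.filter((id) => id !== taskId);
  // Write 2: add the reference to the new owner's array (no duplicates).
  if (!newOwner.tasks.includes(taskId)) newOwner.tasks.push(taskId);
  // Write 3: point the task back at its new owner.
  task.owner = newOwnerId;
}

const people = new Map([
  ["AAF1", { _id: "AAF1", name: "Kate Monster", tasks: ["ADF9", "AE02"] }],
  ["AAF2", { _id: "AAF2", name: "Another Person", tasks: [] }],
]);
const tasks = new Map([
  ["ADF9", { _id: "ADF9", description: "Write lesson plan", owner: "AAF1" }],
]);

reassignTask(people, tasks, "ADF9", "AAF2");
console.log(people.get("AAF1").tasks, people.get("AAF2").tasks, tasks.get("ADF9").owner);
```

If the process stops between the writes, the two sides disagree, which is the price of being able to navigate the relationship from either end.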

Denormalizing Many - One#

> db.products.findOne()
{
    name : 'left-handed smoke shifter',
    manufacturer : 'Acme Corp',
    catalog_number: 1234,
    parts : [
        { id : ObjectID('AAAA'), name : '#4 grommet' },         // Part name is denormalized
        { id: ObjectID('F17C'), name : 'fan blade assembly' },
        { id: ObjectID('D2AA'), name : 'power switch' },
        // etc
    ]
}

// Fetch the product document
> product = db.products.findOne({catalog_number: 1234});  
  // Create an array of ObjectID()s containing *just* the part numbers
> part_ids = product.parts.map( function(doc) { return doc.id } );
  // Fetch all the Parts that are linked to this Product
> product_parts = db.parts.find({_id: { $in : part_ids } } ).toArray() ;
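The write-side cost of this layout appears when a part is renamed: every product that carries a copy of the name has to be touched as well. The in-memory sketch below (plain JavaScript, string ids in place of ObjectIDs, `renamePart` is a hypothetical helper) counts how many denormalized copies one rename has to fix.

```javascript
// Rename a part and propagate the new name to every denormalized copy.
// Returns the number of copies that had to be rewritten.
function renamePart(parts, products, partId, newName) {
  parts.get(partId).name = newName; // the Part document is the source of truth

  let copiesUpdated = 0;
  for (const product of products) {
    for (const ref of product.parts) {
      if (ref.id === partId && ref.name !== newName) {
        ref.name = newName; // until this runs, the product shows the old name
        copiesUpdated += 1;
      }
    }
  }
  return copiesUpdated;
}

const parts = new Map([["AAAA", { _id: "AAAA", name: "#4 grommet" }]]);
const products = [
  { name: "left-handed smoke shifter", parts: [{ id: "AAAA", name: "#4 grommet" }] },
  { name: "right-handed smoke shifter", parts: [{ id: "AAAA", name: "#4 grommet" }] },
  { name: "unrelated product", parts: [{ id: "F17C", name: "fan blade assembly" }] },
];

console.log(renamePart(parts, products, "AAAA", "#4 grommet (steel)"));
```

One read-side shortcut is paid for with one extra write per copy, so the trade only makes sense when names are read far more often than they change.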

Denormalizing One - Many#

> db.parts.findOne()
{
    _id : ObjectID('AAAA'),
    partno : '123-aff-456',
    name : '#4 grommet',
    product_name : 'left-handed smoke shifter',   // Denormalized from the ‘Product’ document
    product_catalog_number: 1234,                     // Ditto
    qty: 94,
    cost: 0.94,
    price: 3.99
}

Denormalizing One-To-Squillions#

Denormalizing the One side into the Squillions#

> db.logmsg.findOne()
{
    time : ISODate("2014-03-28T09:42:41.382Z"),
    message : 'cpu is on fire!',
    ipaddr : '127.66.66.66',
    host: ObjectID('AAAB')
}

> last_5k_msg = db.logmsg.find({ipaddr : '127.66.66.66'}).sort({time : -1}).limit(5000).toArray()

You can even denormalize all of the One-side information into the squillions:

> db.logmsg.findOne()
{
    time : ISODate("2014-03-28T09:42:41.382Z"),
    message : 'cpu is on fire!',
    ipaddr : '127.66.66.66',
    hostname : 'goofy.example.com',
}

Denormalizing the Squillions into the One side#

 //  Get log message from monitoring system
logmsg = get_log_msg();
log_message_here = logmsg.msg;
log_ip = logmsg.ipaddr;
  // Get current timestamp
now = new Date()
  // Find the _id for the host I’m updating
host_doc = db.hosts.findOne({ipaddr : log_ip },{_id:1});  // Don’t return the whole document
host_id = host_doc._id;
  // Insert the log message, the parent reference, and the denormalized data into the ‘many’ side
db.logmsg.insertOne({time : now, message : log_message_here, ipaddr : log_ip, host : host_id });
  // Push the denormalized log message onto the ‘one’ side
db.hosts.updateOne( {_id: host_id },
        {$push : {logmsgs : { $each:  [ { time : now, message : log_message_here } ],
                           $sort:  { time : 1 },  // Sort by time, oldest first
                           $slice: -1000 }        // Only keep the latest 1000
         }} );
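To see what the `$each` / `$sort` / `$slice` combination does to the array, here is a small in-memory model in plain JavaScript. It is only a sketch of the documented behavior for one ascending sort field, not the server's implementation, and `pushSortSlice` is a made-up name.

```javascript
// Model of {$push: {field: {$each, $sort: {[sortField]: 1}, $slice}}}:
// append the new items, sort ascending by one field, then trim.
// A negative slice keeps the last |slice| elements, a positive slice keeps
// the first ones, and 0 empties the array.
function pushSortSlice(array, each, sortField, slice) {
  const result = array.concat(each);
  result.sort((a, b) => a[sortField] - b[sortField]);
  return slice < 0 ? result.slice(slice) : result.slice(0, slice);
}

// Keep only the latest 3 messages (timestamps are plain numbers here).
let logmsgs = [
  { time: 1, message: "boot" },
  { time: 2, message: "login" },
  { time: 3, message: "cpu is warm" },
];
logmsgs = pushSortSlice(logmsgs, [{ time: 4, message: "cpu is on fire!" }], "time", -3);

console.log(logmsgs.map((m) => m.time));
```

Because the sort is ascending and the slice is negative, the oldest entry falls off the front each time a newer one arrives.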

Mainly compiled from:
