日期:2014-05-16  浏览次数:20430 次

David Mytton为什么从MySQL迁移到MongoDB数据库

题记:

??? 工作辞了,在家闲着也是闲着,研究了下non-relational数据库,恰巧看到robbin大哥写的“NOSQL数据库探讨”,便迫切想学习下,了解到MongoDB一些基本知识后,就去瞅了下在robbin大哥的文中提及到的一个MongoDB移植案例,如:

“由于Mongo可以支持复杂的数据结构,而且带有强大的数据查询功能,因此非常受到欢迎,很多项目都考虑用MongoDB来替代MySQL来实现不是特别复杂的Web应用,比方说why we migrated from MySQL to MongoDB就是一个真实的从MySQL迁移到MongoDB的案例,由于数据量实在太大,所以迁移到了Mongo上面,数据查询的速度得到了非常显著的提升。”

??? 从中感到了作者的欢喜和忧愁,有翻译不妥或理解不到位的,还请指正:)

?

1,David为什么要迁移?

?

原文如下:

写道
The problem we encountered was administrative. We wanted to scale using replication but found that MySQL had a hard time keeping up, especially with the initial sync. As such, backups became an issue, but we solved that. However, scaling MySQL onto multiple clustered servers as we plan to do in the future is difficult. You either do this through replication but that is only really suited to read-heavy applications; or using MySQL cluster. The cluster looks very good but I have read about some problems with it and was unsure of it’s suitability for our needs.

?

看上去大概的意思是说:我们遇到了管理上的麻烦,虽然我们解决了备份问题。我们试图通过MySql集群解决,集群看上去很好但对于一个大量写应用来说却遇到了困难,同时我们也不确定集群是否适应我们的需求。

?

于是David选择更换MySQL,选择了MongoDB。

?

2、为什么选择MongonDB?

?

写道
Very easy to install.
PHP module available.
Very easy replication, including master-master support. In testing this caught up with our live DB very quickly and stayed in sync without difficulty.
Automated sharding being developed.
Good documentation.

?我想最重要的一点应该是:Very easy replication, including master-master support. In testing this caught up with our live DB very quickly and stayed in sync without difficulty.

?

?非常容易的数据拷贝并且快速、一致。

?

3、移植MongonDB后的问题。

?

Schema-less:

写道
Schema-less

This means things are much more flexible for future structure changes but it also means that every row records the field names. We had relatively long, descriptive names in MySQL such as timeAdded or valueCached. For a small number of rows, this extra storage only amounts to a few bytes per row, but when you have 10 million rows, each with maybe 100 bytes of field names, then you quickly eat up disk space unnecessarily. 100 * 10,000,000 = ~900MB just for field names!

We cut down the names to 2-3 characters. This is a little more confusing in the code but the disk storage savings are worth it. And if you use sensible names then it isn’t that bad e.g. timeAdded -> tA. A reduction to about 15 bytes per row at 10,000,000 rows means ~140MB for field names – a massive saving.


?

灵活的BSON文本存储结构意味着每条记录都带有了字段名,从而处理不当会导致空间的浪费,于是David减缩了字段名。

?

The database-per-customer method doesn’t work

写道
The database-per-customer method doesn’t work

MongoDB stores data in flat files using their own binary storage objects. This means that data storage is very compact and efficient, perfect for high data volumes. However, it allocates a set of files per database and pre-allocates those files on the filesystem for speed:

This was a problem because MongoDB was frequently pre-allocating in advance when the data would almost never need to “flow” into another file, or only a tiny amount of another file. This is particularly the case with free accounts where we clear out data after a month. Such pre-allocation caused large amounts of disk space to be used up.

We therefore changed our data structure so that we had a single DB, thus making the most efficient use of the available storage. There is no performance hit f