Sharpening the Axe Won't Delay the Woodcutting: 2009

Tuesday, December 29, 2009

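Declarations in Go

While reading the introduction on the Go home page, one design choice puzzled me: the syntax of variable declarations.

The declaration syntax in golang is:

keyword var_name type = value

1. keyword is one of: var, const, type, func
2. var_name is the name being declared
3. type is the type of the variable (not to be confused with the keyword type)

The languages I had seen so far either have no type annotations at all, or put the type in front of the variable name to say what type the variable is.

That feels natural. golang breaks this habit and puts the type after the name.

The golang manual gives two reasons:

Reason 1: Also functions read better and are consistent with other declarations.

In other words, variable declarations take the same form as function declarations.

A function definition in golang:
func sum(a, b int) int { return a + b }

A variable definition in golang:
var p, q *int

Roughly, both follow the same pattern:
keyword var_name type
function keyword, function name, return type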

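Reason 2: short declaration

golang has two shorthand forms for variable declarations:
var var_name = value
* Presumably, with the type at the end it is easy to drop it and get a form without a type.
* It does look a lot like a JavaScript declaration, though.

Here the type is omitted and is inferred from the type of value.

The other shorthand:
var_name := value
* This one looks like a Python assignment.

Much easier on the eyes~~~~

After thinking it over, I find it rather nice. When we want to declare a variable, the thought process is:
1. Think of var (a keyword, since it is part of the language).
2. Think of the variable name, var_name.
3. Think of the initial value. (1) If we initialize here, golang offers the shorthand forms, just like an everyday scripting language. (2) If we do not want to initialize yet, we then settle the type of the variable (golang is a strongly typed language, after all).

That makes for a pleasant way of working (just my personal impression~~~~).

BTW: when you do not initialize a variable, golang initializes it for you with what is called the 'zero value', which depends on the type of the variable:
The zero value depends on the type: integer 0, floating point 0.0, false, empty string, nil pointer, zeroed struct, etc.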

Monday, December 28, 2009

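Installing golang for fun

Running make on a piece of software has only two outcomes:
1. It passes
2. It fails

-_- ! Which says nothing at all. What I actually mean is:
1. Passes: the source code itself is fine. This has nothing to do with whether the user's environment is configured properly.
2. Fails: the source code itself is broken.

I ran into this when I downloaded chromium and ran make: the project was under active development, and I happened to check out a broken revision. That was a waste of time.
* Even the ubuntu daily build does not build successfully every day.

Today I fetched changeset:   4476, and it builds.
While at it, a look at the latest commit from that legendary hacker: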

changeset:   4398:683ed10f7832
user:        Ken Thompson <ken@golang.org>
date:        Sat Dec 12 14:36:52 2009 -0800
summary:     more on the optimizer

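The golang home page actually explains everything clearly. There is just one spot where things tend to go wrong: the section on environment variables.

Be sure to export every environment variable it mentions. If $GOBIN is not in your PATH, run:
export PATH=${PATH}:${GOBIN}

That is the only line that really needs attention.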

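Detecting the presence of a header file in C

C has nothing like Python's try statement, which is a problem when you need to pull in functions from the system. In Python you can do things like:

1. Fail in a friendlier way when importing an external module

import sys

try:
    import lxml
except ImportError:
    print "lxml module does not exist"
    sys.exit()

2. Pick a different module depending on what is available

import sys

try:
    import lxml
except ImportError:
    try:
        import xml
    except ImportError:
        print "neither the lxml nor the xml module exists"
        sys.exit()

After reading a colleague's code analysis, I learned how C handles this kind of problem, namely with the preprocessor:

# if ! defined _SYS_TYPES_H
you must include <sys/types.h> before including this file
# endif

If the macro is not defined, this produces an error and the file does not compile. So the problem is reduced to a macro. But where does the macro come from?

In this snippet, _SYS_TYPES_H is the include guard that glibc's <sys/types.h> defines for itself. The more general question of whether a header exists at all is handed to GNU autoconf. After the configure script generated by autoconf has run, it produces a header named config.h, which contains HAVE_* macros for the headers that configure checked for.

When a C program needs to use external functions safely, this mechanism clearly makes the build process friendlier.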

Sunday, December 27, 2009

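locale

This part of Python closely mirrors the libc functions.

setlocale can be used in two ways:
1. Query the value of a category: pass None (NULL in C) as the second argument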

In [5]: locale.setlocale(locale.LC_CTYPE,None)
Out[5]: 'en_US.UTF-8'

2. Set the locale
The default locale at program startup is always C.
Setting a specific locale:
In [6]: locale.setlocale(locale.LC_CTYPE, 'en_GB.UTF-8')
Out[6]: 'en_GB.UTF-8'

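When the second argument of setlocale is an empty string, the locale is taken from the environment variables, which are normally inherited from the parent process: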
In [2]: locale.setlocale(locale.LC_CTYPE, None)
Out[2]: 'C'

In [3]: locale.setlocale(locale.LC_CTYPE, '')
Out[3]: 'en_US.UTF-8'

Environment variables related to locale:
LC_*
LC_ALL
LANG

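When LANG is defined and the LC_* variables are not, every LC_* category uses the value of LANG. An LC_* variable that is defined individually overrides LANG for that category.
When LANG is not defined either, every LC_* category is "POSIX".
When LC_ALL is defined, every LC_* category is forced to the value of LC_ALL. It has the highest priority.

There is also an environment variable defined by GNU: LANGUAGE. It plays the same role as LC_MESSAGES.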
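The precedence rules above can be sketched as a small lookup. This is a Python 3 illustration, not libc's actual implementation. The function name effective_locale is made up for this example, and treating empty values as unset is an assumption that follows the POSIX description.

```python
import os

def effective_locale(category, env=None):
    """Return the locale name a program would pick for `category`.

    `category` is a variable name such as "LC_CTYPE". Precedence, highest
    first: LC_ALL, the category's own variable, LANG, then "POSIX".
    Empty values are treated as unset.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "POSIX"
```

Passing a plain dict as env makes it easy to try combinations without touching the real environment.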

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html#tag_08_02

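gcc preprocessor options

Preprocessor conditionals are very common in C code. They control what ends up in the generated object code. For example: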
int main()
{
     do_something();

#ifdef __DEBUG__
     printf("__DEBUG__ DEFINED\n");
     printf("%d\n",A);
#endif

    return 0;
}

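__DEBUG__ decides whether the debugging code is included. That raises a question: how do we change this value at compile time? For example:
#if __DEBUG__ == 1

Do we have to edit the source file? Clearly that would be the wrong approach. gcc has options for exactly this:
-D name=definition

-U name

Let's test it: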
jessinio@jessinio-laptop:/tmp$ cat test.c

void *main(int argc, char *argv[]){

#ifdef __DEBUG__
    printf("%s\n", "debug info");
#endif

}

jessinio@jessinio-laptop:/tmp$ gcc -E test.c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"

void *main(int argc, char *argv[]){

}

As you can see, because the __DEBUG__ macro is not defined, the printf was not included in the code. With -D it is:
jessinio@jessinio-laptop:/tmp$ gcc -E -D__DEBUG__ test.c
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "test.c"

void *main(int argc, char *argv[]){

    printf("%s\n", "debug info");

}

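But there is a catch: isn't it ugly to pass such options on the gcc command line every time? Are they perhaps taken from environment variables? A look at a Makefile answers that.
I found this line: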
# Compiler options
OPT=        -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

Clearly, a make variable is used. But how does gcc learn about it?
Then I saw this line:
$(CC) $(OPT) $(LDFLAGS) $(PGENOBJS) $(LIBS) -o $(PGEN)

Clearly, the options are simply passed on the command line after all~~~~ (-_-)!

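How Python treats unicode by default

I did not go looking for trouble, but I really dislike it when trouble comes uninvited. Who wants to spend their youth on a wretched problem like this?

Then again, I do not play video games or chess, so I will treat solving this one as a game of hide-and-seek.

Two encodings are involved here:
1. The encoding used for the file system. Its value is returned by sys.getfilesystemencoding()
2. The default encoding used by Python's unicode function for decoding. Its value is returned by sys.getdefaultencoding().

The world of encodings is a very tiresome business. One look at the output of locale -m and I lose my appetite. Sorting out the relationship between UTF-8, unicode and ascii is enough for now.

The importance of locale

locale has a big influence on how a program behaves. libc on linux provides the machinery for dealing with it. An example: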
jessinio@jessinio-laptop:/$ export LC_ALL='POSIX'
jessinio@jessinio-laptop:/$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=POSIX
jessinio@jessinio-laptop:/$ bash

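In this bash session Chinese cannot be used. Even pasting it in does not work. The locale affects how a program interprets byte streams (this is deep water, mostly inside the C functions, so I will not dive in here).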

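How it all started

A while ago I sorted out a time zone problem. Tonight an i18n problem showed up. Nothing for it but to play the stay-at-home geek and take them down one by one.

This is a perfectly ordinary statement: os.path.exists( path ). It works everywhere, except that under mod_wsgi, of all places, it fails: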

File "/home/jessinio/data/workspace/project/home/views.py" in index
32.     os.path.exists(path)
File "/usr/lib/python2.6/genericpath.py" in exists
18.         st = os.stat(path)

Exception Type: UnicodeEncodeError at /
Exception Value: ('ascii', u'/tmp/\u6881\u5e86\u559c', 5, 8, 'ordinal not in range(128)')

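os.stat fails. Why can the Python interpreter encode the path in some places but not under mod_wsgi?

I started looking at how C handles i18n. Environment variables are the place to start. Here is some evidence.
Contents of the Python file: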
jessinio@jessinio-laptop:~$ cat /tmp/en.py
# coding: utf-8
import os

s = u'/tmp/梁庆喜'
os.path.exists(s)

# The following demonstrates the effect of the LANG environment variable:
jessinio@jessinio-laptop:~$ env|grep LANG
LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8
jessinio@jessinio-laptop:~$ python /tmp/en.py

jessinio@jessinio-laptop:~$ export LANG=zh_CN.GBK
jessinio@jessinio-laptop:~$ python /tmp/en.py
Traceback (most recent call last):
File "/tmp/en.py", line 6, in <module>
    os.path.exists(s)
File "/usr/lib/python2.6/genericpath.py", line 18, in exists
    st = os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-7: ordinal not in range(128)

First, let's see what os.stat actually does. The posix_do_stat function in Modules/posixmodule.c contains:

    if (!PyArg_ParseTuple(args, format,
                          Py_FileSystemDefaultEncoding, &path))
        return NULL;
    pathfree = path;

    Py_BEGIN_ALLOW_THREADS
    res = (*statfunc)(path, &st);
    Py_END_ALLOW_THREADS

So where does Py_FileSystemDefaultEncoding come from? The following is from Python/pythonrun.c, which runs when Python starts up. This is where Py_FileSystemDefaultEncoding is set:

#if defined(Py_USING_UNICODE) && defined(HAVE_LANGINFO_H) && defined(CODESET)
    /* On Unix, set the file system encoding according to the
       user's preference, if the CODESET names a well-known
       Python codec, and Py_FileSystemDefaultEncoding isn't
       initialized by other means. Also set the encoding of
       stdin and stdout if these are terminals, unless overridden. */

    if (!overridden || !Py_FileSystemDefaultEncoding) {
        saved_locale = strdup(setlocale(LC_CTYPE, NULL));
        setlocale(LC_CTYPE, "");
        loc_codeset = nl_langinfo(CODESET);
        if (loc_codeset && *loc_codeset) {
            PyObject *enc = PyCodec_Encoder(loc_codeset);
            if (enc) {
                loc_codeset = strdup(loc_codeset);
                Py_DECREF(enc);
            } else {
                loc_codeset = NULL;
                PyErr_Clear();
            }
        } else
            loc_codeset = NULL;
        setlocale(LC_CTYPE, saved_locale);
        free(saved_locale);

        if (!overridden) {
            codeset = icodeset = loc_codeset;
            free_codeset = 1;
        }

        /* Initialize Py_FileSystemDefaultEncoding from
           locale even if PYTHONIOENCODING is set. */
        if (!Py_FileSystemDefaultEncoding) {
            Py_FileSystemDefaultEncoding = loc_codeset;
            if (!overridden)
                free_codeset = 0;
        }
    }

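The value of Py_FileSystemDefaultEncoding can also be read from within Python: sys.getfilesystemencoding()

However, there is no set function, and the C code exposes no way to modify it from Python either. In other words, once Python has started, this value is fixed (a bit depressing~~~).

Python provides a module named locale, similar to C's locale functions (it is in fact a wrapper around them), but:
* This module cannot modify Py_FileSystemDefaultEncoding. Do not expect to change Py_FileSystemDefaultEncoding through its functions after Python has started.
* In other words, locale cannot change how Python handles the file system encoding.

BTW: I made any number of attempts to change this value after Python had started, all in vain.

The role of the file system encoding

The file system stores the encoded bytes of the text. For example, a file path may be stored in the file system as utf-8: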
In [34]: os.listdir('/tmp')
Out[34]:
['\xe6\xa2\x81\xe5\xba\x86\xe5\x96\x9c',]

When you use a unicode string object in Python to refer to a resource in the file system, Python encodes it with the file system encoding. For example:
In [41]: a = unicode('/tmp/梁庆喜', 'utf-8')
In [42]: a
Out[42]: u'/tmp/\u6881\u5e86\u559c'
In [43]: os.path.exists(a)
Out[43]: True

Python's handling of the unicode object is then equivalent to this byte-string version:
In [36]: a = '/tmp/梁庆喜'
In [37]: a
Out[37]: '/tmp/\xe6\xa2\x81\xe5\xba\x86\xe5\x96\x9c'
In [38]: os.path.exists(a)
Out[38]: True

If the argument of os.path.exists is unicode, it is first encoded with the file system encoding, and the system API is then called with the resulting bytes.
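The encoding step can be reproduced in isolation. The sketch below is Python 3 and deliberately simplified: encode_path is a hypothetical helper, while real Python 3 uses os.fsencode with the surrogateescape error handler. It only illustrates why an ASCII file system encoding produces the UnicodeEncodeError shown in the mod_wsgi traceback.

```python
def encode_path(path, fs_encoding):
    """Encode a text path the way it must be encoded before calling stat()."""
    return path.encode(fs_encoding)

# "/tmp/" followed by the three Chinese characters used in the post.
path = "/tmp/\u6881\u5e86\u559c"

# With a UTF-8 file system encoding, the result is the bytes stored on disk.
utf8_bytes = encode_path(path, "utf-8")

# With an ASCII file system encoding (what a "C"/"POSIX" locale yields),
# the same call fails on characters 5..7 of the path.
try:
    encode_path(path, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc
```

The start and end attributes of the exception correspond to the 5 and 8 visible in the exception value of the traceback above.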

The role of the default encoding

The following example shows the issue:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> a = '梁庆喜'
>>> unicode(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

When unicode() is called without a second argument, it decodes using the default encoding.
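In Python 3 there is no implicit default codec for this conversion, so the encoding is always named explicitly. A minimal sketch of the same failure and its fix (decode_or_none is a hypothetical helper written for this example):

```python
# UTF-8 bytes of the three-character name used in the session above.
raw = b"\xe6\xa2\x81\xe5\xba\x86\xe5\x96\x9c"

def decode_or_none(data, encoding):
    """Decode bytes with the given codec, or return None if they do not fit it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

as_ascii = decode_or_none(raw, "ascii")   # fails: byte 0xe6 is not in range(128)
as_utf8 = decode_or_none(raw, "utf-8")    # succeeds: three code points
```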

When Python is started the normal way, the setdefaultencoding function is removed from sys (site.py deletes it).

We can override this encoding by starting Python with the -S option:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.setdefaultencoding('utf-8')
>>> sys.getdefaultencoding()
'utf-8'
>>> a = '梁庆喜'
>>> unicode(a)
u'\u6881\u5e86\u559c'

If you do not want to pass -S at startup, you can also modify the setencoding function in /usr/lib/python2.6/site.py.

Thursday, December 24, 2009

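Deploying django with mod_wsgi

Web frameworks are generally deployed in one of two ways:
1. embed
2. daemon

Each has its own characteristics. I simply prefer the embed mode, because the http server stands guard in front of the application and the coupling between the two is better.

For deploying Python web applications, WSGI is without doubt the standard way. It has even become a PEP.

apache + mod_wsgi supports running a framework in embed mode, and the configuration is simple.

After installing mod_wsgi, load the module: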

LoadModule wsgi_module /usr/lib/apache2/modules/mod_wsgi.so

Configure the site:
<VirtualHost *:80>
        ServerAdmin jessinio@gmail.com

        WSGIScriptAlias / /home/jessinio/data/workspace/project/django.wsgi

        <Directory /home/jessinio/data/workspace/project/>
        Order allow,deny
        Allow from all
        </Directory>
</VirtualHost>

Contents of django.wsgi:

#!/usr/bin/env python

import os
import sys
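# mod_wsgi restricts writing to stdout. If the code writes to stdout directly, e.g. with print statements, stdout can be redirected as below.
# Everything written to stdout then goes to stderr, and stderr ends up in apache's error log
sys.stdout = sys.stderr

# To avoid hard-coding an absolute path, use the __file__ variable as below
# This file is placed in the project directory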
file_path = os.path.dirname(__file__)
parent_path = os.path.dirname(file_path)

sys.path.append(file_path)
sys.path.append(parent_path)

os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

import django.core.handlers.wsgi
# The core WSGI callable
application = django.core.handlers.wsgi.WSGIHandler()

OK, done, nice and easy.
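For readers who have not seen what the application object actually is: the WSGI PEP defines it as a callable taking environ and start_response. The following Python 3 sketch is not django's handler. The application function and the call_app helper are made up for illustration, and only wsgiref from the standard library is used.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """A minimal WSGI callable, the same interface mod_wsgi looks for."""
    body = ("Hello from " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

def call_app(app, path="/"):
    """Invoke a WSGI app once, without any web server, and collect the reply."""
    environ = {}
    setup_testing_defaults(environ)   # fills in the keys a server would provide
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Calling call_app(application, "/x") exercises the callable the same way a server would, which is handy for checking a .wsgi file before wiring it into apache.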

Wednesday, December 23, 2009

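grub's stage1.5

I ran into a problem using LVM on my personal machine, which exposed how little I understood grub, and how little I understood disk layout.

stage1.5 is rarely mentioned, yet it is often the most important link in the chain. What each stage does:
1. stage1 lives in the so-called MBR and is at most 446 bytes long. This code knows where partitions begin and end and the physical location of the next stage, and jumps there.
2. stage2 lives on the file system. It is able to understand file systems. Its main job: loading the kernel.

This leaves a big problem: stage1 does not understand file systems (code that could understand a file system in only 446 bytes would be magic), so how does it hand the CPU over to stage2?

This is where stage1.5 helps. stage1.5 understands the file system, so it can hand the CPU over to stage2.

The key to the whole process is stage1.5. But where is stage1.5 stored?

First, a review of disk layout

Starting from the output of fdisk: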

jessinio@jessinio-laptop:~$ sudo fdisk -l /dev/sda

Disk /dev/sda: 250.1 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00038329

   Device Boot      Start         End      Blocks   Id System
/dev/sda1   *           1         100      803218+ 83 Linux
/dev/sda2             101       30401   243392782+ 8e Linux LVM

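A few terms: head, sector, cylinder, block

Mapping these terms onto the physical structure of a disk is a little problematic, especially head. What modern disk has 255 heads? Impossible! Have you ever seen a disk that thick?

This disk does not have 255 heads either. Just picture how many 255 would be.

So the head count does not correspond to the number of physical heads. What is it for, then?

How data on a disk is addressed

Early disks could be addressed by the geometric position of the data, using three numbers abbreviated as CHS.
Later disks are clearly no longer built that way, but for compatibility the three concepts were kept and mapped onto a new addressing scheme: LBA.
Each addressable unit is called a block.
A block is usually 512 bytes.

Reserved addresses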

The formula for converting CHS to LBA:

LBA = (C * HPC + H) * SPT + (S - 1)

C, H and S are the cylinder number, the head number, and the sector number
LBA is the logical block address
HPC is the number of heads per cylinder
SPT is the number of sectors per track
* From http://en.wikipedia.org/wiki/Logical_block_addressing
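The conversion, LBA = (C * HPC + H) * SPT + (S - 1), is easy to check with a few lines of Python 3. The function name chs_to_lba is made up for this sketch, and the defaults are the geometry fdisk reported above (255 heads, 63 sectors/track).

```python
def chs_to_lba(c, h, s, hpc=255, spt=63):
    """Convert a CHS address to a logical block address.

    c and h count from 0, s counts from 1.
    """
    if not 1 <= s <= spt:
        raise ValueError("sector must be in 1..spt")
    if not 0 <= h < hpc:
        raise ValueError("head must be in 0..hpc-1")
    return (c * hpc + h) * spt + (s - 1)

first_sector = chs_to_lba(0, 0, 1)      # the very first sector of the disk
second_track = chs_to_lba(0, 1, 1)      # first sector of the second track
second_cylinder = chs_to_lba(1, 0, 1)   # first sector of the second cylinder
```

With this geometry one cylinder spans 255 * 63 = 16065 blocks, which matches the "Units = cylinders of 16065 * 512" line in the fdisk output.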

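Usually HPC and SPT are the head and sector counts of the reported geometry (255 heads and 63 sectors/track in the fdisk output above).

Note that S starts at 1 (it can never be 0). Traditionally the first partition starts at the beginning of the second track (cylinder 0, head 1, sector 1), so the whole first track, SPT * 512 bytes, is not assigned to any partition!
Within that track, the sector at cylinder 0, head 0, sector 1 (LBA 0) is the MBR.

What about the remaining ( SPT - 1 ) * 512 bytes?

As it happens, they are used to hold stage1.5!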

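The reserved space is limited, so drivers for all the common file systems cannot be bundled together. The solution:
* Each file system that a boot partition may use has its own stage1.5 file!

For example: e2fs_stage1_5, fat_stage1_5, ffs_stage1_5, jfs_stage1_5, minix_stage1_5, reiserfs_stage1_5, vstafs_stage1_5, xfs_stage1_5

Different file systems have different stage1.5 files.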

Tuesday, December 22, 2009

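mount permissions

Normally I just run
# mount /dev/sda1 /mnt/sda1
to mount a partition.

Today something annoyed me: a regular user had no permission on the mounted partition. I thought I had this topic fully covered, but what I saw contradicted my assumptions, so the only option was to go through it all again.

The options field in /etc/fstab: some of the options are documented in the mount manual (including the mount.* commands), others in the fstab manual, such as noauto, user and owner.

Several questions are involved here:
1. Who may mount
2. How the owner:group of files that already exist in the file system is handled on mount
3. How the permissions of the mount point are handled on mount

Who may mount

The first thing to look at is the mount command itself: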
jessinio@jessinio-laptop:/dev$ ls -l $(which mount)
-rwsr-xr-x 1 root root 72188 2009-10-23 05:54 /bin/mount

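The mount command has the set-uid bit, so it has all the power it needs to decide who may mount. The rest is purely a matter of configuration:
* Adding user or owner to the options field of an entry in /etc/fstab does it (you cannot name a specific uid that is allowed, though; it can only be done through owner and group).

Commonly used options: user, owner, group

Even when regular users are allowed to mount, there is one more detail: the way mount is invoked. For example:
jessinio@jessinio-laptop:/mnt$ mount /dev/mapper/sg250-data /mnt/usb/ # this form does not go through the /etc/fstab entry
mount: only root can do that
jessinio@jessinio-laptop:/mnt$ mount /mnt/usb/ # this form is handled via the /etc/fstab entry, so a regular user can mount

* This looks like a detail, but it can be quite frustrating. In particular, when root mounts with the mount /mountpoint form and /etc/fstab contains options such as user, owner, group or users, root's mount is also subject to the restrictions in /etc/fstab.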

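How existing owner:group in the file system is handled

ext file systems do store owner id and group id information. On mount, the stored owner id and group id are used as they are.

This leads to a common phenomenon: files on disk A were created on machine A by user1:group1. When disk A is mounted on machine B, ls may show the owner:group as user2:group2.
* The files' owner id and group id are unchanged, but on machine B /etc/passwd and /etc/group map those ids to the names user2 and group2.

The root id is a bit special, because files owned by root carry a security issue: set-uid. So there are two mechanisms:
1. At mount time you specify whether set-uid takes effect. For mounts performed through the user, owner or group options it is disabled by default (nosuid).
2. Whether the 0:0 id pair is mapped to other values.

File systems without permission bits: fat32

This file system is very widely used. It stores no owner or group information. When uid and gid are not specified at mount time, they default to the mounting user: mounted by root, the files belong to root; mounted by a regular user, they belong to that user.

BTW: the mount manual is long, but it is divided into sections, with a separate section per file system. So the manual for any particular file system is actually short.

Overriding the permission bits of the file system itself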

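The permission bits are the mode part of a file's metadata. The relevant options are: suid, nosuid, exec, noexec

What the four user-mount options imply by default:

1. owner implies nosuid, nodev
* The owner of the device file may mount, and may also umount
2. group implies nosuid, nodev
* A user who belongs to the group of the device file may mount, and may also umount
3. user implies noexec, nosuid, nodev
* Anyone may mount, but only the user who mounted it may umount (root is not restricted)
4. users implies noexec, nosuid, nodev
* Anyone may mount and umount

* Note: for the x permission, what you see is not what you get. An example follows.

When a regular user mounts without the exec option, the x permission is effectively removed.

An example: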
jessinio@jessinio-laptop:/mnt$ mount /mnt/usb/
jessinio@jessinio-laptop:/mnt$ ls -l /mnt/usb/run
-rwxr-xr-x 1 root root 0 2009-12-22 19:28 /mnt/usb/run
jessinio@jessinio-laptop:/mnt$ /mnt/usb/run
-bash: /mnt/usb/run: Permission denied
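* After a regular user mounts it, files cannot be executed even though they show the x permission.
* If root mounts with the mount /mountpoint form and /etc/fstab contains options such as user, owner, group or users, root's mount is also subject to the restrictions in /etc/fstab.

When root mounts a file system with the mount /dev_file /mountpoint form, the defaults are: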

1. suid

2. exec

3. dev

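How the permissions of the mount point are handled

After mounting, the mount point is the / of the partition

What happens to its permissions then? The evidence: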
jessinio@jessinio-laptop:/dev$ sudo chown jessinio:jessinio /mnt/usb
jessinio@jessinio-laptop:/dev$ ls -ld /mnt/usb/
drwxr-xr-x 2 jessinio jessinio 4096 2009-12-22 13:31 /mnt/usb/
jessinio@jessinio-laptop:/dev$ sudo mount /dev/sg250/data /mnt/usb/
jessinio@jessinio-laptop:/dev$ ls -ld /mnt/usb/
drwxr-xr-x 3 root root 4096 2009-12-22 15:26 /mnt/usb/

The / of the partition is not affected by the permissions of the mount point. One more test reveals why:

jessinio@jessinio-laptop:/dev$ sudo chown jessinio:jessinio /mnt/usb
jessinio@jessinio-laptop:/dev$ ls -ld /mnt/usb
drwxr-xr-x 3 jessinio jessinio 4096 2009-12-22 15:26 /mnt/usb
jessinio@jessinio-laptop:/dev$ sudo umount /mnt/usb/
jessinio@jessinio-laptop:/dev$ sudo mount /dev/sg250/data /mnt/usb/
jessinio@jessinio-laptop:/dev$ ls -ld /mnt/usb/
drwxr-xr-x 3 jessinio jessinio 4096 2009-12-22 15:26 /mnt/usb/

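* On mount, the permissions of the mount point are replaced by those of the partition's /, and the partition's / uses the owner id and group id stored in the file system. (The chown in the second test was run while the partition was mounted, so it changed the partition's own /, which is why it survived umount and mount.)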

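Using LVM on the new disk

The new disk has finally arrived. It is not really big, just 250G. But for someone who never filled 80G, it still feels like a lot (not that one is ever satisfied; the 80G never filled up only because there was nothing worth storing).

To avoid agonizing over how to divide the disk (how big should root be, how big should / be for the best result; is that really necessary? it is only space), I went with lvm.

lvm has three layers:
1. The PV layer corresponds to hardware, such as /dev/sdb. A PV can also be a partition of a disk, such as /dev/sda1
2. The VG layer builds a group (also called a pool) out of several PVs, abstracting N pieces of hardware into one large contiguous device.
3. The LV layer corresponds to the notion of a partition, though not a partition in the traditional sense.

One more concept: PE. A PV (the hardware) is divided into N PEs. Each PE is simply a chunk of data. The data of one LV may therefore be allocated on PEs that are not on the same disk, which is how data gets spread across disks. Together with the pv, vg and lv abstractions, PEs are transparent to users and programs, giving the illusion of one big disk.

A common question when starting out with LVM: how can one partition be virtualized on top of another? Devices on linux are a rather "mystical" concept.

With traditional partitions, the usual way is:
# mount /dev/sda1 /mnt/usb

With LVM, you do not operate on the traditional partition directly, but on the LV:
# mount /dev/mapper/sg250-data /mnt/usb

It is like using mount.davfs to access webdav, where the source of the device is a URL.

Using LVM requires kernel support: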
jessinio@jessinio-laptop:/dev$ cat /proc/devices |grep map
252 device-mapper

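Mounting an LVM volume requires the lvm tools. So if the boot partition is to be created on LVM, the lvm tools must be included in initrd.img.

lvm is simple to operate. The details can be absorbed gradually.

First, copy the data to the new space.

BTW: along with more convenient space management comes a new data safety problem.

Used well, it means efficiency. Used badly, it is slow suicide.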

Wednesday, December 16, 2009

Access control for static files


Tuesday, December 15, 2009

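Environment variables in interactive and non-interactive bash shells

Keywords: non-interactive, batch

I have to say: this is very easy to get confused about.

I was never quite clear about the environment variables of a bash running in batch mode, which is inconvenient in daily work. For example, a shell run on a schedule by cron is a batch shell. Where do the variables of such a shell script come from? The answer follows.

How a shell is invoked

A shell has two independent attributes:
1. login shell or non-login shell
2. interactive or non-interactive shell

The word login easily brings the console login screen to mind. In fact, this login is not that login. The console one is called getty, as in: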
jessinio@jessinio-laptop:/tmp$ ps auxwww|grep tty
root       990 1.9 1.9 41680 29604 tty7     Ss+ Dec11 115:06 /usr/bin/X :0 -br -verbose -auth /var/run/gdm/auth-for-gdm-5hJc8s/database -nolisten tcp vt7
root      1204 0.0 0.0   1700   264 tty4     Ss+ Dec11   0:00 /sbin/getty -8 38400 tty4
root      1206 0.0 0.0   1700   264 tty5     Ss+ Dec11   0:00 /sbin/getty -8 38400 tty5
root      1211 0.0 0.0   1700   264 tty2     Ss+ Dec11   0:00 /sbin/getty -8 38400 tty2
root      1212 0.0 0.0   1700   264 tty3     Ss+ Dec11   0:00 /sbin/getty -8 38400 tty3
root      1215 0.0 0.0   1700   264 tty6     Ss+ Dec11   0:00 /sbin/getty -8 38400 tty6
root      1851 0.0 0.0   1700   264 tty1     Ss+ Dec11   0:00 /sbin/getty -8 38400 tty1

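* Clearly /sbin/getty.

The two attributes combine into four ways of invoking a shell:
1. login, interactive
2. login, non-interactive
3. non-login, interactive
4. non-login, non-interactive

login, interactive shell

What such a shell does: at startup it runs /etc/profile, then the first one it finds of ~/.bash_profile, ~/.bash_login and ~/.profile. On exit it runs ~/.bash_logout
Notes:
1. The x (execute) permission is not required for these scripts
2. --noprofile turns this behaviour off

Started with: bash --login -i (-i can be omitted)

non-login, interactive shell

What such a shell does: at startup it runs /etc/bash.bashrc and ~/.bashrc
Notes:
1. --norc turns this behaviour off
2. --rcfile makes it run only the specified script instead

Started with: bash -i

non-interactive, non-login shell

A shell in this state only runs the script that the BASH_ENV environment variable points to

Started with:
1. bash -c 'shell code'
2. bash /path/to/shell/file
3. A file with the x permission and a shebang line

non-interactive, login shell

Runs all the scripts of a login shell, plus the BASH_ENV script

Started with:
1. bash --login -c 'shell code'
2. bash --login /path/to/shell/file
3. A file with the x permission and a shebang line that includes the --login option

In general, a bash that runs a script is not a login shell. In that case BASH_ENV can be used to point at a script file, such as /etc/profile
If the scripts under the invoking user's ~ should be used, add --login

In general, ~/.bash_profile sources ~/.bashrc, which makes things even more confusing.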
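The four cases can be condensed into a small decision function. This Python 3 sketch only models the rules as described above. It does not inspect a real bash, the function name bash_startup_files is made up, and /etc/bash.bashrc is the Debian/Ubuntu location used in this post.

```python
def bash_startup_files(login, interactive, bash_env=None):
    """Model which startup files bash reads for the four invocation modes.

    For login shells, the entry "first of ..." stands for the first readable
    file among ~/.bash_profile, ~/.bash_login and ~/.profile.
    `bash_env` is the value of the BASH_ENV variable, if set.
    """
    files = []
    if login:
        files.append("/etc/profile")
        files.append("first of ~/.bash_profile, ~/.bash_login, ~/.profile")
    elif interactive:
        files.extend(["/etc/bash.bashrc", "~/.bashrc"])
    if not interactive and bash_env:
        files.append(bash_env)   # only non-interactive shells honour BASH_ENV
    return files
```

For a cron job, which is non-login and non-interactive, the function returns an empty list unless BASH_ENV is set. That is exactly why scripts run from cron see so few variables.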

Sunday, December 13, 2009

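False and True in C

After using high-level languages for so long, I have forgotten C. Time for a refresher.

Starting from the basics: False and True

In C, the value nonzero is true while zero is taken as false

The main kinds of types in C are:
1. Numbers (int, float, ...)
2. Characters (char)
3. Pointers (arrays, pointers to structs, ...)

For numbers, 0 is False and anything nonzero is True

For characters, '\0' is False and anything else is True

For pointers, 0 (NULL) is False and anything else is True

An example: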

#include <stdio.h>

int main(int argc, char *argv[]){

   char *address = "\0jessinio";
   if (*address){
       printf("%s\n", ++address);
   }
   printf("%s\n", ++address);
   return 0;
}

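The code inside the if statement is not executed: if (*address) reads a char whose value is '\0', which is False. The final printf then prints jessinio.

C also allows an assignment expression inside a condition, e.g.:

int i;
if ( i = 1 + 1){ /* do something */ }

This is equivalent to if ( 1 + 1 )

In Python, using = this way is a syntax error.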
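For comparison, here is how the same questions come out in Python 3. This is a side-by-side sketch, not part of the C discussion, and the walrus operator := assumes Python 3.8 or later. Note that a string holding only the NUL character is truthy in Python, because truth depends on the string being non-empty.

```python
cases = {
    "zero int": bool(0),
    "nonzero int": bool(2),
    "NUL character": bool("\0"),   # non-empty string, so True (unlike C)
    "empty string": bool(""),
    "None": bool(None),
}

# A plain = inside a condition does not compile at all.
try:
    compile("if i = 1 + 1: pass", "<example>", "exec")
    plain_assignment_ok = True
except SyntaxError:
    plain_assignment_ok = False

# The := operator is the explicit way to assign inside a condition.
if (i := 1 + 1):
    assigned = i
```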

Described here: http://ftp.at.gnucash.org/languages/c/cref-mleslie/CONCEPT/true_false.html

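Environment variables

What exactly is an environment variable? System administration constantly involves them. I use them all the time without ever having had a clear definition, which gets in the way of using them well at work.

The questions this raises:
1. Is the "variable" in environment variable the same thing as a "variable" in a programming language?
2. Does the "variable" in environment variable live in the kernel?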

jessinio@jessinio-laptop:/usr/src/linux$ grep -r "putenv" *

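The search results show that the environment is not a kernel matter.

If an environment variable is just an ordinary variable, why are special functions needed to operate on it, such as getenv and putenv in C?

To find out, start with the libc6 source. After downloading it, I found: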

char *
getenv (name)
     const char *name;
{
size_t len = strlen (name);
char **ep;
uint16_t name_start;

if (__environ == NULL || name[0] == '\0')
    return NULL;

if (name[1] == '\0')

...(rest omitted)...

Here we see a variable named __environ. Is it a global variable? I wrote the following code to test whether this variable really exists:

#include <stdlib.h>
#include <endian.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char* argv[])
    {
        char *env_key = "PATH";
        char *env_value = getenv(env_key);
        printf("get value by getenv function: \n\t%s\n", env_value);
        printf("get value by __environ variable\n");
        int i;
        /* __environ is NULL-terminated, so walk it until the NULL entry
           (sizeof would only give the size of the pointer) */
        for (i = 0; __environ[i] != NULL; i++)
            {
                printf("\t%s\n", __environ[i]);
            }
        return 0;
    }

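OK, the variable really exists. The statement "an environment variable is just an ordinary variable" is now half confirmed. Next, where does __environ come from?

Without #include <unistd.h>, gcc fails: get_env.c:15: error: ‘__environ’ undeclared (first use in this function)

OK, so __environ comes from this header file. Looking for the header: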

jessinio@jessinio-laptop:/tmp/eglibc-2.10.1/stdlib$ find /usr/include -name unistd.h
/usr/include/linux/unistd.h
/usr/include/bits/unistd.h
/usr/include/asm-generic/unistd.h
/usr/include/sys/unistd.h
/usr/include/unistd.h
/usr/include/asm/unistd.h

The contents of /usr/include/unistd.h:

/* NULL-terminated array of "NAME=VALUE" environment variables. */
extern char **__environ;
#ifdef __USE_GNU
extern char **environ;
#endif

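__environ is a pointer to an array of char pointers (char **). Put plainly, it is a global variable. Next test: modify __environ directly and see whether the change is passed on to a child process. The code: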

#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <endian.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>

int startswith(char *str1, char *str2)
    {
        //like python string instance : 'SHELL=/bin/bash'.startswith("SHELL") is True
        // returns 0 when str1 begins with str2; strncmp compares at most
        // strlen(str2) characters, so no temporary buffer is needed
        return strncmp(str1, str2, strlen(str2));
    }

int main(int argc, char* argv[])
    {
        char *env_key = "SHELL";
        char *env_value = getenv(env_key);
        printf("get value by getenv function: \n\t%s\n", env_value);
        printf("set new value by modify __environ variable\n");
        int i;
        // __environ is NULL-terminated, so walk it until the NULL entry
        for (i = 0; __environ[i] != NULL; i++)
            {
                int retval;
                retval = startswith(__environ[i], env_key);
                if ( retval == 0)
                    {
                        // OK! modify the value
                        __environ[i] = "SHELL=/usr/bin/python";
                        //printf("\t%s\n", __environ[i]);

                    }
            }
        // create child process
        pid_t pid;
        pid = fork();
        if (pid == 0)
            {
                //printf("I am child process\n");
                execl("/usr/bin/python", "/usr/bin/python", "/tmp/get_env.py", (char *)0);
            }

        if (pid != 0)
            {
                //printf("I am parent process\n");
                sleep(2);
                waitpid(pid, NULL, 0);
            }
        return 0;
    }

The code of get_env.py:
import os
print os.environ['SHELL']

Output:
jessinio@jessinio-laptop:/tmp$ gcc get_env.c
jessinio@jessinio-laptop:/tmp$ ./a.out
get value by getenv function:
    /bin/bash
set new value by modify __environ variable
/usr/bin/python

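So even without stdlib functions such as setenv, an environment variable can be modified and passed on to a child process (execl hands the current __environ to the new program). That is: an environment variable is just an ordinary variable, a global variable of the program.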
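The same experiment can be repeated from Python 3, where the environment handed to a child is visibly just a dict. In this sketch child_sees is a hypothetical helper, and the child is simply another Python interpreter that prints the variable it receives.

```python
import os
import subprocess
import sys

def child_sees(name, value):
    """Run a child process with `name` set to `value` and return what it reads."""
    env = dict(os.environ)   # a plain dict copy of the parent's variables
    env[name] = value        # modified like any other variable
    result = subprocess.run(
        [sys.executable, "-c",
         "import os, sys; sys.stdout.write(os.environ[sys.argv[1]])", name],
        env=env,             # this dict becomes the child's environment
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Nothing special happens between the dict assignment and the child reading the value. The only point at which the operating system gets involved is when the new program is executed with that array of NAME=VALUE strings.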

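Now that we know where the variable lives, one question remains: what exactly does fork do, and which data is kept for the child process? fork is without doubt a kernel matter, located in kernel/fork.c

The second part will have to wait for next time.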

Saturday, December 12, 2009

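Time zones

I thought I understood this, until tzselect and its TZ environment variable confused me. It turns out I had not understood it at all.

Summary of observations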

jessinio@jessinio-laptop:/tmp$ date
Sun Dec 13 00:47:11 CST 2009
jessinio@jessinio-laptop:/tmp$ sudo mv /etc/localtime /tmp/
jessinio@jessinio-laptop:/tmp$ date
Sat Dec 12 16:47:23 UTC 2009

So the result clearly depends on this file.

jessinio@jessinio-laptop:/tmp$ date
Sun Dec 13 00:52:15 CST 2009
jessinio@jessinio-laptop:/tmp$ cat /etc/timezone
Asia/Shanghai
jessinio@jessinio-laptop:/tmp$ echo 'Australia/South' |sudo tee /etc/timezone
Australia/South
jessinio@jessinio-laptop:/tmp$ date
Sun Dec 13 00:52:35 CST 2009

It seems the /etc/timezone file has no effect.

jessinio@jessinio-laptop:/tmp$ date
Sun Dec 13 00:54:25 CST 2009
jessinio@jessinio-laptop:/tmp$ TZ='Australia/South'; export TZ
jessinio@jessinio-laptop:/tmp$ date
Sun Dec 13 03:24:44 CST 2009

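It seems this environment variable does have an effect.

What does this mean?

According to http://www.timeanddate.com/worldclock/ , Shanghai and San Francisco are exactly 16 hours apart. (The test below uses America/Phoenix, which is 15 hours behind Shanghai.)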

jessinio@jessinio-laptop:/tmp$ date
Sun Dec 13 01:21:25 CST 2009
jessinio@jessinio-laptop:/tmp$ TZ='America/Phoenix'; export TZ
jessinio@jessinio-laptop:/tmp$ date
Sat Dec 12 10:21:28 MST 2009

jessinio@jessinio-laptop:/tmp$ sudo cp -r /usr/share/zoneinfo/America/Phoenix /etc/localtime
jessinio@jessinio-laptop:/tmp$ date
Sat Dec 12 10:23:04 MST 2009
jessinio@jessinio-laptop:/tmp$ TZ='Asia/Shanghai'; export TZ
jessinio@jessinio-laptop:/tmp$ date
Sun Dec 13 01:23:16 CST 2009

The TZ variable alone is enough to configure the time zone. When TZ is set, /etc/localtime has no effect at all.

Seeing is believing

jessinio@jessinio-laptop:/tmp$ unset TZ
jessinio@jessinio-laptop:/tmp$ strace date 2>&1|grep 'open'
...
open("/etc/localtime", O_RDONLY) = 3

When TZ is not set, /etc/localtime is used.

jessinio@jessinio-laptop:/tmp$ TZ='Asia/Shanghai'; export TZ
jessinio@jessinio-laptop:/tmp$ strace date 2>&1|grep 'open'
...
open("/usr/share/zoneinfo/Asia/Shanghai", O_RDONLY) = 3

When TZ is set, /etc/localtime is not used.

The ctime function

The date command presumably uses the ctime function too. A quick test of ctime from C code:

jessinio@jessinio-laptop:/tmp$ cat get_time.c
#include <time.h>
#include <stdio.h>

int main(int argc, char* argv[])
    {
      char * retval;
      time_t epoch = time(NULL);
      retval = ctime( &epoch );
      printf("%s\n", retval);
      return 0;
    }

jessinio@jessinio-laptop:/tmp$ gcc get_time.c
jessinio@jessinio-laptop:/tmp$ strace a.out 2>&1|grep 'open'
jessinio@jessinio-laptop:/tmp$ strace a.out
strace: a.out: command not found
jessinio@jessinio-laptop:/tmp$ strace ./a.out 2>&1|grep 'open'
...
open("/etc/localtime", O_RDONLY)        = 3

jessinio@jessinio-laptop:/tmp$ TZ='Asia/Shanghai'; export TZ
jessinio@jessinio-laptop:/tmp$ strace ./a.out 2>&1|grep 'open'
...
open("/usr/share/zoneinfo/Asia/Shanghai", O_RDONLY) = 3

So this is how the ctime function itself is implemented.

The same holds for Python.
jessinio@jessinio-laptop:/tmp$ cat get_time.py
#!/usr/bin/env python
import time
time.ctime()
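The effect of TZ can also be observed from inside a running Python 3 process. The sketch below is Unix-only, because it relies on time.tzset. It uses POSIX-style TZ strings such as "CST-8", which libc can interpret without any zoneinfo file. utc_offset_seconds is a made-up helper, and it leaves TZ changed in the current process.

```python
import os
import time

def utc_offset_seconds(tz):
    """Return the standard-time UTC offset (east positive) libc derives from TZ=tz."""
    os.environ["TZ"] = tz
    time.tzset()             # re-read TZ, as a freshly started process would
    return -time.timezone    # time.timezone counts seconds WEST of UTC

shanghai_like = utc_offset_seconds("CST-8")   # UTC+8, abbreviated CST
phoenix_like = utc_offset_seconds("MST7")     # UTC-7, abbreviated MST
hours_apart = (shanghai_like - phoenix_like) // 3600
```

The 15 hours computed here match the gap between the CST and MST outputs of date in the session above.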

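Process states

kill -9 is mighty powerful. Nothing fails to surrender to it.

In reality, though, not everything can be killed. I remember that on FreeBSD there were processes that could not be killed. Today the book "Linux Administration Handbook" mentioned it as well, which drew my attention to process states.

The ps manual lists the following states:
D    Uninterruptible sleep (usually IO)
* In this state a process does not handle signals. It can only be woken by the event it is waiting for (typically an interrupt).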
R    Running or runnable (on run queue)
T    Stopped, either by a job control signal or because it is being traced.
Z    Defunct ("zombie") process, terminated but not reaped by its parent.
S    Interruptible sleep (waiting for an event to complete)

Everyday job control uses stop signals (TSTP from ctrl+Z, or STOP) and the CONT signal, e.g.:

jessinio@jessinio-laptop:/tmp$ python sleep.py
^Z
[1]+ Stopped                 python sleep.py
# After suspending, check the state:
jessinio@jessinio-laptop:/tmp$ jobs
[1]+ Stopped                 python sleep.py
# Get the PID:
jessinio@jessinio-laptop:/tmp$ ps auxww|grep sleep.py
jessinio 20100 9.8 0.1   5448 2892 pts/2    T    16:50   0:01 python sleep.py
jessinio 20102 0.0 0.0   3036   788 pts/2    R+   16:50   0:00 grep sleep.py
# Send the CONT signal:
jessinio@jessinio-laptop:/tmp$ kill -s CONT 20100
# Afterwards the job shows a trailing "&", i.e. it is running in the background. Interesting!!
jessinio@jessinio-laptop:/tmp$ jobs
[1]+ Running                 python sleep.py &

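This "&" raised a question for me: what does so-called background actually mean?

On Linux everything is a file. Apart from session id, group id and parent, what distinguishes a background job can only be one more thing: stdin, stdout, stderr.

A simple example shows that the shell does not redirect the stdout and stderr of the programs it starts:
#!/usr/bin/env python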
import os
import sys

while True:
    print str( os.isatty( sys.stdin.fileno() ) )

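In the shell, press ctrl+Z to stop the script, then use kill -s CONT PID to let it continue. You will see that output still arrives on stdout, but ctrl+Z can no longer stop the running script.

The so-called background ("&") is really a concept of the shell program (bash, specifically). Every interactive shell keeps a table that records the jobs in the current session.
On ctrl+Z the terminal driver sends the TSTP signal to the foreground process group, which bash sets up according to its jobs table. A job resumed with a plain kill -s CONT is no longer in the foreground, which is why ctrl+Z does not reach it.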
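The stop and continue transitions can be watched from a parent process without any shell involved. This Python 3 sketch is Unix-only, and stop_and_continue is a made-up name. It uses SIGSTOP rather than the TSTP that ctrl+Z generates, since SIGSTOP cannot be caught and gives a deterministic result.

```python
import os
import signal
import subprocess
import sys

def stop_and_continue():
    """Stop a child with SIGSTOP, resume it with SIGCONT, report what was observed."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid return once the child is stopped (state T).
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        os.kill(child.pid, signal.SIGCONT)
        # WCONTINUED makes waitpid return once the child is running again.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
    finally:
        child.kill()
        child.wait()
    return stopped, continued
```

This is roughly the information a shell relies on to keep the "Stopped" and "Running" columns of its jobs table up to date.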

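The difference between T and S

The S state means the process is waiting for an event. This usually happens:
1. Inside a system call that blocks, such as an IO function or sleep. The process enters this state through its own call, and other programs cannot put it there.

Note that a process merely waiting for the scheduler to give it CPU time is not in S: it is runnable and shown as R.

The T state, by contrast, is entered and left through signals sent from outside. There is no signal for sleep, and no signal for wake up.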

Tuesday, December 8, 2009

Kernel processes

Anyone who has used ps for a while will have noticed entries like these:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1 0.0 0.0   2528 1308 ?        Ss   10:31   0:00 /sbin/init
root         2 0.0 0.0      0     0 ?        S<   10:31   0:00 [kthreadd]
root         3 0.0 0.0      0     0 ?        S<   10:31   0:00 [migration/0]
root         4 0.0 0.0      0     0 ?        S<   10:31   0:00 [ksoftirqd/0]
root         5 0.0 0.0      0     0 ?        S<   10:31   0:00 [watchdog/0]
root         6 0.0 0.0      0     0 ?        R<   10:31   0:00 [events/0]

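Why do some entries in the COMMAND column have []? And what does the trailing "/0" mean? It is the CPU number. With more than one CPU, other numbers appear as well.

Such processes are called "kernel threads". They differ from ordinary processes in that they are not created through the fork system call.
They are not full processes but part of the kernel. For scheduling or structural reasons they are dressed up to look like processes.
* These threads run in kernel space. Unlike a module, though, each one shows up as a schedulable entry in the process table.

kjournald: every mounted ext3 file system has a corresponding kjournald thread (for ext4 it is kjournald2; ext2 has no journal, so it has no such thread).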

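URL character set

I see URLs all the time, but never paid attention to their character set.

Out of idle curiosity I got interested in what google docs sends over the wire. I was convinced that docs sends only the differences of the document, i.e. it does not send the whole file on every save.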

contentIsDelta=true&delta=%3D1850%09%2B%25E6%2588%2591%25E6%2588%2591%25E6%2588%2591%25E6%2588%2591%09%3D4(......)还有很长.

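The word delta means change or difference.

The data looks scary at first: what encoding is this?

All those "%" signs suggest data like what urllib.quote produces. So how is a URL itself defined?

Read the RFC? That is what I dread most. Reading a few blogs is quicker (I am not doing theory, after all), e.g.:
* http://netzreport.googlepages.com/online_tool_for_url_en_decoding.html

This scheme is called Percent-encoding ( http://en.wikipedia.org/wiki/Percent-encoding )

A URL may only contain a limited set of characters unescaped: A-Z, a-z, 0-9, - _ . ~ (plus the reserved characters that act as delimiters). Everything else is written as "%" followed by two hexadecimal digits.

What I actually sent was just a few "我" characters. Looking at the string above, one segment repeats, so let's try decoding one repetition: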

In [16]: urllib.unquote('%2588%2591%25E6')
Out[16]: '%88%91%E6'

There is still a "%" sign left. Now look at the encoded bytes of the character "我":
In [17]: '我'
Out[17]: '\xe6\x88\x91'

Getting close~~~~ I must have picked the wrong starting point. It should be "%25E6%2588%2591". Let's try:
In [18]: urllib.unquote('%25E6%2588%2591')
Out[18]: '%E6%88%91'

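OK, that looks very much like the bytes of '我', just with % marking each hexadecimal byte. In other words, the data was percent-encoded twice. Decoding the whole segment: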

In [80]: a = urllib.unquote('%3D1850%09%2B%25E6%2588%2591%25E6%2588%2591%25E6%2588%2591%25E6%2588%2591%09%3D4')
In [81]: a = binhex.binascii.a2b_hex(a.split('\t')[1][1:].replace('%', ''))
In [82]: print a
-------> print(a)
我我我我
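In Python 3 the same double decoding is shorter, because urllib.parse.unquote decodes UTF-8 by default. The sketch below reuses the captured string from above. Reading the three tab-separated fields as "keep, insert, keep" is my guess at the delta format, not something the post confirms.

```python
from urllib.parse import unquote

captured = ("%3D1850%09%2B%25E6%2588%2591%25E6%2588%2591"
            "%25E6%2588%2591%25E6%2588%2591%09%3D4")

once = unquote(captured)            # first layer: "=1850", TAB, "+%E6%88%91...", TAB, "=4"
fields = once.split("\t")           # three tab-separated operations
inserted = unquote(fields[1][1:])   # drop the leading "+", undo the second layer
```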

Just playing around. Nothing more to it.

Friday, December 4, 2009

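Time handling with the datetime library

The problem was simple to begin with: adding to a time. But I got hung up on time arithmetic with datetime.

That should have been a good thing, but the tragedy followed:
1. datetime.time instances cannot do time arithmetic at all (and date only at whole-day resolution). Only datetime.datetime really supports it.
2. Creating a datetime requires year, month and so on. Using datetime.datetime for arithmetic that only concerns a datetime.time felt wrong (that is what I thought, being lazy and wanting tidy code).
3. timedelta has no strftime or strptime formatting functions

So I got fixated on the datetime.time object (datetime.datetime seemed to have too many features I did not need)

And that is how the tragedy came about. I parsed the string by hand and built an instance:
start_time = datetime.timedelta(
    hours=hours,
    minutes=minutes,
    seconds=seconds,
    milliseconds=milliseconds,
)

What a tragedy~~~~~~~~~

Later I found that creating a datetime.datetime with the strptime function is more convenient than I had imagined. For example: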

In [27]: datetime.datetime.strptime('0:00:40', '%H:%M:%S')
Out[27]: datetime.datetime(1900, 1, 1, 0, 0, 40)
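* The missing date part is filled in with strptime's default, 1900-01-01

Time arithmetic is also more convenient than imagined:
In [30]: datetime.datetime(1900, 1, 1, 0, 0, 40) + datetime.timedelta(seconds=1)
Out[30]: datetime.datetime(1900, 1, 1, 0, 0, 41)

There is just an extra date part. (-_-) Why not simply turn a blind eye to it? The following uses only the time part: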

In [33]: a = datetime.datetime.strptime('0:00:40', '%H:%M:%S')
In [34]: a = a + datetime.timedelta(seconds=1)
In [35]: a.strftime("%H:%M:%S")
Out[35]: '00:00:41'
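The three steps above fit into one small helper. This Python 3 sketch uses only the standard datetime module, and add_to_clock is a made-up name. Times simply wrap around at midnight, because the date part is thrown away again.

```python
import datetime

def add_to_clock(clock, **delta):
    """Add a timedelta to an 'H:M:S' string and return the new 'HH:MM:SS' string."""
    moment = datetime.datetime.strptime(clock, "%H:%M:%S")   # date defaults to 1900-01-01
    moment += datetime.timedelta(**delta)
    return moment.strftime("%H:%M:%S")                       # keep only the time part

one_second_later = add_to_clock("0:00:40", seconds=1)
past_midnight = add_to_clock("23:59:59", seconds=2)
```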

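As for strptime: are zero-padded strings such as '00:00:41' and unpadded ones such as '0:0:41' parsed with the same format?

When converting a time to a string one normally uses the library API. Zero-padded or not: which one is the industry standard?

Fortunately the library API is more forgiving than I expected: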

In [36]: datetime.datetime.strptime('Thu, 08 Jun 2001 04:07:05', '%a, %d %b %Y %H:%M:%S')
Out[36]: datetime.datetime(2001, 6, 8, 4, 7, 5)

In [37]: datetime.datetime.strptime('Thu, 8 Jun 2001 4:7:5', '%a, %d %b %Y %H:%M:%S')
Out[37]: datetime.datetime(2001, 6, 8, 4, 7, 5)

Both forms are accepted, hehe.

Thursday, December 3, 2009

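Parsing CLI arguments

getopt looks convenient, but in practice it is not.

The add_option method of an optparse.OptionParser instance has the form add_option(*opt_str, **kwargs); commonly used keyword arguments are action, dest, default, help, type and nargs.

action:
1. store (default)
2. store_true
3. store_false

type:
1. string (default)
2. int

For quick use, optparse works too:
import optparse

parser = optparse.OptionParser()
parser.add_option('-f')  # option strings must not contain spaces

if __name__ == '__main__':
    options, args = parser.parse_args()
    print options.f  # options.f is None when -f is not given

Not much more code than getopt!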

A good set of examples: http://www.alexonlinux.com/pythons-optparse-for-human-beings
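To show the action and type values listed above working together, here is a small sketch written for Python 3 (optparse is still in the standard library there). The option names and the argument list are invented for the example:

```python
import optparse

def build_parser():
    parser = optparse.OptionParser()
    # action defaults to "store" and type defaults to "string"
    parser.add_option("-f", "--file", dest="filename", default=None, help="input file")
    # store_true: the option takes no value and just sets a flag
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False)
    # type="int": the value is converted before it is stored
    parser.add_option("-n", "--count", type="int", dest="count", default=1)
    return parser

# Passing an explicit list instead of reading sys.argv keeps the sketch self-contained.
options, args = build_parser().parse_args(["-f", "data.txt", "-v", "-n", "3", "extra"])
print(options.filename, options.verbose, options.count, args)
```

Anything that is not an option or an option value ends up in args.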

Tuesday, December 1, 2009

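str, unicode

In Python (2.x) these two types are a nuisance. For example:

if 'author' == name:
This looks harmless, but it can hide a bug when name is a unicode object or some other type. One option is to convert explicitly (note that str(name) itself raises UnicodeEncodeError if name is a unicode object with non-ASCII characters):
if 'author' == str(name) :

To check whether a variable is a string, one usually writes:
if type(name) == str:
A better way is:
if isinstance(name, basestring):

because basestring is the parent class of both <type 'str'> and <type 'unicode'>

Another case:

full_name = first_name + last_name
This also looks harmless (as long as everything is ASCII). But: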
>>> a = '我'
>>> b = u'们'
>>> a + b
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

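This is really an encoding problem: mixing str and unicode makes Python decode the str with the default 'ascii' codec, exactly as here: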
>>> unicode('我')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

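You have to know the encoding

There are two ways to get a unicode object in Python:
1. u'xxxxx'
2. unicode('xxxxx')
Depending on what 'xxxxx' contains, there are two cases:

1. 'xxxxx' is in an external encoding (encoded bytes)

Writing u'我' in a source file really means that the encoded bytes of '我' are written into the file.

In this case the interpreter needs to know how to decode those bytes into unicode. This is where the encoding declaration near the top of the file comes in (the "# -*- coding: utf-8 -*-" line, not the sha-bang line itself): Python uses the declared encoding to convert the literal into a unicode object

This is often confused with the following: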

>>> a = u'\xe6\x88\x91'
>>> print a
æ

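Why does putting the encoded bytes inside u'xxxx' go wrong?
In theory '\xe6\x88\x91' and '我' are the same thing to the interpreter (in a UTF-8 source file). But u'\xe6\x88\x91' denotes three unicode characters: u'\xe6', u'\x88' and u'\x91'

In other words, the expressions '\xe6\x88\x91' and '我' are equivalent for the interpreter, although the latter obviously takes less disk space in the source file!

2. 'xxxxx' is the internal code (Unicode code points)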

In [38]: print u'\u53F6'
-------> print(u'\u53F6')
叶

The unicode() function is more direct: unicode(encoded_bytes, encoding)
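For comparison, a rough sketch of the same distinction in Python 3, where the byte string type is bytes and the unicode type is str. The helper to_text and its UTF-8 default are assumptions for the example:

```python
def to_text(data, encoding="utf-8"):
    """Return text for either bytes or str input (the encoding is an assumption)."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data

encoded = b"\xe6\x88\x91"   # external code: the UTF-8 bytes of the character 我
mistaken = "\xe6\x88\x91"   # internal code: three separate code points, not 我

print(len(to_text(encoded)), len(mistaken))
print(to_text(encoded) + to_text("们"))
```

Decoding once at the boundary and working with text everywhere else avoids the implicit ascii decoding that caused the UnicodeDecodeError above.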

Sunday, November 29, 2009

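Fetching web page data with urllib2

The point of doing this is that sometimes a script is needed to automate tedious chores, or at least to avoid endless copy and paste. (-_-) A few times by hand is fine; at scale it is a waste of life.

An HTTP exchange is one request and one response. Python's urllib2 makes it easy to send a request:
* urllib2.urlopen(url)
url is the complete URL
* urllib2.urlopen(request)
request is an instance of urllib2.Request

Either call sends an HTTP request.

Websites today usually guard against automated scripts, for example through cookies in the headers, or by requiring extra key=value strings in the data sent with a POST request.

I. Handling request headers
In Python, headers map to a dict, for example:
{'cookie': '111111111111111', 'Accept-Encoding': 'gzip,deflate'}
Usage:
request = urllib2.Request(url, headers=headers) # headers is a dict; pass it by keyword, because the second positional argument is data
retval = urllib2.urlopen(request) # the request is sent here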

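II. Handling POST data
In Python, POST data maps to a str, for example:
'person=jessinio&gender=male'

There are two ways to use it:
1.
retval = urllib2.urlopen(url='http://www.google.com', data='person=jessinio&gender=male') # this sends a POST request

2.
request = urllib2.Request(url, data='person=jessinio&gender=male') # attach the data string to the request instance
retval = urllib2.urlopen(request) # the request is sent here

* Once you know which Python types the headers and the POST data correspond to, urllib2 is easy to use

After the request is sent, the next question is: what comes back?

Everyone knows that what comes back is a stream of bytes (-_-)

A common problem in practice: you request an HTML file, but the response body is not plain text, for example because it is gzip-compressed. Then one more step is needed:
    # requires: import StringIO, gzip
    if retval.headers.get('content-encoding') == 'gzip':
        # wrap the compressed body in a file-like object for GzipFile
        fileobj = StringIO.StringIO(retval.read())
        gzip_file = gzip.GzipFile(fileobj=fileobj)
        context = gzip_file.read()
    else:
        context = retval.read()

This way the text data is easy to obtain.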
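The decompression step can be isolated from the network call. Below is a sketch for Python 3 (where urllib2 became urllib.request); decode_body is a hypothetical helper that takes the response headers as a plain dict and the body as bytes, so it can be tried without contacting any server:

```python
import gzip

def decode_body(headers, body):
    """Return the body bytes, gunzipped when the headers say it is gzip-encoded."""
    encoding = headers.get("Content-Encoding", "").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body

# Simulated responses instead of a real request.
compressed = gzip.compress(b"<html>hello</html>")
print(decode_body({"Content-Encoding": "gzip"}, compressed))
print(decode_body({}, b"<html>plain</html>"))
```

Checking the header value, rather than only its presence, matters because other content encodings such as deflate cannot be read with the gzip module.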

Wednesday, November 18, 2009

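Are method and function the same thing in Python?

After studying descriptors and decorators, I sensed that there was some black magic behind the first parameter of class methods (cls), but I never found time to look into it. Of course, learning something does not mean fully understanding it; I had only reached the point of sensing the "magic".

Today a friend and I got to talking about self.

It turned out that... self is not a keyword!! Damn, who knew... I had been fooling myself. A classic human blind spot.

That is what dug up everything below.

A natural question: are method and function the same concept? Consider this code: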

In [12]: class M(object):
   ....:     def func_or_method(self,name):
   ....:         print name
   ....:
   ....:
In [13]: M.func_or_method
Out[13]: <unbound method M.func_or_method>

In [14]: M.__dict__['func_or_method']
Out[14]: <function func_or_method at 0x94e66f4>

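The same thing, accessed in two different ways, gives two different results. One cannot help thinking of "black magic".

All of this comes from the __getattribute__ of object and type. It is a hook, and only it can pull off this kind of switch.

The "Invoking Descriptors" section of http://users.rcn.com/python/download/Descriptor.htm reveals what __getattribute__ really does: for a descriptor x, the "." attribute access on objects and types is turned into:
1. For an object: type(b).__dict__['x'].__get__(b, type(b))
2. For a type: B.__dict__['x'].__get__(None, B)

The conversion itself is implemented in C; the Python code above is only a model of it (as Descriptor.htm points out).

With that understood, back to the difference between method and function.
Why do M.func_or_method and M.__dict__['func_or_method'] differ?
dir() shows that the function object stored in M.__dict__ has a __get__ attribute, which makes it a descriptor. Looking the function up in M.__dict__ directly does not call __get__, but attribute access does.

That is, M.func_or_method is already the result of calling func_or_method.__get__! What gets printed is whatever __get__ returned.

As learned above, object and type only differ in the arguments they pass to __get__. It has nothing to do with the function having a self parameter!

In short: A.B always starts from a __dict__ lookup; whether the __get__ of the value found there is called, and with which arguments, is decided by __getattribute__.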
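The lookup rule can be checked by calling __get__ by hand. This sketch targets Python 3, which no longer has unbound methods (there M.func_or_method is simply the function), so the instance case is used; the class mirrors the one in the session above, except that the method returns its argument instead of printing it:

```python
class M(object):
    def func_or_method(self, name):
        return name

instance = M()
raw = M.__dict__["func_or_method"]   # plain dict lookup: no descriptor call
bound = raw.__get__(instance, M)     # what instance.func_or_method does internally

print(type(raw).__name__, type(bound).__name__)
print(bound == instance.func_or_method)
print(bound("hello"))
```

The function object is the same in both cases; only the call to __get__ turns it into a method bound to the instance.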

Tuesday, November 17, 2009

A problem I ran into while reading pydevd.py

The code:
            #pretend pydevd is not the main module, and
            #convince the file to be debugged that it was loaded as main

            sys.modules['pydevd'] = sys.modules['__main__']
            sys.modules['pydevd'].__name__ = 'pydevd'

            from imp import new_module
            m = new_module('__main__')
            sys.modules['__main__'] = m
            m.__file__ = file
            globals = m.__dict__

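This snippet involves sys.modules, imp and '__main__'.

Frankly, I did not understand what it meant!!!

From earlier study I know that each file (module) is a scope, and "global" is relative to the module. A module usually has three variables:
0. __package__ : when a package is imported this is the package name; for a plain module it is None
1. __name__ : the name used at import time: the package name for a package, the module name for a module. '__main__' denotes the file (module) that the python interpreter was started with
2. __file__ : not every imported object has it. The imp module above does not, and neither does an interactive python session. My guess: imp is not a .py file; it can be imported because it is built into the interpreter. The documentation says: The __file__ attribute is not present for C modules that are statically linked into the interpreter

I use sys all the time but had rarely paid attention to sys.modules. The documentation says: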

modules

This is a dictionary that maps module names to modules which have already been loaded. This can be manipulated to force reloading of modules and other tricks. Note that removing a module from this dictionary is not the same as calling reload() on the corresponding module object.

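Even after reading that it was not entirely clear to me; I only knew that it holds the imported modules. Run python (not ipython):
>>> import sys, pprint
>>> pprint.pprint(sys.modules)
You will see many modules that you did not import yourself.

This is the set of modules imported by the whole python process; it is not tied to the scope of any particular module. The point of this design: module objects can reference each other, because this variable exists for as long as the python process does (you still need import sys to reach it, of course)

Now back to the code at the top. What sys.modules['__main__'] means: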

jessinio@niolaptop ~/workspace/python script/src $ python
Python 2.6.2 (r262:71600, Oct 11 2009, 05:51:18)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> print __name__
__main__
>>> import sys
>>> dir(sys.modules['__main__'])
['__builtins__', '__doc__', '__name__', '__package__', 'sys']
>>> print sys.modules['__main__'].__name__
__main__

It clearly refers to the first file python loaded. Every py file is a module, so it shows up here.

sys.modules['pydevd'] = sys.modules['__main__']
sys.modules['pydevd'].__name__ = 'pydevd'

sys.modules['pydevd'] and sys.modules['__main__'] now both refer to the first module loaded, and then the __name__ inside that module's scope is changed. This __name__ is just a variable in the module

from imp import new_module
m = new_module('__main__')
sys.modules['__main__'] = m
m.__file__ = file
globals = m.__dict__
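An API call creates a module object (-_-! you can do that!?), which is then registered as if it were the first module loaded. -_-! Does that count as cheating? At least it is honest enough to record the path of the script being run (the file variable)

After all this, the comment at the top of the snippet makes sense: pydevd re-registers itself under its own name and installs a fresh __main__, so that the script being debugged believes it was loaded as the main module. Clever!!

Here is how new_module is used: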
>>> import imp
>>> a = imp.new_module('fuck')
>>> type(a)
<type 'module'>
>>> a.__name__
'fuck'

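In my eyes, every Python name starting with "_" is the beginning of some black magic! (Still, it is not as wild as Perl, which is pure chaos.) Learn too little of it and you get frustrated; learn too much and it hurts. Moderation is the only way!

Next, the __dict__ trick. Mentioning __dict__ drags in:
1. __slots__ : declares a fixed set of attributes and replaces the per-instance __dict__
2. __get__ : used by the descriptor protocol
3. __getattr__ : called only when normal attribute lookup fails
4. __getattribute__ : called on every attribute access (new-style classes)
Underscores everywhere!!

First, how the names in __dict__ differ from the names returned by dir().
From the dir() documentation: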
      for a module object: the module's attributes.
      for a class object: its attributes, and recursively the attributes
        of its bases.
      for any other object: its attributes, its class's attributes, and
        recursively the attributes of its class's base classes.
For classes and instances there is a recursive walk, all the way up the family tree.

>>> class C(object):
...     name = 'class'
...
>>> o = C()
>>> o.__dict__
{}
>>> o.name = 'instance name'
>>> o.__dict__
{'name': 'instance name'}

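__dict__ holds the attributes that belong to the object itself (the one self refers to). In Python, the namespace of a module, of each class level and of an instance is its __dict__
* Related documentation: http://pyref.infogami.com/__dict__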
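The difference between an object's own __dict__ and the recursive view given by dir() can be seen side by side. A small sketch, reusing the class C from the session above:

```python
class C(object):
    name = "class"

o = C()
own_before = dict(o.__dict__)        # the instance namespace starts out empty
visible_before = "name" in dir(o)    # dir() also walks the class and its bases

o.name = "instance name"             # now the instance gets its own entry
print(own_before, visible_before)
print(o.__dict__, C.__dict__["name"])
```

Assigning through the instance adds an entry to the instance namespace and leaves the class attribute untouched.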

Now the last line of the snippet at the top is easy to understand: globals = m.__dict__ takes the whole namespace of the m object, which is then used in:
        if not IS_PY3K:
            execfile(file, globals, locals) #execute the script
        else:
            #We need to compile before so that the module name is correct
            obj = compile(open(file).read(), file, 'exec')
            exec(obj, globals, locals) #execute the script

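The running module (the first py file loaded) hands its CPU time over to obj. And to cover its tracks, it also takes care of the namespace

Tricks that affect the whole python process should properly be called runtime services; see: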
http://docs.python.org/library/python.html
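The whole trick can be reproduced in a few lines. This is only a sketch of the idea, not pydevd's actual code: it is written for Python 3, uses types.ModuleType in place of imp.new_module, and the helper run_as_main and the sample source string are invented for the illustration:

```python
import sys
import types

def run_as_main(source, filename="<sketch>"):
    """Execute source inside a fresh module that is registered as __main__."""
    original_main = sys.modules["__main__"]
    module = types.ModuleType("__main__")    # its __name__ is "__main__"
    module.__file__ = filename
    sys.modules["__main__"] = module
    try:
        # the module namespace serves as the globals of the executed code
        exec(compile(source, filename, "exec"), module.__dict__)
    finally:
        sys.modules["__main__"] = original_main   # restore the real main module
    return module

m = run_as_main("seen_name = __name__\nanswer = 6 * 7")
print(m.seen_name, m.answer)
```

Unlike the debugger, the sketch puts the original __main__ back once the code has run, so it can be tried in an interactive session without side effects.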

Thursday, November 12, 2009

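The var keyword in JavaScript

What is a global variable?

"Global" is a scope; that much I understood long ago. But how big is that scope?

It is like our world: what is outside the universe? Does "global" include what is outside the universe?

All one can say is: global is relative.
In Python, the relevant APIs are: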
globals()
    globals() -> dictionary
    Return the dictionary containing the current scope's global variables.
locals()
Update and return a dictionary representing the current local symbol table.

Suppose there is a file run.py with the following content:
import a.am
a.am.foo()

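There are four scopes here:
1. the scope of run.py
2. the scope of package a
3. the scope of module am
4. the scope of function foo

The question: after one scope declares a variable global and changes it, are other scopes affected?

1. A function in a module does not affect the scope that calls it; it only affects its own module scope
run.py calls foo, and inside foo:
global name
name = "function set!"
The result is that the name variable in the scope of module am changes. So a global declaration and assignment inside a function affects the module the function is defined in

2. Code in a package does not affect the scope that imports it
run.py contains import a, and a/__init__.py contains:
global name
name = '__init__.py set!'
The name does not show up in the scope of run.py. So a global declaration and assignment in a package does not affect the importing scope
* What really happens: the code only changes the scope of __init__.py, which is the same situation as (1)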
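Case (1) can be reproduced without creating files on disk. In this sketch, written for Python 3, a module object built with types.ModuleType stands in for a/am.py; its source text and the caller's variable are made up for the illustration:

```python
import types

# Hypothetical stand-in for a/am.py: foo rebinds a module-level name.
am = types.ModuleType("am")
exec(
    "name = 'module default'\n"
    "def foo():\n"
    "    global name\n"
    "    name = 'function set!'\n",
    am.__dict__,
)

name = "caller value"   # a variable with the same name in the calling scope
am.foo()
print(name, "|", am.name)
```

The global statement inside foo refers to the namespace of the module where foo was defined, which is why the caller's variable keeps its value.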

In Python every *.py file is a module. A package is an object for organizing modules; in the end a package is also a module. For example:
# main script
jessinio@niolaptop /tmp/test $ cat use.py
import a.am
a.foo()
a.am.foo()
# the package's __init__.py
jessinio@niolaptop /tmp/test $ cat a/__init__.py
import pprint
def foo():
    print "package funcion"
# the module file
jessinio@niolaptop /tmp/test $ cat a/am.py
def foo():
    print "module am function"
# the package definitely counts as a module!
jessinio@niolaptop /tmp/test $ python use.py
package funcion
module am function

When a module inside a package is imported, the package "module" is imported as well! That is what I call forced bundling

Every python file has the following attributes:
'__builtins__'
'__doc__'
'__file__'
'__name__'
'__package__'

A package's __init__.py has one more attribute:
'__path__'

Saturday, November 7, 2009

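gentoo QA

At first, while updating world, a note with the word QA showed up. I did not pay attention, and went on to remove the eds USE flag that pidgin was using.
That USE flag requires evolution, and the error occurred while satisfying that dependency. I did not want to install evolution anyway, so I simply dropped the flag.

But after a little more compiling, a similar error appeared. This time it got my attention, because the message mentioned QA: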
* QA Notice: Package has poor programming practices which may compile
* fine but exhibit random runtime failures.
* auevents.c:149: warning: incompatible implicit declaration of built-in function ‘free’
* auevents.c:249: warning: incompatible implicit declaration of built-in function ‘free’
* connection.c:569: warning: incompatible implicit declaration of built-in function ‘free’
* connection.c:592: warning: incompatible implicit declaration of built-in function ‘free’

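What is QA?

A quick search turned up this URL: http://www.gentoo.org/proj/en/qa/

Could it be that gentoo's requirements for packages are too strict? I looked around but did not find a related setting

Never mind; I will not deal with problems in the code itself. Otherwise this upgrade would take at least half a year.

If I cannot beat it, I can at least avoid it. Remove the package first and see!

BTW:: upgrading QT is a pain every single time!! When can I get rid of QT too?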

Wednesday, October 28, 2009

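Code and API documentation

Extracting API documentation directly from code is a pleasure. The code I worked hard on finally gets documentation that people can read (even if the code itself is ugly)
Python has plenty of tools for this. epydoc is one of them (I do not even know when it got installed on my system).

Extracting API docs from code relies on something called documentation strings. In Python the objects that can have them are:
1. modules
2. functions
3. classes
4. methods

Where is a docstring defined?

An object's docstring is defined by including a string constant as the first
statement in the object's definition.

So the first statement, if it is a string constant, is the docstring
Functions, classes and methods are straightforward. Two objects are a bit special:
1. In a module: the first string statement in the py file, ignoring lines that start with "#"
2. In a package: the docstring lives in __init__.py, and otherwise works as for a module

A docstring is only the first step. To structure the documentation further, for instance to describe a function's purpose, parameters and return value, use fields inside the docstring; see:
* http://epydoc.sourceforge.net/manual-fields.html

Styling of the docstring is also possible:
* http://epydoc.sourceforge.net/manual-epytext.html

BTW: styling inside a docstring is overkill. A bit of indentation is bearable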
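As an illustration of docstrings with epydoc fields, here is a small sketch. The function scale is invented for the example; @param, @type, @return and @rtype are fields described in the epydoc manual linked above:

```python
def scale(value, factor=2):
    """
    Multiply a number by a factor.

    @param value: the number to scale
    @type value: int or float
    @param factor: the multiplier
    @type factor: int or float
    @return: value times factor
    @rtype: int or float
    """
    return value * factor

# The docstring is an ordinary attribute, which is what documentation tools read.
print(scale.__doc__.strip().splitlines()[0])
```

The string has to be the first statement of the function body; a string placed after other statements is not stored in __doc__.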

Tuesday, October 27, 2009

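ORM operations in django

There is no denying that django's documentation is well written, and there is a lot of it.

For me the old rule still holds: if you cannot describe something clearly in words, you do not yet understand it clearly. (^_^)

Creating instances of ORM classes

1. Calling the model class (its constructor)

2. Using the Manager.create method

Two classes: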

class Musician(models.Model):
   first_name = models.CharField(max_length=50)
   last_name = models.CharField(max_length=50)
   instrument = models.CharField(max_length=100)

class Album(models.Model):
   artist = models.ForeignKey(Musician)
   name = models.CharField(max_length=100)
   release_date = models.DateField()
   num_stars = models.IntegerField()

Creating objects:

1. Calling the model class:

In [6]: m = Musician(first_name='firename', last_name='lastname', instrument="something")
In [7]: m.id

At this point the new object has no id, i.e. there is no database record yet. It still has to be saved to the database:

In [8]: m.save()
In [9]: m.id
Out[9]: 2L

2. Using the Manager.create method of the class:

In [4]: m = Musician.objects.create(first_name='firename', last_name='lastname', instrument="something")
In [5]: m.id
Out[5]: 3L

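This time the data has already been inserted into the table; there is no need to call save

Any model instance can be created in either of these two ways.

Adding a referenced instance to a many-to-one field

Album and Musician are in a many-to-one relationship, so creating an Album has one extra condition: both the Album constructor and Manager.create check that the artist argument is a Musician instance:

Constructor: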

In [12]: a = Album(artist=2, name='album name', release_date=datetime.datetime.now(), num_stars=10)

ValueError: Cannot assign "2": "Album.artist" must be a "Musician" instance.

Manager.create:

In [13]: a = Album.objects.create(artist=2, name='album name', release_date=datetime.datetime.now(), num_stars=10)
ValueError: Cannot assign "2": "Album.artist" must be a "Musician" instance.

* Passing a number equal to the primary key still fails

An actual instance has to be passed as artist:

Step 1: create a Musician object with the constructor or with Manager.create:

m = Musician(first_name='firename', last_name='lastname', instrument="something")

m.save() # must be saved to the database first, otherwise saving the Album fails

a = Album(artist=m, name='album name', release_date=datetime.datetime.now(), num_stars=10)

a.save()

If m is not saved:

In [27]: m = Musician(first_name='firename', last_name='lastname', instrument="something")
In [28]: a = Album(artist=m, name='album name', release_date=datetime.datetime.now(), num_stars=10)
In [29]: a.save()
IntegrityError: (1048, "Column 'artist_id' cannot be null")

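It fails, complaining that artist_id is null

Using the objects:

There are two directions: 1. from the side holding the ForeignKey; 2. from the referenced side

1. From the ForeignKey side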

In [19]: a = Album.objects.get(pk=1)
In [20]: a.artist.id
Out[20]: 4L

In [22]: a.artist.first_name
Out[22]: u'firename'
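This style of access is very pythonic

2. From the referenced side

This is called the "other side" of a ForeignKey relation. If a model class is referenced by another class through a foreign key, then the referenced class and its instances get an extra attribute: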

In [61]: type Musician.album_set
-------> type(Musician.album_set )
Out[61]: <class 'django.db.models.fields.related.ForeignRelatedObjectsDescriptor'>
In [62]: m = Musician.objects.get(pk=1)
In [63]: type m.album_set
-------> type(m.album_set )
Out[63]: <class 'django.db.models.fields.related.RelatedManager'>
The django documentation says:

Following relationships "backward"

If a model has a ForeignKey, instances of the foreign-key model will have
access to a Manager that returns all instances of the first model. By
default, this Manager is named FOO_set, where FOO is the source
model name, lowercased. This Manager returns QuerySets, which can be
filtered and manipulated as described in the "Retrieving objects"

This makes it possible to find all the objects that reference a given object

# Below: from an album instance to the musician it references

In [64]: a = Album.objects.get(pk=1)
In [65]: a.id
Out[65]: 1L
In [67]: a.artist.id
Out[67]: 4L

# Below: from a musician instance to the album instances that reference it

In [68]: m = Musician.objects.get(pk=4)
In [70]: a_set = m.album_set.all()
In [71]: a_set.count()
Out[71]: 1
In [72]: a = a_set[0]
In [73]: a.id
Out[73]: 1L

The album ids are the same (1L in both Out[65] and Out[73]), so this is the same album instance

Adding referenced instances to a many-to-many field

class Domain(models.Model):
    name = models.CharField(max_length=128)
    is_singular = models.IntegerField()
    description = models.CharField(max_length=32, default="")
    object = models.ManyToManyField("Object")

class Object(models.Model):
    name = models.CharField(max_length=128)
    description = models.CharField(max_length=32, default="")

A ManyToMany field can reference multiple instances. This is done with the add method of the ManyToMany field:
In [2]: domain = Domain(name = 'china', is_singular=1)
In [3]: domain.object.add
ValueError: 'Domain' instance needs to have a primary key value before a many-to-many relationship can be used.
In [5]: domain.object.add
Out[5]: <bound method ManyRelatedManager.add of <django.db.models.fields.related.ManyRelatedManager object at 0x9fa82ec>>
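As shown, the add method of a ManyToMany field only becomes usable once the instance has been saved (it needs a primary key).

Now add referenced instances to the ManyToMany field: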
In [6]: obj = Mod.Object(name="object1")
In [7]: domain.object.add(obj)
IntegrityError: (1048, "Column 'object_id' cannot be null")
* Same as a ForeignKey field: the referenced object must already exist in the database table!
In [8]: obj.save()
In [9]: domain.object.add(obj)
In [10]: domain.save()
* Only after obj.save() does adding the reference succeed!
More than one instance can be referenced:
In [11]: obj = Object(name="object2")
In [12]: obj.save()
In [13]: domain.object.add(obj)
In [14]: domain.save()
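* So many save() calls, because underneath it is still a relational database: each save corresponds to one insert or update

With the through argument a custom intermediate table is used, and then the ManyToMany field no longer has an add method!

The new models: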
class Domain(models.Model):
    name = models.CharField(max_length=128)
    is_singular = models.IntegerField()
    description = models.CharField(max_length=32, default="")
    object = models.ManyToManyField("Object", through="DomainObject")

class Object(models.Model):
    name = models.CharField(max_length=128)
    description = models.CharField(max_length=32, default="")

class DomainObject(models.Model):
    domain = models.ForeignKey("Domain")
    object = models.ForeignKey("Object")

# Look for the add method of the ManyToMany field:
In [2]: domain = Mod.Domain(name = 'china', is_singular=1)
In [3]: domain.save()
In [4]: domain.add
AttributeError: 'Domain' object has no attribute 'add'
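* It is gone! (Strictly speaking, this session checks domain.add rather than domain.object.add.)
To add a referenced instance to the ManyToMany field now, use: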
In [6]: domain
Out[6]: <Domain: Domain object>
In [7]: obj
Out[7]: <Object: Object object>
In [8]: domain_object = Mod.DomainObject.objects.create(domain=domain, object=obj)
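* Manager.create here can be replaced by calling the class constructor followed by save()
This explicitly uses the class of the intermediate table to create the relation.

Using the objects:

As in the many-to-one case, there are two directions: 1. from the side holding the field; 2. from the referenced side

1. From the field side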
In [23]: domain.object
Out[23]: <django.db.models.fields.related.ManyRelatedManager object at 0x9fa802c>
It is a Manager object!
In [24]: objs = domain.object.all()
In [27]: for item in objs:
   ....:     print item.id
1
2
So here a query (a select condition) is needed to fetch the objects you want

2. From the referenced side
In [28]: obj = domain.object.all()[0]
In [29]: obj.id
Out[29]: 1L
In [30]: domain.id
Out[30]: 1L
So the domain with ID 1 references the object with ID 1. Looking it up the other way:
In [35]: obj = Object.objects.get(pk=1)
In [39]: domain = obj.domain_set.get(pk=1)
In [40]: domain.id
Out[40]: 1L

Changing the name of the RelatedManager

class Musician(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    instrument = models.CharField(max_length=100)

class Album(models.Model):
    artist = models.ForeignKey(Musician, related_name="Musician_set")
    name = models.CharField(max_length=100)
    release_date = models.DateField()
    num_stars = models.IntegerField()

The addition is related_name="Musician_set" on the ForeignKey. Then:
In [7]: m = Musician(first_name='first_name', last_name='last_name', instrument='instrument')
In [8]: m.Musician_set
Out[8]: <django.db.models.fields.related.RelatedManager object at 0x8de220c>

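Mappings, primary keys and foreign keys

One-to-many, many-to-many, many-to-one, one-to-one. What are these?

What kind of relationship is this? Middle-school mathematics has a concept called a mapping.

Definition: let A and B be two non-empty sets. If, under some correspondence f, every element of A has exactly one corresponding element in B, then this correspondence (including A, B and the correspondence f from A to B) is called a mapping from A to B, written f: A→B.

This "correspondence f" is a rather slippery notion. For example, money and owner are in a "correspondence". Then is there a relation between girls and me? Of course! It becomes obvious once "me" turns into "mine"!

Some important statements:
1. In mathematics and related fields, a mapping (or projection) is often treated as the same thing as a function
* f is that function; A and B are just its stdin and stdout
2. Uniqueness: an element of the domain corresponds to exactly one element of the range
* See figure 1:

In figure 1 the correspondence from set A to set B is not a mapping, because it is not unique!

So why does "one-to-many" show up so often in databases?
A mapping has a direction. If figure 1 is read from B to A, with the rule f = N^2, it becomes a mapping from B to A. That is, the other direction of "one-to-many" is "many-to-one"

How to explain "many-to-many"?
The way django's ORM implements the ManyToMany relation gives a good idea:

The figure on the left shows the "many-to-many" case, which is equivalent to the figure on the right. An intermediate set is added and the relation is split in two across it. The result: "one-to-many => many-to-one".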
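The decomposition can be written out with plain Python containers. In this sketch the data is invented ("japan" in particular does not appear in the sessions above); domain_object plays the role of the intermediate set, and each of its rows points at exactly one domain and exactly one object:

```python
domains = {1: "china", 2: "japan"}
objects = {1: "object1", 2: "object2"}

# Intermediate set: (domain_id, object_id) pairs, two many-to-one relations.
domain_object = [(1, 1), (1, 2), (2, 2)]

def objects_of(domain_id):
    """Follow the pairs from a domain to every object it references."""
    return [objects[o] for d, o in domain_object if d == domain_id]

def domains_of(object_id):
    """Follow the same pairs in the other direction."""
    return [domains[d] for d, o in domain_object if o == object_id]

print(objects_of(1))
print(domains_of(2))
```

Both directions use the same list of pairs, which is why one intermediate table is enough to express a many-to-many relation.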

So there are two kinds of mapping:
1. one-to-one
2. one-to-many

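What about primary key and foreign key?

1. A primary key assigns a unique number to every element of a set. f(set, pk) always yields exactly one element

2. A foreign key is what that number is called when the primary key of an element of one set is stored as a value in an element of another set. As above, f(set, pk) yields exactly one element. And what if that element in turn holds the primary key of yet another set? Then f2(set2, f1(set, pk)) also yields exactly one element. This way the relations among N sets can be expressed.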
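The f(set, pk) notation maps directly onto dictionary lookups. A minimal sketch, with two dicts standing in for the tables and sample values loosely taken from the sessions above (the layout of the dicts is an assumption for the illustration):

```python
musicians = {4: {"first_name": "firename", "last_name": "lastname"}}
albums = {1: {"name": "album name", "artist_id": 4}}   # artist_id is the foreign key

def f(table, pk):
    """Primary-key lookup: exactly one element per key."""
    return table[pk]

def artist_of(album_pk):
    # f2(set2, f1(set, pk)): follow the foreign key stored in the album
    return f(musicians, f(albums, album_pk)["artist_id"])

def album_set(musician_pk):
    # reverse direction: every album whose foreign key points at the musician
    return [pk for pk, album in albums.items() if album["artist_id"] == musician_pk]

print(artist_of(1)["first_name"])
print(album_set(4))
```

The forward direction is a single lookup, while the reverse direction has to scan the referencing set, which is roughly the difference between a.artist and m.album_set.all().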

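In which set should the key be stored?

1. For "one-to-one", storing it in either set yields a mapping

2. For "one-to-many", the key (as a foreign key) should be stored on the "many" side.

Database technology uses the primary key and foreign key concepts to let tables reference each other, and adds constraints that give the key (just a number) more capabilities. For example the django documentation says: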
When Django deletes an object, it emulates the behavior of the SQL
constraint ON DELETE CASCADE -- in other words, any objects which
had foreign keys pointing at the object to be deleted will be deleted
along with it.

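Although a foreign key is just a number, it is given more meaning.

BTW: I have no experience in finding mathematics references. They are rather hard to find.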
http://baike.baidu.com/view/21249.htm

Thursday, October 22, 2009

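Alternatives to screen

There is a program called tmux that does much the same as screen; it is billed as a BSD-licensed GNU screen

An introduction: http://niallohiggins.com/2009/06/04/tmux-a-bsd-alternative-to-gnu-screen/

Noting down my own screen configuration somewhere, so I can find it quickly later: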

escape ^Ss
bind j focus down
bind k focus up
bind t focus top
bind b focus bottom
# remove these bindings
bind ^k
bind 'K'

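Session handling in django

I. The relationship between cookies and sessions

HTTP is a stateless protocol; it tells requests apart with the help of the cookie header.
A server response carrying a cookie looks roughly like:
Set-Cookie: name=newvalue; expires=date; path=/; domain=.example.org
A client request carrying the cookie looks roughly like (only the name=value pair is sent back; attributes such as expires, path and domain are not):
Cookie: name=newvalue

HTTP defines a set of cookie attributes; see: http://en.wikipedia.org/wiki/HTTP_cookie#Cookie_attributes

A session builds on the cookie mechanism by adding a token that the web service can use. The token corresponds to concrete data kept on the server side that should not travel over HTTP.

This is a {key: value} relationship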

II. Cookies and sessions in django

The django request object has two session-related attributes:
request.COOKIES: a dict instance
request.session: an instance of the SessionStore class, created in SessionMiddleware.

By default django adds an entry named sessionid to the cookie:
(Pdb) print request.COOKIES
{'sessionid': '40c340a947b41e4def92ed70c25affbe'}
* The name can be changed in settings.py
This entry is the token that maps to the session in django. It is called session_key, for example:
(Pdb) print request.session.session_key
40c340a947b41e4def92ed70c25affbe

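III. The Set-Cookie header returned by the server

To put the session's session_key into the HTTP cookie header, response.set_cookie is used. Normally we do not call it directly, because:
SessionMiddleware's process_response method calls set_cookie.
For the middleware to call set_cookie, one condition must hold: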
modified = request.session.modified
if modified or settings.SESSION_SAVE_EVERY_REQUEST:

There are two ways to make request.session.modified == True:
1. create a new session object
2. change or add a value under some key of the session
Either way request.session.modified becomes True. This involves the SessionStore class, so let's sort that class out.

The SessionStore class

Reading the django code brings up a Python feature: descriptors. See:
1. http://users.rcn.com/python/download/Descriptor.htm
2. http://www.ibm.com/developerworks/cn/linux/l-python-elegance-2.html

SessionStore mainly manages sessions: creating a new session, changing the value of a key in a session, checking whether a session_key has a corresponding session object, and so on

1. Creating a Session object

SessionStore can produce a new session object (i.e. a new session_key) in two ways:
1. load()
2. create()
load fetches from the database the session object for the given session_key, and calls create if there is none. So look at create:
    def create(self):
        while True:
            self.session_key = self._get_new_session_key() # generates a new unique key
            try:
                # Save immediately to ensure we have a unique entry in the
                # database.
                self.save(must_create=True) # goes through the database model object
            except CreateError:
                # Key wasn't unique. Try again.
                continue
            self.modified = True
            self._session_cache = {}
            return

2. Modifying a Session instance

SessionStore defines many operations on the Session instance, but all of them go through the self._session descriptor. What does this descriptor return?
   def _get_session(self, no_load=False):
        """
        Lazily loads session from storage (unless "no_load" is True, when only
        an empty dict is stored) and stores it in the current instance.
        """
        self.accessed = True
        try:
            return self._session_cache
        except AttributeError:
            if self._session_key is None or no_load:
                self._session_cache = {}
            else:
                self._session_cache = self.load()
        return self._session_cache

    _session = property(_get_session)

* Accessing self._session ends up calling self.load() on first access; this is the descriptor technique at work!
* self._session_cache above is the dict that corresponds to the session instance's session_data
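The lazy-loading pattern can be tried in isolation. The class below is a stripped-down sketch, not django's SessionStore: LazySession, its loader argument and the load_count counter are invented to make the single call to the loader visible:

```python
class LazySession(object):
    def __init__(self, loader):
        self._loader = loader    # hypothetical callable standing in for load()
        self.load_count = 0
        self.modified = False

    def _get_session(self):
        try:
            return self._session_cache        # already loaded: reuse the dict
        except AttributeError:
            self.load_count += 1
            self._session_cache = self._loader()
        return self._session_cache

    _session = property(_get_session)

    def setdefault(self, key, value):
        if key in self._session:
            return self._session[key]
        self.modified = True
        self._session[key] = value
        return value

store = LazySession(lambda: {"user": "jessinio"})
store.setdefault("lang", "zh")
print(store._session, store.load_count, store.modified)
```

Every access goes through the property, but the loader runs only once; later reads and writes all hit the cached dict.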

One of SessionStore's methods that modifies the session:
    def setdefault(self, key, value):
        if key in self._session:
            return self._session[key]
        else:
            self.modified = True
            self._session[key] = value
            return value
* Here self._session[key] = value amounts to self.load()[key] = value; combined with the code of _get_session, it is the same as self._session_cache[key] = value

Saving the dict that belongs to a session:
SessionBase defines this method:
    def encode(self, session_dict):
        "Returns the given session dictionary pickled and encoded as a string."
        pickled = pickle.dumps(session_dict, pickle.HIGHEST_PROTOCOL)
        pickled_md5 = md5_constructor(pickled + settings.SECRET_KEY).hexdigest()
        return base64.encodestring(pickled + pickled_md5)
As can be seen, the dict is pickled into a string. When it is needed again, SessionBase's decode method is used
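To make the round trip concrete, here is a sketch of the same scheme in Python 3. It is not django's code: SECRET_KEY is a placeholder, hashlib.md5 stands in for md5_constructor, base64.b64encode replaces encodestring, and the decode function is an assumption about how the check could be reversed (the 32 characters at the end are the hex digest):

```python
import base64
import hashlib
import pickle

SECRET_KEY = b"not-a-real-secret"   # placeholder, not a real setting

def encode(session_dict):
    """Pickle the dict, append an MD5 check value, and base64 the result."""
    pickled = pickle.dumps(session_dict, pickle.HIGHEST_PROTOCOL)
    digest = hashlib.md5(pickled + SECRET_KEY).hexdigest().encode("ascii")
    return base64.b64encode(pickled + digest)

def decode(data):
    """Reverse encode(); refuse data whose check value does not match."""
    raw = base64.b64decode(data)
    pickled, digest = raw[:-32], raw[-32:]    # an MD5 hex digest is 32 characters
    expected = hashlib.md5(pickled + SECRET_KEY).hexdigest().encode("ascii")
    if digest != expected:
        raise ValueError("session data was tampered with")
    return pickle.loads(pickled)

token = encode({"user_id": 3, "lang": "zh"})
print(decode(token))
```

The check value is verified before unpickling, since unpickling data that someone else could have altered is unsafe. MD5 is used here only to mirror the code quoted above.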

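BTW:: Reading code in eclipse is a real joy!! I spent quite some effort on eclipse a while back, and it is paying off. It seems that besides firefox there is one more piece of software that simply has to be a GUI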
