Tech – Page 5 – OnlyMarshall

Java Date & Time: a JSR is actually about to be approved

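I rarely rant about technology here, but date and time handling in Java is a disgrace. java.util.Date has so many deprecated methods and fields that, apart from the class itself, it is basically useless. The news here suggests that from now on, working with dates and times in Java will mean importing the javax.time package, though probably without adding a jar by hand (it should ship with the JDK, since it is used so often).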
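For the record, JSR-310 eventually shipped in Java 8 as the java.time package rather than javax.time. Below is a small comparison, assuming Java 8 or later; the class name and the dates are arbitrary examples. The old constructor needs the year minus 1900 and a 0-based month, while java.time uses real years and 1-based months.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.temporal.ChronoUnit;
import java.util.Date;

final class DateTimeComparison {

    // Old API: the deprecated constructor takes (year - 1900, 0-based month, day),
    // so 12 May 2008 has to be written as new Date(108, 4, 12).
    @SuppressWarnings("deprecation")
    static int oldApiYear() {
        Date d = new Date(108, 4, 12);
        return d.getYear() + 1900; // getYear() is deprecated too and returns 108
    }

    // java.time (JSR-310): immutable values, real years, 1-based months.
    static long newApiDaysBetween() {
        LocalDate start = LocalDate.of(2008, Month.JANUARY, 1);
        LocalDate end = LocalDate.of(2008, Month.MARCH, 1);
        return ChronoUnit.DAYS.between(start, end); // 2008 is a leap year: 31 + 29
    }

    public static void main(String[] args) {
        System.out.println(oldApiYear());        // 2008
        System.out.println(newApiDaysBetween()); // 60
    }
}
```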

A Programming Pearls problem I finally understood

Given a sequential file containing 4,300,000,000 32-bit integers, how do you find an integer that appears at least twice?

The file is sequential: random access is not allowed.

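Solution: binary search, but instead of halving the file contents, halve the range of values being searched. Since 4.3 billion is more than the 2^32 (about 4.29 billion) possible 32-bit values, the pigeonhole principle guarantees that some integer repeats. Start with the range of all 32-bit integers (treat them all as unsigned int to keep things simple), i.e. [0, 2^32), whose midpoint is 2^31. Scan the file: if more than 2^31 integers are less than 2^31, narrow the range to [0, 2^31); otherwise the upper half must hold more than 2^31 integers, so narrow it to [2^31, 2^32). Then scan the whole file again with the new range, and repeat until the range shrinks to a single value. That takes log2(2^32) = 32 passes, each reading all n integers (every pass is a full scan), so the total cost is O(n log n) (here n ≈ 2^32).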

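Example: the array [4,2,5,1,3,6,3,7,0,7], searching a 3-bit value space. The first range is [0,8); after one pass, [0,4) turns out to contain 5 integers (more than 4), so the search narrows to [0,4). The second pass finds 3 integers in [2,4), more than 2, so the range becomes [2,4). A third pass then identifies 3 as the repeated integer.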
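The passes above can be sketched in Java. This is a minimal in-memory simulation: an int[] stands in for the sequential file (it is only ever scanned front to back), and `bits` is a hypothetical parameter for the size of the value space, so the 3-bit example runs directly.

```java
final class PigeonholeDuplicate {

    // Finds a value occurring at least twice, assuming every element is an
    // unsigned value below 2^bits and data.length > 2^bits (pigeonhole).
    // Each outer iteration is one full sequential scan of "the file";
    // only the candidate range [lo, hi) is halved, never the data itself.
    static long findDuplicate(int[] data, int bits) {
        long lo = 0, hi = 1L << bits;
        while (hi - lo > 1) {
            long mid = lo + (hi - lo) / 2;
            long left = 0;                           // values falling in [lo, mid)
            for (int x : data) {
                long v = Integer.toUnsignedLong(x);  // treat as unsigned int
                if (v >= lo && v < mid) left++;
            }
            // Invariant: [lo, hi) holds more values than it has slots, so one
            // half must too: [lo, mid) if left is too big, otherwise [mid, hi).
            if (left > mid - lo) hi = mid;
            else lo = mid;
        }
        return lo; // a one-value range that still holds at least two values
    }

    public static void main(String[] args) {
        int[] sample = {4, 2, 5, 1, 3, 6, 3, 7, 0, 7};
        System.out.println(findDuplicate(sample, 3)); // 3
    }
}
```

Only the counts for the current range are kept between passes, so memory use stays constant no matter how large the file is.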

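Improvement: the method above does a lot of redundant work. A better approach is to create a new file (a sequential file is fine). After each pass has decided which range to search next, write the integers from the current file that fall in that range to the new file; the next pass then only has to scan the new file. This brings the cost close to linear (though the constant factor is probably large).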
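A sketch of the filtering variant, with lists standing in for the sequential files. One assumption goes beyond the post: each half keeps at most (half width + 1) values. By the pigeonhole principle that is still enough to force a duplicate, and it is what guarantees the files shrink geometrically; without the cap, a file made of one repeated value would be copied in full on every pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FilteringDuplicate {

    // Assumes every value is in [0, 2^bits) and file.size() > 2^bits.
    static long findDuplicate(List<Long> file, int bits) {
        long lo = 0, hi = 1L << bits;           // current range [lo, hi)
        List<Long> current = file;              // invariant: current.size() > hi - lo
        while (hi - lo > 1) {
            long mid = lo + (hi - lo) / 2;
            List<Long> left = new ArrayList<>();  // stands in for a new sequential file
            List<Long> right = new ArrayList<>();
            for (long v : current) {             // one sequential pass
                // Cap each half at (its width + 1) values (assumption, see above).
                if (v < mid) {
                    if (left.size() <= mid - lo) left.add(v);
                } else if (right.size() <= hi - mid) {
                    right.add(v);
                }
            }
            if (left.size() > mid - lo) { current = left; hi = mid; }
            else { current = right; lo = mid; }  // right must then be over capacity
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Long> sample = Arrays.asList(4L, 2L, 5L, 1L, 3L, 6L, 3L, 7L, 0L, 7L);
        System.out.println(findDuplicate(sample, 3)); // 3
    }
}
```

With the cap, the pass sizes are n, then at most 2^31 + 1, 2^30 + 1, and so on, so the total work is roughly n + 2^32 reads.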

Well, I need to work on my algorithms, or I'll bomb my next interview, heh.

Using Cache-Control and gzip to speed up a Tomcat application (compiled notes)

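This really should be common knowledge; it's just that most of the J2EE applications I built before ran on an intranet, so performance was never an issue. This time APIS may be used from outside, and it uses some fairly large JavaScript frameworks (Ext), so performance problems shot straight up.
My earlier J2EE applications never used a framework as large as 500K; at most a few dozen K of Prototype, so there was no problem, and a page was usually a few dozen K at most. But APIS, still under development, is still using the debug builds of its libraries, so Ext alone has ballooned to over 1 MB. On top of that, either Struts or Tomcat (I'm not sure which) writes Cache-Control: no-cache into the response by default, so using the application remotely was very slow: a page typically took ten-odd seconds or more to load, which was unbearable. A few days ago I sat down and fixed it.
First, Cache-Control. After Googling for a while I found no direct way to configure it, so I copied a Filter and combined it with some web.xml configuration to get by. In short, *.do requests get a no-cache policy, while everything that should be cached (images, JS files and so on) gets a one-week cache lifetime (max-age=604800 seconds, i.e. 7 × 24 × 3600). I haven't figured out how to use ETags, so this caching policy will do for now.
The Filter code: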

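package apis.server.common.util;

import java.io.IOException;
import java.util.Enumeration;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Copies every init-param of this filter into the response as an HTTP header.
public class ResponseHeaderFilter implements Filter {
    private FilterConfig fc;

    public void doFilter(ServletRequest req, ServletResponse res,
            FilterChain chain) throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;
        // set the provided HTTP response headers (param-name = header name)
        for (Enumeration<?> e = fc.getInitParameterNames(); e.hasMoreElements();) {
            String headerName = (String) e.nextElement();
            // setHeader replaces a value set earlier; addHeader would add a duplicate header
            response.setHeader(headerName, fc.getInitParameter(headerName));
        }
        // pass the request/response on
        chain.doFilter(req, response);
    }

    public void init(FilterConfig filterConfig) {
        this.fc = filterConfig;
    }

    public void destroy() {
        this.fc = null;
    }
}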

The clever part, in web.xml:

<filter>
    <filter-name>NoCache</filter-name>
    <filter-class>apis.server.common.util.ResponseHeaderFilter</filter-class>
    <init-param>
        <param-name>Cache-Control</param-name>
        <param-value>no-cache, must-revalidate</param-value>
    </init-param>
</filter>
<filter>
    <filter-name>CacheForWeek</filter-name>
    <filter-class>apis.server.common.util.ResponseHeaderFilter</filter-class>
    <init-param>
        <param-name>Cache-Control</param-name>
        <param-value>max-age=604800, public</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>NoCache</filter-name>
    <url-pattern>*.do</url-pattern>
</filter-mapping>
<filter-mapping>
    <filter-name>CacheForWeek</filter-name>
    <url-pattern>/images/*</url-pattern>
</filter-mapping>
<filter-mapping>
    <filter-name>CacheForWeek</filter-name>
    <url-pattern>/img/*</url-pattern>
</filter-mapping>
<filter-mapping>
    <filter-name>CacheForWeek</filter-name>
    <url-pattern>/icons/*</url-pattern>
</filter-mapping>
<filter-mapping>
    <filter-name>CacheForWeek</filter-name>
    <url-pattern>/ext/*</url-pattern>
</filter-mapping>
<filter-mapping>
    <filter-name>CacheForWeek</filter-name>
    <url-pattern>*.js</url-pattern>
</filter-mapping>
<filter-mapping>
    <filter-name>CacheForWeek</filter-name>
    <url-pattern>*.css</url-pattern>
</filter-mapping>

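(An aside: to track down these performance problems I used YSlow, a Firebug add-on (so a plugin for a Firefox plugin) made by Yahoo. Combined with profiling in Firebug's Net panel, including its XHR view, it works very well and makes the bottlenecks easy to spot.)
The other technique is gzip: the server compresses the content before sending it to the browser. All mainstream browsers support gzip, and HTML and JS text compresses very well, with a compression ratio of around 40% in practice. The method is also described in the comments of server.xml: add the following attributes to the Connector element:
compression="on"
compressionMinSize="2048"
noCompressionUserAgents="gozilla,traviata"
compressableMimeType="text/html,text/xml,text/javascript,text/css,text/plain"
Most of the above came from Google; I just organized it.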
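To check how well a given text compresses before touching server.xml, java.util.zip.GZIPOutputStream is enough for a rough measurement. The sample markup below is made up, so the printed ratio only illustrates the measurement and says nothing about Ext or APIS specifically. Per the Tomcat docs, compressionMinSize="2048" means responses smaller than 2048 bytes are sent uncompressed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipRatio {

    // Returns the gzip-compressed size of the given text in bytes.
    static int gzipSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // cannot really happen for an in-memory stream, but the API declares it
            throw new UncheckedIOException(e);
        } // closing the stream writes the gzip trailer
        return buffer.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample: repetitive markup, as HTML and JavaScript tend to be.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("<tr><td class=\"cell\">row ").append(i).append("</td></tr>\n");
        }
        String page = sb.toString();
        int original = page.getBytes(StandardCharsets.UTF_8).length;
        int compressed = gzipSize(page);
        System.out.printf("%d -> %d bytes (%.0f%% of original)%n",
                original, compressed, 100.0 * compressed / original);
    }
}
```

Running it against a real page (for example, the saved output of a .do request) gives a more meaningful number than the synthetic sample.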

Fixing the "Could not calculate the upgrade" error when upgrading Ubuntu to 8.04

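The problem isn't actually hard; you just have to read the message carefully. It says to check the contents of /var/log/dist-upgrade/. I had skimmed them once before without paying much attention: main.log looked fine, term.log was empty, and apt.log had a lot in it that I didn't read closely. I kept assuming the cn99 mirror was to blame, but switching to cn.archive.ubuntu.com failed too.
Today I downloaded the Ubuntu alternate install CD and hit the same problem, so I finally read apt.log properly and found this: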

language-support-cn has broken dep on openoffice.org-l10n-zh-cn
language-support-cn has broken dep on openoffice.org-l10n-zh-tw

Could these packages be conflicting? After I removed the OpenOffice l10n packages, the problem was solved.
Lesson: read the logs carefully.

Google TreasureHunt

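A puzzle site Google Sydney uses to find engineers. There are only 3 problems so far, with a new one each week. They aren't hard, but they're good practice for programming and problem solving (though the robot one is rather off the wall, and the latest one seems to be solved by brute-force computation).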
http://treasurehunt.appspot.com

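Upgrading to Spring Security 2.0 (repost)

Original: http://raibledesigns.com/rd/entry/upgrading_to_spring_security_2
It's from the blog of the AppFuse author; I've condensed it somewhat.
1. Package change: org.acegisecurity => org.springframework.security
2. Dependency changes (omitted; we don't use Maven)
3. The tag prefix changes from authz to security; then change the taglib declaration to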

<%@ taglib uri="http://www.springframework.org/security/tags"
    prefix="security" %>

4. In web.xml, change <filter-class> to org.springframework.web.filter.DelegatingFilterProxy, and also add an <init-param>:

    <init-param>
        <param-name>targetBeanName</param-name>
        <param-value>springSecurityFilterChain</param-value>
    </init-param>

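5. Rewrite security.xml using the new syntax. According to the author, AppFuse's security.xml shrank from 177 lines to 33, because it now uses many convention-over-configuration elements such as <http auto-config="true"/>. The syntax itself will take some practice to get the hang of.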

Java diff and wiki notes

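The idea behind diff is to find the longest common subsequence (LCS) of two sequences, together with the edit distance between them. The best-known implementation is the diff commonly used on Unix/Linux (GNU diff).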
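As a reference point, here is the textbook O(n·m) dynamic-programming LCS in Java. It is a generic sketch, not the algorithm java-diff or GNU diff actually uses (both use faster methods); anything of the first sequence outside the LCS was deleted, and anything of the second outside it was added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Lcs {

    // len[i][j] = LCS length of the suffixes a[i..] and b[j..].
    static <T> List<T> lcs(List<T> a, List<T> b) {
        int n = a.size(), m = b.size();
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                len[i][j] = a.get(i).equals(b.get(j))
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }
        // Walk forward through the table to recover one LCS.
        List<T> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) { result.add(a.get(i)); i++; j++; }
            else if (len[i + 1][j] >= len[i][j + 1]) i++;
            else j++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a", "b", "c", "d", "e");
        List<String> b = Arrays.asList("a", "x", "y", "b", "c", "j", "e");
        System.out.println(lcs(a, b)); // [a, b, c, e]
    }
}
```

If only insertions and deletions are allowed, the edit distance follows directly: n + m − 2·|LCS|, which for the sample is 5 + 7 − 8 = 4 (delete d, insert x, y and j).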

Implementation

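I looked for Java implementations of diff and found two main ones: java-diff and the Diff on bsmi. The former is LGPL and the latter GPL. Neither has much code, and both implement the LCS algorithm. The former's license suits us better, and it comes with more documentation, tests and examples.
JavaDiff has two main classes, Diff and Difference: the former is the algorithm and the latter represents a single difference. Here is an example: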


Object[] a = new Object[] { "a", "b", "c", "d", "e" };

Object[] b = new Object[] { "a", "x", "y", "b", "c", "j", "e" };

Difference[] expected = new Difference[] {
    new Difference(1, -1, 1, 2),
    new Difference(3,  3, 5, 5),
};

Diff diff = new Diff(a, b);
List diffOut = diff.diff();

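There are three differences (x and y inserted, d replaced by j), represented by two Difference objects. A single Difference can represent a change, an addition or a deletion. The Difference constructor: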

public Difference(int delStart, int delEnd, int addStart, int addEnd)

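If delEnd or addEnd is -1, nothing was deleted or added, respectively.
Back to the example: the target sequence inserts x and y at positions 1-2 (counting from 0), and d at position 3 is replaced by j at position 5. A Difference only records positions, but from them we can work out exactly what was added, deleted or replaced. Here is a longer example from the test cases: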




public void testStrings1() {
    Object[] a = new Object[] {
        "a", "b", "c", "e", "h", "j", "l", "m", "n", "p"
    };

    Object[] b = new Object[] {
        "b", "c", "d", "e", "f", "j", "k", "l", "m", "r", "s", "t"
    };

    Difference[] expected = new Difference[] {
        new Difference(0,  0,  0, -1),
        new Difference(3, -1,  2,  2),
        new Difference(4,  4,  4,  4),
        new Difference(6, -1,  6,  6),
        new Difference(8,  9,  9, 11),
    };

    runDiff(a, b, expected);
}
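To make "we can work out what was added, deleted or replaced" concrete, here is a small decoder. `Change` is a hypothetical stand-in with the same four fields as java-diff's Difference (including the -1 convention); it does not call the library, so the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

final class DifferenceDecoder {

    // Hypothetical mirror of Difference: -1 in delEnd / addEnd means
    // "nothing deleted" / "nothing added".
    static final class Change {
        final int delStart, delEnd, addStart, addEnd;
        Change(int delStart, int delEnd, int addStart, int addEnd) {
            this.delStart = delStart; this.delEnd = delEnd;
            this.addStart = addStart; this.addEnd = addEnd;
        }
    }

    // Looks up the affected elements in the original arrays and describes each change.
    static List<String> describe(Object[] a, Object[] b, Change[] changes) {
        List<String> out = new ArrayList<>();
        for (Change c : changes) {
            boolean deleted = c.delEnd != -1, added = c.addEnd != -1;
            String del = deleted ? join(a, c.delStart, c.delEnd) : "";
            String add = added ? join(b, c.addStart, c.addEnd) : "";
            if (deleted && added) out.add("changed " + del + " -> " + add);
            else if (deleted) out.add("deleted " + del);
            else out.add("added " + add);
        }
        return out;
    }

    // Joins arr[from..to] (inclusive) with commas.
    private static String join(Object[] arr, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i <= to; i++) {
            if (i > from) sb.append(',');
            sb.append(arr[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Object[] a = {"a", "b", "c", "d", "e"};
        Object[] b = {"a", "x", "y", "b", "c", "j", "e"};
        Change[] changes = {new Change(1, -1, 1, 2), new Change(3, 3, 5, 5)};
        describe(a, b, changes).forEach(System.out::println);
        // added x,y
        // changed d -> j
    }
}
```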

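The examples above compare sequences of individual strings. Generalizing, if you treat each line of text as one symbol, you get the differences between files; FileDiff.java in java-diff's etc directory is a good reference. Once you have the differences you still need to present them, which takes a bit of wrapping, but it isn't difficult.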
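A minimal sketch of the line-as-symbol idea, independent of java-diff (this is not FileDiff.java). The output format is a made-up simple listing with "  ", "- " and "+ " prefixes, not GNU unified format, which would also need hunk headers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class LineDiff {

    // Each line is one symbol. "  " = unchanged, "- " = only in the old file,
    // "+ " = only in the new file.
    static List<String> diffLines(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size(), m = newLines.size();
        int[][] len = new int[n + 1][m + 1];     // LCS lengths of the suffixes
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                len[i][j] = oldLines.get(i).equals(newLines.get(j))
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);

        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                out.add("  " + oldLines.get(i)); i++; j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                out.add("- " + oldLines.get(i++));
            } else {
                out.add("+ " + newLines.get(j++));
            }
        }
        while (i < n) out.add("- " + oldLines.get(i++));
        while (j < m) out.add("+ " + newLines.get(j++));
        return out;
    }

    // Usage: java LineDiff old.txt new.txt
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: LineDiff <old file> <new file>");
            return;
        }
        List<String> oldLines = Files.readAllLines(Paths.get(args[0]));
        List<String> newLines = Files.readAllLines(Paths.get(args[1]));
        diffLines(oldLines, newLines).forEach(System.out::println);
    }
}
```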

Version storage

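Then there is the question of how a wiki stores versions. I didn't have time to study the big wiki engines such as MediaWiki (the one behind Wikipedia; incidentally, the English Wikipedia is finally accessible again), and I didn't get to JSPWiki either (JSPWiki doesn't even use a database, and its web layer is a framework of its own, so it may not be very readable). I only studied trac's wiki implementation, which is very simple: every version is stored in the database, which is acceptable since it's all text. To compare, it fetches two versions from the database and diffs them; the implementation is in PYTHON/site-packages/trac/wiki/web_ui.py (the _render_diff function). trac offers two forms of diff output: a tabular one, which is a very intuitive side-by-side comparison, and a unified one, the familiar diff output. The unified form is produced by JavaScript on the page that reads the text in the table and converts it into unified diff text, although I personally wouldn't recommend that approach. Wiki edits have another characteristic: a single line may hold a lot of content of which only a few characters changed. So the two versions of that line are diffed again, and the deleted text is shown in a <del> tag and the added text in an <ins> tag.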
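The per-line <del>/<ins> rendering can be sketched in Java as a character-level LCS diff. This is not trac's Python code; the class name is hypothetical, it escapes only <, > and &, and it works on UTF-16 chars, so characters outside the BMP could be split.

```java
import java.util.Objects;

final class InlineDiff {

    // Renders two versions of one line as HTML: removed runs in <del>,
    // inserted runs in <ins>, unchanged characters as-is.
    static String render(String oldLine, String newLine) {
        int n = oldLine.length(), m = newLine.length();
        int[][] len = new int[n + 1][m + 1];     // LCS lengths of the suffixes
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                len[i][j] = oldLine.charAt(i) == newLine.charAt(j)
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);

        StringBuilder html = new StringBuilder();
        String openTag = null;                    // tag of the run in progress, if any
        int i = 0, j = 0;
        while (i < n || j < m) {
            String tag;
            char c;
            if (i < n && j < m && oldLine.charAt(i) == newLine.charAt(j)) {
                tag = null; c = oldLine.charAt(i); i++; j++;
            } else if (j >= m || (i < n && len[i + 1][j] >= len[i][j + 1])) {
                tag = "del"; c = oldLine.charAt(i++);
            } else {
                tag = "ins"; c = newLine.charAt(j++);
            }
            if (!Objects.equals(tag, openTag)) {  // close/open a run when the kind changes
                if (openTag != null) html.append("</").append(openTag).append('>');
                if (tag != null) html.append('<').append(tag).append('>');
                openTag = tag;
            }
            html.append(escape(c));
        }
        if (openTag != null) html.append("</").append(openTag).append('>');
        return html.toString();
    }

    private static String escape(char c) {
        switch (c) {
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '&': return "&amp;";
            default: return String.valueOf(c);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("the quick fox", "the quack fox"));
        // the qu<del>i</del><ins>a</ins>ck fox
    }
}
```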
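Finally, a word on JSR-170, an API for managing content repositories (mainly for large CMSs). It supports versioning and multiple storage back ends and is very complex; there are two commercial implementations and one open-source implementation, Apache Jackrabbit. Here is a reference. The examples in JSR-170 also store every version in full.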


Turns out the best code reader is still an IDE

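Over the next couple of days I have to study Acegi, because I'm giving a tech talk that has to be about Acegi, so I went looking for a code reader.
The first thing that came to mind was Source Insight, but since it's commercial software I turned to open-source alternatives. After a long search I found Code Browser, which turned out to be even less capable than Notepad++. With no better option, I just used Eclipse to read the code. Since Acegi is tightly integrated with Spring, I also read through the Spring IDE help and tried it for the first time; it's really good. After all these years of developing Spring applications I had never properly used Spring IDE, which is embarrassing.
With the sources attached, Eclipse plus Spring IDE makes reading the Acegi code very convenient: Ctrl-click navigation, reference search and so on. No wonder I could never find a suitable open-source code viewer: the IDE is already this good.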

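Design principles for web pagination

1. Provide large clickable areas
2. Don't use underlines
3. Mark the current page clearly
4. Space out the page links
5. Provide Previous and Next links
6. Use First and Last links where necessary
7. Put the First and Last links on the outside, as below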
« First ‹ Previous Current Next › Last »
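Principles 3, 5, 6 and 7 can be sketched as a small Java method; principles 1, 2 and 4 are styling (CSS) and are not covered. The [n] marker, the `window` parameter and the rule "omit First/Previous on page 1 and Next/Last on the last page" are my own assumptions about what "where necessary" means, and ASCII arrows stand in for « ‹ › ».

```java
import java.util.ArrayList;
import java.util.List;

final class Pager {

    // Builds a plain-text pagination bar. Hypothetical conventions:
    // current page shown as [n], up to 'window' page numbers on each side.
    static String render(int current, int total, int window) {
        List<String> parts = new ArrayList<>();
        if (current > 1) {
            parts.add("<< First");      // outermost link (principle 7)
            parts.add("< Previous");    // principle 5
        }
        int from = Math.max(1, current - window);
        int to = Math.min(total, current + window);
        for (int p = from; p <= to; p++) {
            parts.add(p == current ? "[" + p + "]" : String.valueOf(p)); // principle 3
        }
        if (current < total) {
            parts.add("Next >");
            parts.add("Last >>");       // outermost link (principle 7)
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(render(5, 10, 2));
        // << First < Previous 3 4 [5] 6 7 Next > Last >>
    }
}
```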

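Trac installation notes

Notes from installing on a server. OS: RHEL4. Almost everything was installed from RPMs.
SVN and httpd were already installed. Besides those, the trac RPM also depends on clearsilver, sqlite, python-clearsilver and python-sqlite. Even after all the dependencies were installed, rpm still didn't consider httpd installed; at that point forcing trac in with --nodeps did the job.
For the process I mainly followed the Red Hat Linux section of the "Trac platform installation" document. Its RHEL4 section was actually not much help, mainly because the server has no yum :( Fortunately, we can skip the configure, make, make install steps at the start and go straight to configuration.
The first configuration step is creating the svn repository. Adjust the /var/svn path at the end to suit yourself; for example, I used /var/svn/ac990jcy, because I like one repository per project.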

$ svnadmin create --fs-type=fsfs /var/svn

Next, create the trac environment:

$ trac-admin /var/trac initenv
/usr/local/lib/python2.3/site-packages/libsvn/core.py:5: RuntimeWarning: Python C API version mismatch for module _core: This Python has API version 1012, module _core has version 1011.
  import _core
/usr/local/lib/python2.3/site-packages/libsvn/fs.py:5: RuntimeWarning: Python C API version mismatch for module _fs: This Python has API version 1012, module _fs has version 1011.
  import _fs
/usr/local/lib/python2.3/site-packages/libsvn/delta.py:5: RuntimeWarning: Python C API version mismatch for module _delta: This Python has API version 1012, module _delta has version 1011.
  import _delta
/usr/local/lib/python2.3/site-packages/libsvn/repos.py:5: RuntimeWarning: Python C API version mismatch for module _repos: This Python has API version 1012, module _repos has version 1011.
  import _repos
Creating a new Trac environment at /var/trac

Trac will first ask a few questions about your environment in order to initalize and prepare the project database.

 Please enter the name of your project. This name will be used in page titles and descriptions.

Project Name [My Project]> ac990jcy   (the project name)

 Please specify the absolute path to the project Subversion repository. Repository must be local, and trac-admin requires read+write permission to initialize the Trac database.

Path to repository [/var/svn/test]> /var/svn   (I used /var/svn/ac990jcy)

 Please enter location of Trac page templates. Default is the location of the site-wide templates installed with Trac.

Templates directory [/usr/local/share/trac/templates]> (Press enter here)

Creating and Initializing Project
(Output removed)
Project database for 'My Project' created.

 Customize settings for your project using the command:

   trac-admin /var/trac

 Don't forget, you also need to copy (or symlink) "trac/cgi-bin/trac.cgi" to you web server's /cgi-bin/ directory, and then configure the server.

 If you're using Apache, this config example snippet might be helpful:

    Alias /trac "/wherever/you/installed/trac/htdocs/"
    SetEnv TRAC_ENV "/var/trac"

    # You need something like this to authenticate users
    AuthType Basic
    AuthName "My Project"
    AuthUserFile /somewhere/trac.htpasswd
    Require valid-user

 The latest documentation can also always be found on the project website: http://projects.edgewall.com/trac/

Congratulations!

Then configure Apache as in the last part of that output:

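Alias /trac "/usr/local/share/trac/htdocs/"
# Tell Trac where its environment is, otherwise it has nothing to read.
# Alternatively: SetEnv TRAC_ENV_PARENT_DIR "/var/trac"
# (/var/trac is the parent directory here; that way several projects can be managed.)
SetEnv TRAC_ENV "/var/trac/ac990jcy"

# You need something like this to authenticate users
# Use HTTP Basic authentication
AuthType Basic
# Realm name shown in the login prompt
AuthName "ac990jcy"
# Password file used for logging in
AuthUserFile /var/trac/conf/htpasswd
# Only logged-in users are allowed
Require valid-user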

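Create the password file. This is the command for the first time only; when adding users later, leave out the -c option, or all existing entries will be lost.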

$ cd /var/trac/conf
$ /usr/local/apache2/bin/htpasswd -c htpasswd admin

Change the permissions on the trac directory; otherwise httpd, which runs with very low privileges, can't read it:

$ chmod -Rv a+rw /var/trac

Copy trac.cgi:

$ cd /usr/local/apache2/cgi-bin
$ cp /usr/local/share/trac/cgi-bin/trac.cgi .

Change the user httpd runs as: in httpd.conf, find the User daemon and Group daemon lines and change them to

User svnroot
Group svnroot

Finally, start httpd with /usr/local/apache2/bin/apachectl -k restart (or -k start), and you're done.