Date: 2014-05-17

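Everyone, come take a look at my problem; it's quite interesting.
Why is storing a file line by line in an array and then iterating over that array so much slower than iterating over the file line by line directly?
Here is my code:
Reading from the file: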
Java code
BufferedReader brSnpDetailFile = new BufferedReader(
                new InputStreamReader(new FileInputStream(
                        inputParameters.getAnnotationFilePath())));
        int index = 0;
        while ((stringLine = brSnpDetailFile.readLine()) != null) {
            String[] tempString = stringLine.split("\t", -1);
            /*server*/
            String snpId = tempString[3];
            String associatedGeneName = tempString[4];
            
            /*test
            String snpId = tempString[3];
            String associatedGeneName = tempString[5];
            */
            if (userSnpFileHashMap.containsKey(snpId)) {

                float pValue = userSnpFileHashMap.get(snpId);
                Snp snp = new Snp();
                snp.setAssociatedGeneName(associatedGeneName);
                snp.setReferenceId(snpId);
                snp.setPValue(pValue);

Now I first store the file in an array:
Java code
public void annotationFileParser(String filePath,String[] annotationFileArray){
        try {
            BufferedReader brAnnotation = new BufferedReader(
                    new InputStreamReader(new FileInputStream(new File(filePath))));

            String stringLine = null;
            int index = 0;
            while ((stringLine = brAnnotation.readLine()) != null) {
                annotationFileArray[index] = stringLine;
                index ++;
                if(index % 10000000 == 0){
                    System.out.println(index);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Then I iterate over this array:
Java code
int index = 0;
        for(int i = 0;i < annotationFileArray.length;i ++){
            
            /*server*/
            String snpId = annotationFileArray[i].split("\t", -1)[3];
            String associatedGeneName = annotationFileArray[i].split("\t", -1)[4];        
            /*
            String snpId = annotationFileArray[i].split("\t", -1)[3];
            String associatedGeneName = annotationFileArray[i].split("\t", -1)[5];
            */
            
            if (userSnpFileHashMap.containsKey(snpId)) {
                float pValue = userSnpFileHashMap.get(snpId);
                Snp snp = new Snp();
                snp.setAssociatedGeneName(associatedGeneName);
                snp.setReferenceId(snpId);
                snp.setPValue(pValue);
                
                // put all filtered snp into array.
                snpArrayList.add(snp);
                // put all filtered P-Value into array.
                pValueArrayList.add(pValue);
        
                if (!associateGeneNameIndexArrayHashMap
                        .containsKey(associatedGeneName)) {
                    ArrayList<Integer> tempArrayList = new ArrayList<Integer>();
                    tempArrayList.add(index);
                    associateGeneNameIndexArrayHashMap.put(associatedGeneName, tempArrayList);
                } else {
                    associateGeneNameIndexArrayHashMap.get(associatedGeneName).add(index);
                }
                index++;
                

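Now it runs much slower...
Any advice from the experts would be appreciated!
One more detail: the file is 1.4 GB, and my server has 24 GB of memory.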

------Solution--------------------
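Of course it is slower. By storing the file in an array and then iterating over it, you effectively traverse the data twice: once over the file line by line, and once over the array.
On top of that, storing every line of the file in an array nearly doubles the memory consumption.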
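
To illustrate the point, here is a minimal sketch of a single-pass version. It is not the original poster's code. The class name AnnotationScan, the Match holder (a stand-in for the poster's Snp class), and the UTF-8 charset are all assumptions made for the example. It reads each line once, splits it once, and keeps only the rows whose SNP id is in the user's map, so the whole file never has to sit in memory. Note also that the array version in the question calls split twice per line, which repeats the most expensive step of the loop.

Java code
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AnnotationScan {

    /** Hypothetical holder for one matched row; stands in for the poster's Snp class. */
    static final class Match {
        final String snpId;
        final String geneName;
        final float pValue;

        Match(String snpId, String geneName, float pValue) {
            this.snpId = snpId;
            this.geneName = geneName;
            this.pValue = pValue;
        }
    }

    /**
     * Streams the annotation file once and returns only the rows whose
     * SNP id (column 3) is a key in userSnps. Column 4 is taken as the gene name,
     * matching the "server" layout in the question.
     */
    static List<Match> scan(Path annotationFile, Map<String, Float> userSnps) throws IOException {
        List<Match> matches = new ArrayList<>();
        // try-with-resources closes the reader even if an exception is thrown.
        // UTF-8 is an assumption; the question's code used the platform default charset.
        try (BufferedReader reader = Files.newBufferedReader(annotationFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Split once per line and reuse the result for both columns.
                String[] fields = line.split("\t", -1);
                if (fields.length < 5) {
                    continue; // skip rows that do not have the expected columns
                }
                // A single get() replaces the containsKey() + get() pair.
                Float pValue = userSnps.get(fields[3]);
                if (pValue != null) {
                    matches.add(new Match(fields[3], fields[4], pValue));
                }
            }
        }
        return matches;
    }
}
```

Only the matched rows are kept, so memory use grows with the number of hits rather than with the 1.4 GB file. If the array really is needed for some later step, splitting each element once and storing the result in a local variable should at least avoid the duplicated split.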