20210804

Mainly used pandas. The following links were used as references.

Looping over a pandas DataFrame: https://note.nkmk.me/python-pandas-dataframe-for-iteration/

Re-assigning the index of a pandas DataFrame: https://note.nkmk.me/python-pandas-reset-index/

Sorting a pandas DataFrame: https://note.nkmk.me/python-pandas-sort-values-sort-index/
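
The three operations covered by these links can be tried together on a small in-memory frame. This is only a minimal sketch: the column names and values below are made up for illustration and are not taken from the actual data.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({"sampleA": [0.1, 0.9, 0.5], "sampleB": [0.0, 0.0, 0.7]})

top_two = {}
# items() yields (column name, Series) pairs, one per column.
for name, column in df.items():
    # Sort descending; the original row labels are still attached here.
    ordered = column.sort_values(ascending=False)
    # Renumber from 0 so that labels 0 and 1 are the two largest values.
    ordered = ordered.reset_index(drop=True)
    top_two[name] = (float(ordered[0]), float(ordered[1]))

print(top_two)
```

Without `reset_index(drop=True)`, `ordered[0]` would look up the row originally labelled 0 rather than the largest value, which is why the index is renumbered after sorting.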

In /home/ito.takumi/work/mitosearch/testdata/compare, ran script/db_err.py. It outputs compare_data/ID_compare.diff.txt (ID is the SRR ID passed as the argument).

script/db_err.py

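import sys
import pandas as pd

def main():
    srrID = sys.argv[1]
    inputFilePath = "compare_data/" + srrID + "_compare_matrix.filtered.tsv"
    outputFilePath = "compare_data/" + srrID + "_compare.diff.txt"
    # Open the output first so that an (empty) file exists even if the input cannot be read.
    with open(outputFilePath, "w") as out_f:
        try:
            df = pd.read_table(inputFilePath)
        except (OSError, pd.errors.EmptyDataError, pd.errors.ParserError):
            return
        # Drop the first column; only the remaining columns are compared.
        df_copy = df.drop(df.columns[0], axis=1)

        # DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent.
        for columnName, item in df_copy.items():
            try:
                # Sort each column in descending order and renumber the index from 0.
                column_sorted = item.sort_values(ascending=False)
                column_sorted_resetIndex = column_sorted.reset_index(drop=True)
                top_hit = column_sorted_resetIndex[0]
                second_hit = column_sorted_resetIndex[1]
            except (KeyError, TypeError):
                # Skip columns with fewer than two rows or values that cannot be sorted.
                continue
            # Report only the columns whose second-largest value is non-zero.
            if second_hit != 0:
                row = srrID + "\t" + columnName + "\t" + str(top_hit) + "\t" + str(second_hit)
                out_f.write(row + "\n")

if __name__ == "__main__":
    main()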
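
To see what kind of line the script writes, the same top-two logic can be run on an in-memory table. This is a sketch under assumptions: the helper `diff_rows`, the ID `SRR0000000`, and the table contents are all hypothetical, and `io.StringIO` stands in for the real TSV file.

```python
import io
import pandas as pd

def diff_rows(srr_id, tsv_text):
    """Return output lines for columns whose second-largest value is non-zero.

    Hypothetical helper for illustration; it follows the loop in the script above.
    """
    df = pd.read_table(io.StringIO(tsv_text))
    # Drop the first column, as the script does.
    values = df.drop(df.columns[0], axis=1)
    rows = []
    for name, column in values.items():
        ordered = column.sort_values(ascending=False).reset_index(drop=True)
        # Skip columns that are too short or whose second value is zero.
        if len(ordered) < 2 or ordered[1] == 0:
            continue
        rows.append("\t".join([srr_id, name, str(ordered[0]), str(ordered[1])]))
    return rows

# Made-up input: the first column is dropped, the rest are compared.
example = "ref\tq1\tq2\nr1\t10\t5\nr2\t8\t0\nr3\t0\t0\n"
rows = diff_rows("SRR0000000", example)
print(rows)
```

In this made-up table, only `q1` has a non-zero second-largest value, so it is the only column reported; `q2` has a single non-zero entry and is skipped.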
  • Last updated: 2021/08/04 05:43