R script to Perl?

Wednesday, July 9, 2014

I have an R script I want to use to parse a file and extract some information from it, but the file is 44 GB.

Can someone help me rewrite this in a programming language that is faster at reading files?


The script is pretty simple:



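# Read the whole PLINK .ld table into memory (columns: CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2 DP)
ld <- read.table("plink-inter-chr---ld-window-r2-0.ld", header = TRUE)
# Split the contig names on "_" and keep field 2, the NODE number
ldv1 <- do.call(rbind, strsplit(as.character(ld[,1]), "_"))
ldv4 <- do.call(rbind, strsplit(as.character(ld[,4]), "_"))
# Columns: node A, node B, BP_A, BP_B, R2 (c() coerces everything to character)
ld <- matrix(c(ldv1[,2], ldv4[,2], ld[,2], ld[,5], ld[,7]), ncol=5)
N <- 30
within <- numeric(N)
between <- numeric(N)
for(i in 1:N){
  # Mean R2 over pairs with both SNPs on node i
  within[i] <- mean(as.numeric(ld[which(ld[,1] == i & ld[,2] == i),5]))
  # Mean R2 over pairs with SNP A on node i and SNP B on another node
  between[i] <- mean(as.numeric(ld[which(ld[,1] == i & ld[,2] != i),5]))
}
# One row per node: within, between
result <- matrix(c(within, between), ncol=2)
write.table(result, file = "within-between.tab", quote = FALSE, row.names = FALSE, col.names = FALSE)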
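The loop above scans the whole matrix twice for every node (2 × N passes with which()). A minimal sketch of the same within/between averages computed in a single pass, written in Python as a stand-in for the Perl the title mentions: it keeps a running sum and count per node instead of selecting subsets. The function name `within_between` and its input, an iterable of hypothetical (node_a, node_b, r2) tuples, are assumptions for illustration, not part of the original script.

```python
import math
from typing import Iterable, List, Tuple


def within_between(rows: Iterable[Tuple[int, int, float]], n: int = 30) -> List[Tuple[float, float]]:
    """One-pass equivalent of the R loop over i in 1:N (hypothetical helper).

    rows yields (node_a, node_b, r2). For node i, 'within' is the mean R2 of
    pairs with node_a == node_b == i; 'between' is the mean R2 of pairs with
    node_a == i and node_b != i. Nodes with no pairs get NaN, as R's mean()
    of an empty vector does.
    """
    # Slot 0 is unused so node i maps to index i, like R's 1-based vectors.
    w_sum = [0.0] * (n + 1)
    w_cnt = [0] * (n + 1)
    b_sum = [0.0] * (n + 1)
    b_cnt = [0] * (n + 1)
    for a, b, r2 in rows:
        if not 1 <= a <= n:
            continue  # the R script only reports nodes 1..N
        if a == b:
            w_sum[a] += r2
            w_cnt[a] += 1
        else:
            b_sum[a] += r2
            b_cnt[a] += 1
    result = []
    for i in range(1, n + 1):
        w = w_sum[i] / w_cnt[i] if w_cnt[i] else math.nan
        bt = b_sum[i] / b_cnt[i] if b_cnt[i] else math.nan
        result.append((w, bt))
    return result
```

For example, `within_between([(1, 1, 1.0), (1, 1, 0.5), (1, 2, 0.2)], n=2)` gives `(0.75, 0.2)` for node 1 and `(nan, nan)` for node 2. Because it only needs one row at a time, it can be fed directly from a line-by-line reader, so the 44 GB file never has to fit in memory.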


The file looks like this:



CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2 DP
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 2210 . 1 1
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 2419 . 1 1
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 2524 . 1 1
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 2587 . 1 1
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 2799 . 1 1
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 2947 . 1 1
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 3142 . 1 1
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 3178 . 1 1
NODE_1_length_193190_coverage_19.3759_GC_24.97 919 . NODE_1_length_193190_coverage_19.3759_GC_24.97 3261 . 1 1
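Since the file is whitespace-separated with a single header line, it can be read one line at a time instead of being loaded whole. A minimal Python sketch, assuming every data line has the eight columns shown and that contig names always follow the `NODE_<n>_...` pattern; `parse_ld` and `node_number` are hypothetical names. It mirrors the R steps `strsplit(..., "_")[,2]` on CHR_A and CHR_B plus taking the R2 column.

```python
from typing import Iterable, Iterator, Tuple


def node_number(contig: str) -> int:
    # "NODE_1_length_193190_..." -> 1, i.e. field 2 of strsplit(x, "_") in R
    return int(contig.split("_")[1])


def parse_ld(lines: Iterable[str]) -> Iterator[Tuple[int, int, float]]:
    """Yield (node_a, node_b, r2) from whitespace-separated .ld lines.

    Assumes the header CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2 DP, so CHR_A,
    CHR_B and R2 are fields 0, 3 and 6 (0-based).
    """
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHR_A":
            continue  # skip blank lines and the header line
        yield node_number(fields[0]), node_number(fields[3]), float(fields[6])
```

Iterating over an open file object, e.g. `parse_ld(open("plink-inter-chr---ld-window-r2-0.ld"))`, reads lines lazily, so memory use stays flat regardless of file size; the resulting tuples can then be averaged per node as the R loop does.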


Thank you for your help, Adrian