At work, we sometimes need to combine and analyze information from many output files.
For example, these could be summary outputs from many different conditions or simulation parameters, produced by either public software or our own scripts.
Each output file is a table in the same format (e.g. the 1st column is a gene ID, the 2nd column is the score from tool 1, the 3rd column is the score from tool 2), generated under slightly different experimental conditions.
We may want to combine them into one big data frame, which makes it easier to use visualization tools to identify patterns in the data.
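For concreteness, here is a minimal sketch of what one such file might look like and how R reads it. The gene IDs, column names, and scores are invented for illustration. It assumes the header line has one field fewer than the data rows, in which case read.table() uses the first column as the row names; merging by row names relies on the gene IDs being stored that way.

```r
# Hypothetical example file; gene IDs and scores are made up for illustration.
example_file <- tempfile(fileext = ".txt")
writeLines(c(
  "score_tool1 score_tool2",  # the header has 2 fields ...
  "geneA 0.91 0.85",          # ... but each data row has 3,
  "geneB 0.42 0.47",          # so the first column becomes the row names
  "geneC 0.10 0.22"
), example_file)

dat <- read.table(example_file, header = TRUE)
rownames(dat)  # the gene IDs
colnames(dat)  # the two score columns
```

If the files instead carry a header for the gene ID column, passing row.names = 1 to read.table() should give the same layout.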
To combine a few files by hand, we could do something like:
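a <- read.table("file1.txt", header = TRUE)
b <- read.table("file2.txt", header = TRUE)
colnames(a) <- paste("file1", colnames(a), sep = "_")  # rename to avoid conflicts later
tmp <- merge(a, b, by = "row.names")
rownames(tmp) <- tmp$Row.names  # merge() moves the row names into a column; restore them
tmp$Row.names <- NULL
c <- read.table("file3.txt", header = TRUE)
tmp2 <- merge(tmp, c, by = "row.names")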
This requires lots of typing when you have to handle many files!
So how can we make it easier? We just want to apply the same steps to every file.
Here is a simple script for this task, adapted from https://stat.ethz.ch/pipermail/r-help/2008-April/159060.html and https://stackoverflow.com/questions/16666643/merging-more-than-2-dataframes-in-r-by-rownames
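# get the names of the files to merge
# (list.files() returns every file in the directory; use its `pattern` argument to restrict the selection if needed)
mypath <- getwd()
filenames <- list.files(path = mypath, full.names = TRUE)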
# read in all the data
allData <- lapply(filenames, function(x) {
  dat <- read.table(file = x, header = TRUE)
  # x is a long name with path info; basename() keeps only the file name itself
  names(dat) <- paste(basename(x), colnames(dat), sep = "_")  # prefix columns with the file name
  dat  # return the data frame
})
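# merge two data frames by row names, keeping the row names on the result
MyMerge <- function(x, y) {
  df <- merge(x, y, by = "row.names", all.x = FALSE, all.y = FALSE)  # keep only rows present in both
  rownames(df) <- df$Row.names  # merge() puts the row names in a "Row.names" column
  df$Row.names <- NULL
  return(df)
}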
# merge all elements of allData, one after another
out <- Reduce(MyMerge, allData)
Now we have the data frame we want!
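To try the whole approach without touching real data, the following self-contained sketch writes three small tables to a temporary directory and then reads, renames, and merges them. The directory name, file names (cond1.txt etc.), scores, and the helper name merge_by_rownames are all hypothetical and chosen only for this demo. It also restricts list.files() to .txt files with the pattern argument, which is an assumption about how the inputs are named.

```r
# Hypothetical demo: write three small tables to a temporary directory,
# then read, rename, and merge them by row names.
demo_dir <- file.path(tempdir(), "merge_demo")
dir.create(demo_dir, showWarnings = FALSE)

genes <- c("geneA", "geneB", "geneC")
for (i in 1:3) {
  scores <- data.frame(tool1 = i + c(0.1, 0.2, 0.3),
                       tool2 = i + c(0.4, 0.5, 0.6),
                       row.names = genes)
  # With its defaults, write.table() writes a header that is one field
  # shorter than the data rows, so read.table() restores the row names.
  write.table(scores, file.path(demo_dir, paste0("cond", i, ".txt")),
              quote = FALSE)
}

# Only pick up the .txt files (assumption about file naming)
demo_files <- list.files(path = demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read every file and prefix its columns with the file name
demo_data <- lapply(demo_files, function(x) {
  dat <- read.table(file = x, header = TRUE)
  names(dat) <- paste(basename(x), colnames(dat), sep = "_")
  dat
})

# Merge two data frames by row names and keep the row names on the result
merge_by_rownames <- function(x, y) {
  df <- merge(x, y, by = "row.names")
  rownames(df) <- df$Row.names
  df$Row.names <- NULL
  df
}

# Reduce() folds the list pairwise: f(f(d1, d2), d3)
demo_out <- Reduce(merge_by_rownames, demo_data)
dim(demo_out)       # one row per gene, two score columns per file
colnames(demo_out)  # each column is prefixed with its file name
```

Because merge() keeps only the row names shared by both inputs by default, a gene missing from any one file drops out of the final table; passing all = TRUE to merge() would keep it, with NA in the gaps.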