TaskRun failed to finish due to an error for Coretex BioInformatics workflow

Question

After starting a bioinformatics workflow in Coretex, I get the following error, although the data seems to be in order: "Failed to determine which column contains sampleIDs/names..." It is followed by a list of available names, but my metadata already uses one of them.

I am trying to run a microbiome sequencing task in Coretex on standard microbiome sequencing data in .fastq.gz format. The run should succeed, but it fails every time.

This is the R code I am working with to load the metadata:

    loadMetadata <- function(metadataSample) {
        # `builtins` and getSampleIdColumnName() are defined elsewhere in the Task (not shown here)
        metadata_csv_path <- builtins$str(
            metadataSample$joinPath("metadata.csv")
        )

        if (file.exists(metadata_csv_path)) {
            # Default SampleSheet.csv format
            metadata <- read.table(
                metadata_csv_path,
                sep = ",",
                header = TRUE,
                check.names = TRUE
            )
        } else {
            # Format accepted by qiime2
            metadata_tsv_path <- builtins$str(
                metadataSample$joinPath("metadata.tsv")
            )

            if (!file.exists(metadata_tsv_path)) {
                stop("Metadata file not found")
            }

            metadata <- read.table(
                metadata_tsv_path,
                sep = "\t",
                header = TRUE,
                check.names = TRUE
            )

            # qiime has 1 extra row after header which contains types
            metadata <- metadata[-1, ]
        }

        # Remove leading and trailing whitespace from column names
        colnames(metadata) <- trimws(colnames(metadata))

        # Trim character columns; single-bracket indexing keeps a data.frame
        # even when there is only one string column
        stringColumns <- names(metadata)[vapply(metadata, is.character, logical(1))]
        metadata[stringColumns] <- lapply(metadata[stringColumns], trimws)

        sampleIdColumn <- getSampleIdColumnName(metadata)
        print(paste("Matched metadata sample ID/name column to", sampleIdColumn))

        print("Renaming metadata sample ID/name column to \"sampleId\"")
        names(metadata)[names(metadata) == sampleIdColumn] <- "sampleId"

        print("Metadata")
        print(colnames(metadata))
        print(head(metadata))

        print(metadata$sampleId)

        # assign the names of samples (01Sat1...) to metadata rows instead of 1,2,3...
        row.names(metadata) <- metadata$sampleId
        metadata$sampleId <- as.factor(metadata$sampleId)

        return(metadata)
    }

Duško Mirković · Accepted Answer · 2024-08-20 07:04:29Z

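Judging by the logs of your Coretex Workflow, your Dataset contains a metadata.csv file that uses ; as the separator, but the Coretex Task for loading BioInformatics data reads it with , as the separator. If no field in the file contains a comma, read.table then puts each whole line into a single column, so the entire header becomes one column name and the Task cannot find a separate sample ID/name column. This was changed in the latest version of the Task, and you can see the full changelog here.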
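A minimal base-R reproduction of this failure mode is below. The column names and sample values (sampleId, group, 01Sat1, ...) are hypothetical, chosen only for illustration. Reading a ;-separated file with sep = "," leaves everything in one column, and check.names = TRUE turns the whole header into a single syntactic name.

```r
# Hypothetical metadata.csv content that uses ";" as the separator
csvText <- "sampleId;group\n01Sat1;A\n02Sat1;B"

# Old behaviour: the separator is forced to be ","
metadata <- read.table(
    text = csvText,
    sep = ",",
    header = TRUE,
    check.names = TRUE
)

# Everything ends up in one column; check.names replaces ";" with "."
print(ncol(metadata))      # [1] 1
print(colnames(metadata))  # [1] "sampleId.group"

# Reading the same text with the correct separator gives two columns
fixed <- read.table(
    text = csvText,
    sep = ";",
    header = TRUE,
    check.names = TRUE
)
print(colnames(fixed))     # [1] "sampleId" "group"
```

Passing text = instead of a file path only keeps the example self-contained; read.table behaves the same way when reading a file.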

Instead of always forcing the separator to be , (old version):

    # Default SampleSheet.csv format
    metadata <- read.table(
        metadata_csv_path,
        sep = ",",
        header = TRUE,
        check.names = TRUE
    )

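The Task now tries to detect the separator automatically using the fread function from the data.table package; data.table = FALSE makes it return a regular data.frame, as read.table does (new version):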

    metadata <- fread(metadata_csv_path, data.table=FALSE)
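If you need similar behaviour in your own R code without depending on data.table, a rough base-R approximation is to try a few candidate separators and keep the one that splits the first lines of the file into a consistent number of fields. This is a sketch under stated assumptions: detectSeparator and readMetadataCsv are hypothetical names, not part of Coretex or data.table; the file is assumed to have no quoted fields that contain a separator; and fread's own detection is more thorough, so prefer it when data.table is available.

```r
# Hypothetical helper (not part of Coretex or data.table): return the candidate
# separator that splits every sampled line into the same number (> 1) of fields.
# Assumption: quoted fields containing a separator are NOT handled.
detectSeparator <- function(lines, candidates = c(",", ";", "\t", "|")) {
    lines <- lines[nzchar(lines)]  # ignore empty lines
    best <- NA_character_
    bestFields <- 1L
    for (sep in candidates) {
        # fixed = TRUE matches the separator literally (needed for "|")
        counts <- vapply(strsplit(lines, sep, fixed = TRUE), length, integer(1))
        # Keep a separator only if it gives a consistent, larger column count
        if (length(unique(counts)) == 1L && counts[1] > bestFields) {
            best <- sep
            bestFields <- counts[1]
        }
    }
    if (is.na(best)) {
        stop("Could not detect a column separator")
    }
    best
}

# Hypothetical replacement for the read.table() call on metadata.csv
readMetadataCsv <- function(path, nLines = 5L) {
    sep <- detectSeparator(readLines(path, n = nLines))
    read.table(path, sep = sep, header = TRUE, check.names = TRUE)
}

# Usage: a ";"-separated file is now read into separate columns
tmp <- tempfile(fileext = ".csv")
writeLines(c("sampleId;group", "01Sat1;A", "02Sat1;B"), tmp)
metadata <- readMetadataCsv(tmp)
print(colnames(metadata))  # [1] "sampleId" "group"
```

Requiring a consistent field count across several lines, rather than just counting characters in the header, avoids picking a character that merely appears inside a value.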

answered Aug 19, 2024 at 14:42

1 Comment

axcac Over a year ago

Yes, this is it, it is working now.
