
After starting a bioinformatics workflow in Coretex, I get the following error even though the data seems to be in order: "Failed to determine which column contains sampleIDs/names..." followed by the list of available names. The column name I am using is one of the names on that list.

I am trying to run a microbiome sequencing task in Coretex using standard microbiome sequencing data in .fastq.gz format. The run should succeed, but it fails every time.

This is the R code I've been working with for loading the metadata:

    # `builtins` (Python builtins via reticulate) and getSampleIdColumnName()
    # are defined elsewhere in the Task and are not shown here
    loadMetadata <- function(metadataSample) {
        metadata_csv_path <- builtins$str(
            metadataSample$joinPath("metadata.csv")
        )

        if (file.exists(metadata_csv_path)) {
            # Default SampleSheet.csv format
            metadata <- read.table(
                metadata_csv_path,
                sep = ",",
                header = TRUE,
                check.names = TRUE
            )
        } else {
            # Format accepted by qiime2
            metadata_tsv_path <- builtins$str(
                metadataSample$joinPath("metadata.tsv")
            )

            if (!file.exists(metadata_tsv_path)) {
                stop("Metadata file not found")
            }

            metadata <- read.table(
                metadata_tsv_path,
                sep = "\t",
                header = TRUE,
                check.names = TRUE
            )

            # qiime has 1 extra row after header which contains types
            metadata <- metadata[-1, ]
        }

        # Remove leading and trailing whitespace from column names
        colnames(metadata) <- trimws(colnames(metadata))

        # ...and from every character column (list-style indexing keeps this
        # working even when there is only one such column)
        stringColumns <- names(metadata)[vapply(metadata, is.character, logical(1))]
        metadata[stringColumns] <- lapply(metadata[stringColumns], trimws)

        sampleIdColumn <- getSampleIdColumnName(metadata)
        print(paste("Matched metadata sample ID/name column to", sampleIdColumn))

        print("Renaming metadata sample ID/name column to \"sampleId\"")
        names(metadata)[names(metadata) == sampleIdColumn] <- "sampleId"

        print("Metadata")
        print(colnames(metadata))
        print(head(metadata))

        print(metadata$sampleId)

        # Assign the sample names (01Sat1...) to metadata rows instead of 1, 2, 3...
        row.names(metadata) <- metadata$sampleId
        metadata$sampleId <- as.factor(metadata$sampleId)

        return(metadata)
    }

1 Answer


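Judging by the logs of your Coretex Workflow, it looks like your Dataset contains a metadata.csv file that uses a semicolon (;) as its separator, but the Coretex Task for loading BioInformatics data reads it with a comma (,) as the separator. Read that way, each line of the file is parsed as a single column whose name is the entire header line (with check.names = TRUE turning every ; into a .), so no column name matches the accepted sample ID names, even though your sample ID column itself is named correctly. This was changed in the latest version of the Task, and you can see the full changelog here.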
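A minimal reproduction in base R shows what happens when a semicolon-separated file is read with sep = ",". The file contents below (the sampleId and treatment columns and their values) are made up for illustration only:

```r
# Hypothetical semicolon-separated metadata, similar in shape to the Dataset's file
csv_text <- "sampleId;treatment
S1;control
S2;treated"

# Old behaviour: a comma is forced as the separator
bad <- read.table(text = csv_text, sep = ",", header = TRUE, check.names = TRUE)
print(colnames(bad))   # one column named "sampleId.treatment"

# With the correct separator the columns are split as expected
good <- read.table(text = csv_text, sep = ";", header = TRUE, check.names = TRUE)
print(colnames(good))  # "sampleId" "treatment"
```

Because the only column in bad is called sampleId.treatment, any lookup for a column named sampleId (or a similar variant) fails.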

Instead of always forcing a comma as the separator (old version):

    # Default SampleSheet.csv format
    metadata <- read.table(
        metadata_csv_path,
        sep = ",",
        header = TRUE,
        check.names = TRUE
    )

the new version tries to detect the separator automatically using the fread function from the data.table package (data.table = FALSE makes it return a plain data.frame):

    library(data.table)  # provides fread()
    metadata <- fread(metadata_csv_path, data.table = FALSE)
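If data.table is not available, a rough base-R approximation of the same idea is to count the candidate separators in the header line and pass the most frequent one to read.table. The helpers detect_separator and read_metadata_csv below are hypothetical (not part of Coretex or data.table), and this sketch inspects only the header line, whereas fread's own detection samples many rows of the file:

```r
# Hypothetical helper: guess the delimiter by counting candidate
# characters in the first (header) line of the file
detect_separator <- function(path, candidates = c(",", ";", "\t")) {
  header <- readLines(path, n = 1, warn = FALSE)
  # Number of occurrences of each candidate separator in the header
  counts <- vapply(candidates, function(s) {
    lengths(regmatches(header, gregexpr(s, header, fixed = TRUE)))
  }, integer(1))
  if (all(counts == 0)) {
    stop("Could not detect a separator in the header of ", path)
  }
  # The most frequent candidate wins (ties go to the earlier candidate)
  candidates[which.max(counts)]
}

# Hypothetical wrapper: read the metadata with the detected separator
read_metadata_csv <- function(path) {
  sep <- detect_separator(path)
  read.table(path, sep = sep, header = TRUE, check.names = TRUE)
}

# Usage with a made-up semicolon-separated file
tmp <- tempfile(fileext = ".csv")
writeLines(c("sampleId;treatment", "S1;control", "S2;treated"), tmp)
metadata <- read_metadata_csv(tmp)
print(colnames(metadata))  # "sampleId" "treatment"
```

Looking only at the header keeps the sketch short, but it can be misled by unusual headers; fread remains the more robust choice when the package is installed.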

1 Comment

Yes, that was it; it works now.
