Single-cell 10x Cell Ranger analysis

Step 1: download the SRR data

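# Batch download of every accession listed in SRR_Acc_List.txt
nohup prefetch -X 100GB --option-file SRR_Acc_List.txt &
# Once prefetch has finished, convert a downloaded run to gzipped FASTQ (one file per read)
nohup fastq-dump --gzip --split-files -A ./SRR13633760 -O /home/scRNA/ &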
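
The fastq-dump command above converts a single accession. To convert every run in SRR_Acc_List.txt, a loop along the following lines could be used. This is only a sketch: the function name dump_all, the DRY_RUN switch and the per-accession output folders are assumptions of mine, and it presumes prefetch was run in the current directory so that fastq-dump can find the local copies.

```shell
#!/bin/sh
# Hypothetical helper (not part of SRA Toolkit): run fastq-dump for each
# accession in a list file. With DRY_RUN=1 (the default here) the commands
# are only printed, so the loop can be checked before anything is run.
dump_all() (
  acc_list="$1"   # e.g. SRR_Acc_List.txt, one accession per line
  out_dir="$2"    # e.g. /home/scRNA
  while IFS= read -r acc || [ -n "$acc" ]; do
    [ -z "$acc" ] && continue            # skip blank lines
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "fastq-dump --gzip --split-files -O $out_dir/$acc $acc"
    else
      fastq-dump --gzip --split-files -O "$out_dir/$acc" "$acc"
    fi
  done < "$acc_list"
)
```

Calling `dump_all SRR_Acc_List.txt /home/scRNA` prints the commands; `DRY_RUN=0 dump_all SRR_Acc_List.txt /home/scRNA` runs them. Writing each run into its own folder matches the per-sample --fastqs path used with cellranger count later on.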

Step 2: build a custom reference using Cell Ranger mkref

First, find the reference genome FASTA and GTF files for your species. If the species is available from the Ensembl database, we recommend using the files there. GTF files from Ensembl contain optional tags to make filtering easy. If the species you are interested in is not available with Ensembl, GTF and FASTA files from other sources can also be used. Note that GTF files are required and GFF files are not supported. (See GFF/GTF file formats – definitions and supported options)

Reference genome files for a given species can be selected on the Ensembl website.

The demonstration here uses the rat.

Open each of the items marked with a red box in the screenshot in turn.

Click the item in the red box.

Select the toplevel DNA FASTA file for download.

That completes the download of the first file. The second file is downloaded as follows: open the item in the box below and select Download GTF.

Select this file for download.

Both files are now downloaded.
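
The same two files can also be fetched from the command line instead of through the browser. The sketch below builds the download URLs from the release number and species. The URL layout is my assumption about how the public Ensembl FTP site is organised and should be checked in a browser first; ensembl_urls is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the FASTA and GTF URLs for one species/release.
# ASSUMPTION: the paths follow the usual layout of the Ensembl FTP site.
ensembl_urls() (
  release="$1"    # e.g. 105
  species="$2"    # e.g. rattus_norvegicus
  prefix="$3"     # e.g. Rattus_norvegicus.mRatBN7.2
  base="https://ftp.ensembl.org/pub/release-$release"
  echo "$base/fasta/$species/dna/$prefix.dna.toplevel.fa.gz"
  echo "$base/gtf/$species/$prefix.$release.gtf.gz"
)

# Usage (uncomment to download and unpack both files):
# ensembl_urls 105 rattus_norvegicus Rattus_norvegicus.mRatBN7.2 |
#   while read -r url; do
#     wget -c "$url" && gunzip -k "$(basename "$url")"
#   done
```

The files are unpacked here because the later commands refer to the uncompressed .fa and .gtf names.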

The next step is to filter the GTF before constructing the reference genome.

As I understand it, this filtering step is optional. It lets you restrict the annotation to the gene biotypes you care about; for example, passing only --attribute=gene_biotype:protein_coding keeps only protein-coding genes in the reference.

The help output describes the command:

(base) hwsw@shpc-2596-instance-GkVAxmvG:~$ $cellranger mkgtf
Usage:
    mkgtf <input_gtf> <output_gtf> [--attribute=KEY:VALUE...]
    mkgtf -h | --help | --version
(base) hwsw@shpc-2596-instance-GkVAxmvG:~$ $cellranger mkgtf -h
Genes GTF tool for 10x Genomics Cell Ranger.

Filter user-supplied GTF files for use as Cell Ranger-compatible
genes files for mkref tool.

The commands below should be preceded by 'cellranger':

Usage:
    mkgtf <input_gtf> <output_gtf> [--attribute=KEY:VALUE...]
    mkgtf -h | --help | --version

Arguments:
    input_gtf Path to input genes GTF file.
    output_gtf Path to filtered output genes GTF file.

Options:
    --attribute=<key:value>
                        Key-value pair in attributes field to be kept in the GTF
                            file.
    -h --help Show this message.
    --version Show version.

Choose the gene biotypes you want to keep and filter the GTF accordingly:

#Filter GTF
cellranger=/home/hwsw/cellranger-7.1.0/cellranger
$cellranger mkgtf \
Rattus_norvegicus.mRatBN7.2.105.gtf Rattus_norvegicus.mRatBN7.2.105.filtered.gtf \
--attribute=gene_biotype:protein_coding \
--attribute=gene_biotype:lncRNA \
--attribute=gene_biotype:antisense \
--attribute=gene_biotype:IG_LV_gene \
--attribute=gene_biotype:IG_V_gene \
--attribute=gene_biotype:IG_V_pseudogene \
--attribute=gene_biotype:IG_D_gene \
--attribute=gene_biotype:IG_J_gene \
--attribute=gene_biotype:IG_J_pseudogene \
--attribute=gene_biotype:IG_C_gene \
--attribute=gene_biotype:IG_C_pseudogene \
--attribute=gene_biotype:TR_V_gene \
--attribute=gene_biotype:TR_V_pseudogene \
--attribute=gene_biotype:TR_D_gene \
--attribute=gene_biotype:TR_J_gene \
--attribute=gene_biotype:TR_J_pseudogene \
--attribute=gene_biotype:TR_C_gene
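
To see which gene_biotype values a GTF actually contains, and to confirm that the filtered file only holds the ones requested above, the gene records can be tallied per biotype. This is a sketch using standard text tools; biotype_counts is a hypothetical helper name, and it assumes the Ensembl-style attribute gene_biotype "..." is present on gene lines.

```shell
#!/bin/sh
# Hypothetical helper: count gene records per gene_biotype in a GTF file.
biotype_counts() (
  gtf="$1"
  grep -v '^#' "$gtf" |                                # drop header comments
    awk -F '\t' '$3 == "gene"' |                       # keep gene records only
    sed -n 's/.*gene_biotype "\([^"]*\)".*/\1/p' |     # extract the biotype
    sort | uniq -c | sort -k1,1nr -k2,2                # tally, most frequent first
)

# Usage:
# biotype_counts Rattus_norvegicus.mRatBN7.2.105.gtf
# biotype_counts Rattus_norvegicus.mRatBN7.2.105.filtered.gtf
```

Comparing the two listings shows which biotypes the filter removed.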

Prepare the reference genome for single-cell analysis

Run cellranger mkref --help to view the command parameters:

Reference preparation tool for 10x Genomics Cell Ranger.
 
Build a Cell Ranger-compatible reference folder from user-supplied genome FASTA and gene GTF files. Creates a new folder named after the genome.
 
The commands below should be preceded by 'cellranger':
 
Usage:
    mkref
        --genome=NAME ...
        --fasta=PATH ...
        --genes=PATH ...
        [options]
    mkref -h | --help | --version
 
Arguments:
    genome #Output folder Unique genome name(s), used to name output folder
                            [a-zA-Z0-9_-]+. Specify multiple genomes by
                            specifying the --genome argument multiple times; the
                            output folder will be <name1>_and_<name2>.
    fasta #FASTA reference genome absolute path
                            Path(s) to FASTA file containing your genome reference.
                            Specify multiple genomes by specifying the --fasta
                            argument multiple times.
    genes #.filtered.gtf annotation file absolute path
                            Path(s) to genes GTF file(s) containing annotated genes
                            for your genome reference. Specify multiple genomes
                            by specifying the --genes argument multiple times.
 
Options:
    --nthreads=<num> This option is currently ignored due to a bug, and will be re-enabled
                          in the next Cell Ranger release.
    --memgb=<num> Maximum memory (GB) used when aligning reads with STAR.
                            Defaults to 16.
    --ref-version=<str> Optional reference version string to include with
                            reference.
    -h --help Show this message.
    --version Show version.

This step is slow: expect it to take roughly 2-3 hours.

#Run mkref
cellranger mkref \
--genome=mRatBN7 \
--fasta=Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa \
--genes=Rattus_norvegicus.mRatBN7.2.105.filtered.gtf \
--ref-version=1.0.0
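
Because mkref runs for hours, it can be worth checking beforehand that the FASTA and GTF belong together. The sequence names in column 1 of the GTF are expected to match the names in the FASTA header lines, which is the case when both files come from the same Ensembl release. The sketch below lists any names that appear in the GTF but not in the FASTA; missing_contigs is a hypothetical helper, not a Cell Ranger command.

```shell
#!/bin/sh
# Hypothetical helper: print sequence names used in the GTF but absent from the FASTA.
missing_contigs() (
  fasta="$1"
  gtf="$2"
  tmp=$(mktemp -d)
  # FASTA names: the first word after '>' on each header line
  grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*//' | sort -u > "$tmp/fa"
  # GTF names: column 1 of every non-comment line
  grep -v '^#' "$gtf" | cut -f1 | sort -u > "$tmp/gtf"
  comm -13 "$tmp/fa" "$tmp/gtf"    # lines found only in the GTF list
  rm -rf "$tmp"
)

# Usage:
# missing_contigs Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa \
#                 Rattus_norvegicus.mRatBN7.2.105.filtered.gtf
```

Empty output means every sequence referenced by the annotation is present in the genome file.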

The log below comes from someone else's run, which is why it shows a different species (ovis_aries) and different paths:

Apr 15 14:36:45 ..... started STAR run
Apr 15 14:36:45 ... starting to generate Genome files
Apr 15 14:38:52 ... starting to sort Suffix Array. This may take a long time...
Apr 15 14:39:03 ... sorting Suffix Array chunks and saving them to disk...
Apr 15 16:40:45 ... loading chunks from disk, packing SA...
Apr 15 16:41:47 ... finished generating suffix array
Apr 15 16:41:47 ... generating Suffix Array index
Apr 15 16:46:07 ... completed Suffix Array index
Apr 15 16:46:07 ..... processing annotations GTF
Apr 15 16:46:19 ..... inserting junctions into the genome indices
Apr 15 16:55:08 ... writing Genome to disk ...
Apr 15 16:55:23 ... writing Suffix Array to disk ...
Apr 15 16:56:00 ... writing SAindex to disk
Apr 15 16:56:08 ..... finished successfully
Creating new reference folder at /home/hanjiangang/single_Cell/example/ref/ovis_aries/ovis_aries
...done
 
Writing genome FASTA file into reference folder...
...done
 
Indexing genome FASTA file...
...done
 
Writing genes GTF file into reference folder...
...done
 
Generating STAR genome index (may take over 8 core hours for a 3Gb genome)...
...done.
 
Writing genome metadata JSON file into reference folder...
Computing hash of genome FASTA file...
...done
 
Computing hash of genes GTF file...
...done
 
...done
 
>>> Reference successfully created! <<<
You can now specify this reference on the command line:
cellranger --transcriptome=/home/hanjiangang/single_Cell/example/ref/ovis_aries/ovis_aries ..

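When mkref finishes, the reference folder has been generated; here it is located at /home/hwsw/Rattus and is passed to cellranger count as --transcriptome=/home/hwsw/Rattus.

Then go straight to the standard 10x pipeline with cellranger count: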

/home/hwsw/cellranger-7.1.0/cellranger count --id=SRR19145616 \
--transcriptome=/home/hwsw/Rattus \
--fastqs=/home/scRNA/SRR19145616 \
--sample=SRR19145616 \
--localcores=30 \
--localmem=300 \
--nosecondary
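
One practical point about the --fastqs and --sample arguments: cellranger count looks for files named in the bcl2fastq style, [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz, whereas fastq-dump --split-files writes SRR..._1.fastq.gz, SRR..._2.fastq.gz and so on. The files therefore usually need renaming first. The sketch below assumes exactly two files per run, with _1 being R1 (barcode and UMI) and _2 being R2 (cDNA). That mapping is an assumption: some runs contain an index read as well, so check the read lengths before renaming. rename_for_cellranger is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: rename fastq-dump output to the names cellranger count expects.
# ASSUMPTION: <acc>_1.fastq.gz is R1 and <acc>_2.fastq.gz is R2.
rename_for_cellranger() (
  dir="$1"    # folder holding the FASTQ files, e.g. /home/scRNA/SRR19145616
  acc="$2"    # accession, also used as the --sample value
  mv "$dir/${acc}_1.fastq.gz" "$dir/${acc}_S1_L001_R1_001.fastq.gz" &&
  mv "$dir/${acc}_2.fastq.gz" "$dir/${acc}_S1_L001_R2_001.fastq.gz"
)

# Usage:
# rename_for_cellranger /home/scRNA/SRR19145616 SRR19145616
```

After renaming, --sample=SRR19145616 matches the file name prefix in the --fastqs folder.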