First step: download the SRR data
# This is a batch download
nohup prefetch -X 100GB --option-file SRR_Acc_List.txt &
nohup fastq-dump --gzip --split-files -A ./SRR13633760 -O /home/scRNA/ &
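Because both commands above are sent to the background with &, fastq-dump can start before prefetch has finished, and it converts only a single run. A minimal sketch of a sequential loop over the same SRR_Acc_List.txt could look like the following. The function name download_runs and the DRY_RUN switch are hypothetical helpers added for illustration, not sra-tools features, and the sketch assumes prefetch stores each run where fastq-dump can find it by accession (the current directory in recent sra-tools).

```shell
# Sketch: download and convert every accession listed in a file, one at a time.
# download_runs and DRY_RUN are hypothetical helpers, not part of sra-tools.
download_runs() {
    local acc_list="$1"   # e.g. SRR_Acc_List.txt, one accession per line
    local out_dir="$2"    # e.g. /home/scRNA
    local run=""
    local acc
    # With DRY_RUN=1 the commands are only printed, not executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi

    while IFS= read -r acc || [ -n "$acc" ]; do
        acc=$(printf '%s' "$acc" | tr -d '\r')   # tolerate Windows line endings
        [ -z "$acc" ] && continue                # skip blank lines
        # Fetch the run first, then convert it to gzipped FASTQ files.
        $run prefetch -X 100GB "$acc" || return 1
        $run fastq-dump --gzip --split-files -O "$out_dir/$acc" "$acc" || return 1
    done < "$acc_list"
}

# Usage (prints the commands without running them):
#   DRY_RUN=1 download_runs SRR_Acc_List.txt /home/scRNA
```

Writing each run into its own subfolder matches the --fastqs=/home/scRNA/SRR19145616 path used in the cellranger count step later on.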
Next, build a custom reference using Cell Ranger mkref
First, find the reference genome FASTA and GTF files for your species. If the species is available from the Ensembl database, we recommend using the files there. GTF files from Ensembl contain optional tags to make filtering easy. If the species you are interested in is not available with Ensembl, GTF and FASTA files from other sources can also be used. Note that GTF files are required and GFF files are not supported. (See GFF/GTF file formats – definitions and supported options)
Ensembl is where you select the species and download the files for the reference genome.
The demonstration here uses the rat.
Open each of the items marked with a red box in the picture in turn.
Click the item in the red box.
Select the top FASTA file to download.
The first file is now downloaded. For the second file, open the box below and select Download GTF.
Select this file to download.
Both files are now downloaded.
The next step is to construct the reference genome
As I understand it, this filtering step is optional. It lets you exclude genes from the analysis by biotype; for example, passing only --attribute=gene_biotype:protein_coding keeps only the protein-coding genes.
Look at the help text:
(base) hwsw@shpc-2596-instance-GkVAxmvG:~$ $cellranger mkgtf
Usage:
    mkgtf <input_gtf> <output_gtf> [--attribute=KEY:VALUE...]
    mkgtf -h | --help | --version
(base) hwsw@shpc-2596-instance-GkVAxmvG:~$ $cellranger mkgtf -h
Genes GTF tool for 10x Genomics Cell Ranger.

Filter user-supplied GTF files for use as Cell Ranger-compatible
genes files for mkref tool.

The commands below should be preceded by 'cellranger':

Usage:
    mkgtf <input_gtf> <output_gtf> [--attribute=KEY:VALUE...]
    mkgtf -h | --help | --version

Arguments:
    input_gtf           Path to input genes GTF file.
    output_gtf          Path to filtered output genes GTF file.

Options:
    --attribute=<key:value>
                        Key-value pair in attributes field to be kept in the GTF
                        file.
    -h --help           Show this message.
    --version           Show version.
Here we select the gene biotypes we want to keep and filter the GTF accordingly
# Filter GTF
cellranger=/home/hwsw/cellranger-7.1.0/cellranger
$cellranger mkgtf \
  Rattus_norvegicus.mRatBN7.2.105.gtf Rattus_norvegicus.mRatBN7.2.105.filtered.gtf \
  --attribute=gene_biotype:protein_coding \
  --attribute=gene_biotype:lncRNA \
  --attribute=gene_biotype:antisense \
  --attribute=gene_biotype:IG_LV_gene \
  --attribute=gene_biotype:IG_V_gene \
  --attribute=gene_biotype:IG_V_pseudogene \
  --attribute=gene_biotype:IG_D_gene \
  --attribute=gene_biotype:IG_J_gene \
  --attribute=gene_biotype:IG_J_pseudogene \
  --attribute=gene_biotype:IG_C_gene \
  --attribute=gene_biotype:IG_C_pseudogene \
  --attribute=gene_biotype:TR_V_gene \
  --attribute=gene_biotype:TR_V_pseudogene \
  --attribute=gene_biotype:TR_D_gene \
  --attribute=gene_biotype:TR_J_gene \
  --attribute=gene_biotype:TR_J_pseudogene \
  --attribute=gene_biotype:TR_C_gene
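When choosing which --attribute values to keep, it can help to see which gene_biotype values the GTF actually contains. The following is a rough sketch using standard Unix tools only; list_biotypes is a hypothetical helper name, and the sketch assumes an Ensembl-style GTF in which gene records carry a gene_biotype "..." attribute.

```shell
# Sketch: count how many genes carry each gene_biotype value in a GTF file.
# list_biotypes is a hypothetical helper name, not a Cell Ranger command.
list_biotypes() {
    # Keep only "gene" features (column 3), pull out the gene_biotype value,
    # then tally identical values, most frequent first.
    awk -F '\t' '$3 == "gene"' "$1" \
        | grep -o 'gene_biotype "[^"]*"' \
        | cut -d '"' -f 2 \
        | sort | uniq -c | sort -k1,1nr -k2,2
}

# Usage:
#   list_biotypes Rattus_norvegicus.mRatBN7.2.105.gtf
```

Any biotype in that list that is not named in an --attribute option is dropped by mkgtf.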
Prepare reference genome for single cell analysis
Run cellranger mkref --help to view the command parameters
Reference preparation tool for 10x Genomics Cell Ranger.

Build a Cell Ranger-compatible reference folder from user-supplied genome FASTA and gene GTF files.
Creates a new folder named after the genome.

The commands below should be preceded by 'cellranger':

Usage:
    mkref --genome=NAME ... --fasta=PATH ... --genes=PATH ... [options]
    mkref -h | --help | --version

Arguments:
    genome   #Output folder
             Unique genome name(s), used to name output folder [a-zA-Z0-9_-]+.
             Specify multiple genomes by specifying the --genome argument
             multiple times; the output folder will be <name1>_and_<name2>.
    fasta    #FASTA reference genome absolute path
             Path(s) to FASTA file containing your genome reference.
             Specify multiple genomes by specifying the --fasta argument
             multiple times.
    genes    #.filtered.gtf annotation file absolute path
             Path(s) to genes GTF file(S) containing annotated genes for your
             genome reference. Specify multiple genomes by specifying the
             --genes argument multiple times.

Options:
    --nthreads=<num>      This option is currently ignored due to a bug, and
                          will be re-enabled in the next Cell Ranger release.
    --memgb=<num>         Maximum memory (GB) used when aligning reads with STAR.
                          Defaults to 16.
    --ref-version=<str>   Optional reference version string to include with
                          reference.
    -h --help             Show this message.
    --version             Show version.
This step is slow; expect it to take about 2-3 hours
# Run mkref
# --genes must point at the filtered GTF written by mkgtf
cellranger mkref \
  --genome=mRatBN7 \
  --fasta=Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa \
  --genes=Rattus_norvegicus.mRatBN7.2.110.filtered.gtf \
  --ref-version=1.0.0
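Since mkref runs for hours, a quick sanity check before starting can save time. One thing worth checking is that the sequence names in the first column of the GTF also appear as headers in the FASTA. A minimal sketch, where missing_contigs is a hypothetical helper and both files are assumed to be uncompressed:

```shell
# Sketch: list sequence names that appear in the GTF but not in the FASTA.
# missing_contigs is a hypothetical helper; empty output means every GTF
# sequence name has a matching FASTA header.
missing_contigs() {
    local fasta="$1"
    local gtf="$2"
    local tmp
    tmp=$(mktemp -d) || return 1
    # FASTA: take the first word after ">" on each header line.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$tmp/fa"
    # GTF: take column 1 of every non-comment line.
    grep -v '^#' "$gtf" | cut -f 1 | sort -u > "$tmp/gtf"
    # comm -13 prints the lines found only in the second file.
    comm -13 "$tmp/fa" "$tmp/gtf"
    rm -rf "$tmp"
}

# Usage:
#   missing_contigs Rattus_norvegicus.mRatBN7.2.dna.toplevel.fa \
#       Rattus_norvegicus.mRatBN7.2.110.filtered.gtf
```

If the FASTA and GTF come from the same Ensembl release, the output should normally be empty.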
The log below is taken from someone else's run, which is why its paths refer to a different species:
Apr 15 14:36:45 ..... started STAR run
Apr 15 14:36:45 ... starting to generate Genome files
Apr 15 14:38:52 ... starting to sort Suffix Array. This may take a long time...
Apr 15 14:39:03 ... sorting Suffix Array chunks and saving them to disk...
Apr 15 16:40:45 ... loading chunks from disk, packing SA...
Apr 15 16:41:47 ... finished generating suffix array
Apr 15 16:41:47 ... generating Suffix Array index
Apr 15 16:46:07 ... completed Suffix Array index
Apr 15 16:46:07 ..... processing annotations GTF
Apr 15 16:46:19 ..... inserting junctions into the genome indices
Apr 15 16:55:08 ... writing Genome to disk ...
Apr 15 16:55:23 ... writing Suffix Array to disk ...
Apr 15 16:56:00 ... writing SAindex to disk
Apr 15 16:56:08 ..... finished successfully
Creating new reference folder at /home/hanjiangang/single_Cell/example/ref/ovis_aries/ovis_aries
...done
Writing genome FASTA file into reference folder...
...done
Indexing genome FASTA file...
...done
Writing genes GTF file into reference folder...
...done
Generating STAR genome index (may take over 8 core hours for a 3Gb genome)...
...done.
Writing genome metadata JSON file into reference folder...
Computing hash of genome FASTA file...
...done
Computing hash of genes GTF file...
...done
...done
>>> Reference successfully created! <<<

You can now specify this reference on the command line:
cellranger --transcriptome=/home/hanjiangang/single_Cell/example/ref/ovis_aries/ovis_aries ..
At the end, the reference folder is generated; it is passed to later commands as --transcriptome=/home/hwsw/Rattus
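Before moving on, it can be worth checking that the reference folder looks complete. A minimal sketch follows; check_ref is a hypothetical helper, and the list of expected entries (reference.json, fasta/genome.fa and a star directory) is an assumption about the usual mkref output layout, not an official specification.

```shell
# Sketch: check that a Cell Ranger reference folder contains the usual pieces.
# check_ref is a hypothetical helper; the entry list below is an assumption.
check_ref() {
    local ref="$1"
    local item
    local status=0
    for item in reference.json fasta/genome.fa star; do
        if [ ! -e "$ref/$item" ]; then
            echo "missing: $ref/$item" >&2
            status=1
        fi
    done
    # Returns 0 when everything was found, 1 otherwise.
    return "$status"
}

# Usage:
#   check_ref /home/hwsw/Rattus && echo "reference looks complete"
```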
Then go straight to the standard 10x pipeline, cellranger count
/home/hwsw/cellranger-7.1.0/cellranger count --id=SRR19145616 \
  --transcriptome=/home/hwsw/Rattus \
  --fastqs=/home/scRNA/SRR19145616 \
  --sample=SRR19145616 \
  --localcores=30 \
  --localmem=300 \
  --nosecondary
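One practical detail: cellranger count looks for FASTQ files named like SAMPLE_S1_L001_R1_001.fastq.gz, while fastq-dump --split-files writes SAMPLE_1.fastq.gz, SAMPLE_2.fastq.gz and so on. A minimal renaming sketch is shown below. rename_for_cellranger is a hypothetical helper, and it assumes the run has exactly two reads, with _1 being the barcode/UMI read (R1) and _2 the cDNA read (R2). Some SRA runs contain three or four files (index reads included), in which case the mapping is different and should be checked by read length first.

```shell
# Sketch: rename fastq-dump output to the names cellranger count looks for.
# rename_for_cellranger is a hypothetical helper.
# ASSUMPTION: exactly two reads, _1 = R1 (barcode + UMI) and _2 = R2 (cDNA).
rename_for_cellranger() {
    local dir="$1"      # folder holding the FASTQ files
    local sample="$2"   # sample name, e.g. the SRR accession
    local n
    for n in 1 2; do
        if [ -f "$dir/${sample}_${n}.fastq.gz" ]; then
            mv "$dir/${sample}_${n}.fastq.gz" \
               "$dir/${sample}_S1_L001_R${n}_001.fastq.gz"
        fi
    done
}

# Usage:
#   rename_for_cellranger /home/scRNA/SRR19145616 SRR19145616
```

After renaming, the value given to --sample has to match the prefix of the file names, which is the SRR accession here.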