Parse a text file into multiple text files

Posted by Vijay Kumar Singh on Stack Overflow
Published on 2012-07-10T20:36:15Z Indexed on 2012/07/10 21:15 UTC

I want to produce multiple files by parsing one input file in Java. The input file contains thousands of protein sequences in FASTA format, and I want to generate the raw format of each protein sequence (i.e., without any commas, semicolons, or extra symbols such as ">", "[", "]", etc.).
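Assuming "raw format" means keeping only the one-letter residue codes of each sequence line, the cleanup of a sequence line can be a single regular expression. This is a minimal sketch; the names RawSequence and toRaw are illustrative, and it is meant for sequence lines only, because the letters of a header line would survive it. It also drops "*" (stop) and "-" (gap) characters.

```java
// Minimal sketch: reduce one FASTA sequence line to "raw" residue letters.
// Assumption: raw format keeps only the letters A-Z/a-z and drops everything else
// (commas, semicolons, spaces, digits, ">", "[", "]", "*", "-", and so on).
class RawSequence {

    // Returns the input with every non-letter character removed.
    static String toRaw(String line) {
        // [^A-Za-z] matches any single character that is not an ASCII letter
        return line.replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        System.out.println(toRaw("MSES INFS,HNLG;QLL")); // prints MSESINFSHNLGQLL
    }
}
```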

A FASTA record starts with the ">" symbol, followed by a description of the protein on the same line; the protein sequence itself follows on the next line(s).

For example:

>lcl|NC_000001.10_cdsid_XP_003403591.1 [gene=LOC100652771] [protein=hypothetical protein LOC100652771] [protein_id=XP_003403591.1] [location=join(12190..12227,12595..12721,13403..13639)]
MSESINFSHNLGQLLSPPRCVVMPGMPFPSIRSPELQKTTADLDHTLVSVPSVAESLHHPEITFLTAFCL
PSFTRSRPLPDRQLHHCLALCPSFALPAGDGVCHGPGLQGSCYKGETQESVESRVLPGPRHRH
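For a record like the example above, a minimal sketch of separating the header from the sequence might look like this. It assumes the bracketed "[protein_id=...]" tag is a reasonable identifier (for naming an output file, say) and falls back to the first token after ">" when the tag is missing; FastaRecord, idOf, and sequenceOf are hypothetical names, not part of any library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: pull an identifier and the raw sequence out of one FASTA record.
// Assumptions: the header is the line starting with ">", every other line of the
// record is sequence, and the bracketed [protein_id=...] tag is a usable identifier.
class FastaRecord {

    // Captures the value of a [protein_id=...] tag, e.g. XP_003403591.1
    private static final Pattern PROTEIN_ID = Pattern.compile("\\[protein_id=([^\\]]+)\\]");

    // Returns the protein_id value, or the first token after ">" if the tag is missing.
    static String idOf(String header) {
        Matcher m = PROTEIN_ID.matcher(header);
        if (m.find()) {
            return m.group(1);
        }
        return header.substring(1).trim().split("\\s+")[0];
    }

    // Joins the sequence lines, keeping letters only.
    static String sequenceOf(List<String> sequenceLines) {
        StringBuilder sb = new StringBuilder();
        for (String line : sequenceLines) {
            sb.append(line.replaceAll("[^A-Za-z]", ""));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String header = ">lcl|NC_000001.10_cdsid_XP_003403591.1 [gene=LOC100652771] "
                + "[protein_id=XP_003403591.1]";
        System.out.println(idOf(header)); // prints XP_003403591.1
        System.out.println(sequenceOf(Arrays.asList("MSESINFSHN", "PSFTRSRP"))); // prints MSESINFSHNPSFTRSRP
    }
}
```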

The input file contains thousands of protein sequences in the format above. I need to generate thousands of raw files, each containing only a single protein sequence with no special symbols or gaps.
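Because each record begins at a ">" line and ends just before the next one, the whole file can be split in a single pass, without fixed-size arrays. This is a minimal sketch, assuming the identifier is the first token after ">" and that identifiers are unique (a repeated identifier would overwrite the earlier entry); FastaSplitter and split are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: split a multi-record FASTA stream into id -> raw sequence.
// Assumptions: records start with ">", the id is the first whitespace-delimited
// token after ">", and "raw" means letters only.
class FastaSplitter {

    static Map<String, String> split(BufferedReader in) throws IOException {
        Map<String, String> records = new LinkedHashMap<>(); // keeps file order
        String id = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(">")) {
                if (id != null) {
                    records.put(id, seq.toString()); // finish the previous record
                }
                id = line.substring(1).trim().split("\\s+")[0];
                seq.setLength(0); // start collecting the new record's sequence
            } else if (id != null) {
                // lines before the first header are ignored
                seq.append(line.replaceAll("[^A-Za-z]", ""));
            }
        }
        if (id != null) {
            records.put(id, seq.toString()); // the last record has no following ">"
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">p1 first\nMSES\nINFS\n>p2 second\nPSFT\n";
        System.out.println(split(new BufferedReader(new StringReader(fasta))));
        // prints {p1=MSESINFS, p2=PSFT}
    }
}
```

A LinkedHashMap keeps the records in file order. For very large inputs, writing each record out as soon as it ends avoids holding every sequence in memory at once.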

I have written Java code for this, but its output is "cannot open a file" followed by "cannot find file".
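One way to narrow down messages like these is to print the absolute path that is actually being opened, together with the exception's own message. Relative names such as "input.txt" are resolved against the JVM's working directory (the "user.dir" system property), which is not necessarily the folder containing the source file. A minimal sketch; OpenCheck and describe are illustrative names.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Minimal sketch: report why a file cannot be opened instead of a fixed message.
class OpenCheck {

    static String describe(String name) {
        File f = new File(name);
        if (!f.exists()) {
            // relative names are resolved against System.getProperty("user.dir")
            return "not found: " + f.getAbsolutePath();
        }
        try (BufferedReader in = new BufferedReader(new FileReader(f))) {
            in.readLine(); // try to read one line to confirm the file is readable
            return "ok: " + f.getAbsolutePath();
        } catch (IOException e) {
            // e.g. the name refers to a directory, or permission is denied
            return "cannot read " + f.getAbsolutePath() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("working directory: " + System.getProperty("user.dir"));
        System.out.println(describe(args.length > 0 ? args[0] : "input.txt"));
    }
}
```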

Please help me solve this problem.

Regards,
Vijay Kumar Garg
Varanasi, Bharat (India)

Here is my code:

/*Java code to convert FASTA format to a raw format*/
import java.io.*;
import java.util.*;
import java.util.regex.*; // Pattern and Matcher, for regular expressions

public class Arrayren
{
    public static void main(String args[]) throws IOException  
    {
        String a[]=new String[1000];
        String b[][] =new String[1000][1000];
        /*open the id file*/
        try
        {
            // Open input.txt, the text document listing the IDs (one per line).
            // BufferedReader.readLine() replaces the deprecated DataInputStream.readLine().
            BufferedReader ids = new BufferedReader(new FileReader("input.txt"));
            String inputline;
            String separator = System.getProperty("line.separator");
            int i=0;
            // readLine() returns the next line without its terminator, or null at end of file
            while ((inputline=ids.readLine()) != null) 
            {
                i++;
                a[i]=inputline;
                // remove any line separator (readLine() has normally stripped it already)
                a[i]=a[i].replaceAll(separator,"");
                // trim leading and trailing whitespace
                a[i]=a[i].trim();
                // treat the line as the base name of another file to open: "<line>.txt"
                a[i]=a[i]+".txt";
                try
                {
                    // read the file named on this line into b[i][...], one line per entry
                    BufferedReader in = new BufferedReader (new FileReader(a[i]));
                    String inline = null;
                    int j=0;
                    while((inline=in.readLine()) != null)
                    {
                        j++;
                        b[i][j]=inline;
                        Pattern q=Pattern.compile(">");
                        //Compiling the regular expression
                        Matcher n=q.matcher(inline);
                        //creates the matcher for the above pattern
                        if(n.find())
                        {
                            /*clean the header (description) line*/
                            // replaceAll() takes a regular expression; each call removes its matches
                            b[i][j]=b[i][j].replaceAll(">gi","");
                            b[i][j]=b[i][j].replaceAll("[a-zA-Z]","");
                            // "|" and "." are regex metacharacters, so they must be escaped
                            b[i][j]=b[i][j].replaceAll("\\|","");
                            b[i][j]=b[i][j].replaceAll("\\d{1,15}","");
                            b[i][j]=b[i][j].replaceAll("\\.","");
                            b[i][j]=b[i][j].replaceAll("_","");
                            b[i][j]=b[i][j].replaceAll("\\(","");
                            b[i][j]=b[i][j].replaceAll("\\)","");
                        }
                        /*write the cleaned line to a text file*/
                        b[i][j]=b[i][j].replaceAll(separator,"");
                        // trim leading and trailing whitespace
                        b[i][j]=b[i][j].trim();
                        File create = new File(inputline+"R.txt");
                        try
                        {
                            if(!create.exists())
                            {
                                create.createNewFile();
                                // creates a new file
                            }
                            else
                            {
                                System.out.println("file already exists");
                            }
                        }
                        catch(IOException e)
                        // report an error if the output file cannot be created
                        {
                            System.err.println("cannot create a file");
                        }
                        BufferedWriter outt = new BufferedWriter(new FileWriter(inputline+"R.txt", true));
                        outt.write(b[i][j]);
                        // printing the contents to a text file
                        outt.close();
                        // closing the text file
                        System.out.println(b[i][j]);
                    }
                }
                catch(Exception e)
                {
                    System.out.println("cannot open a file: " + e.getMessage());
                }
            }
        }
        catch(Exception ex)
        // report an error if input.txt cannot be found or read
        {
            System.out.println("cannot find file: " + ex.getMessage());
        }
    }
}
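In Java, String.replaceAll() interprets its first argument as a regular expression, so characters such as "." and "|" have special meanings: "." matches any character, and a lone "|" is an empty alternation that matches only the empty string. A small demonstration; ReplaceAllPitfalls and removeLiteral are illustrative names.

```java
import java.util.regex.Pattern;

// Minimal demo: String.replaceAll() treats its first argument as a regular expression.
class ReplaceAllPitfalls {

    // Removes every occurrence of a literal substring, whatever characters it contains.
    static String removeLiteral(String s, String literal) {
        // Pattern.quote() makes the regex match the literal text exactly
        return s.replaceAll(Pattern.quote(literal), "");
    }

    public static void main(String[] args) {
        String s = "NC_000001.10|XP";
        // "." matches any character, so every character is removed
        System.out.println("[" + s.replaceAll(".", "") + "]"); // prints []
        // "|" matches only empty strings, so nothing changes
        System.out.println(s.replaceAll("|", ""));             // prints NC_000001.10|XP
        // escaped or quoted versions remove only the literal characters
        System.out.println(s.replaceAll("\\.", ""));           // prints NC_00000110|XP
        System.out.println(removeLiteral(s, "|"));             // prints NC_000001.10XP
        // String.replace() does literal replacement without any regex
        System.out.println(s.replace("|", ""));                // prints NC_000001.10XP
    }
}
```

When the text to remove is a fixed string, String.replace() or Pattern.quote() avoids the problem entirely.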

If you could provide corrected code, it would be much easier for me to understand.
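A minimal sketch of a complete program for the task described in the question, under these assumptions (none of them come from the question): the input is a single FASTA file (by default "input.txt"); each record is written to its own file in an output directory (by default "raw"); the file is named after the first token of the header, with characters other than letters, digits, ".", "_" and "-" replaced by "_"; and the raw sequence is written as one line of letters. FastaToRaw, fileNameFor, and convert are illustrative names. It reads the input line by line and keeps only one output file open at a time, so thousands of records need no fixed-size arrays.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal sketch: write each FASTA record of one input file to its own raw-sequence file.
class FastaToRaw {

    // Turns a header such as ">lcl|NC_1.1 [gene=X]" into a safe file name "lcl_NC_1.1.txt".
    static String fileNameFor(String header) {
        String token = header.substring(1).trim().split("\\s+")[0];
        return token.replaceAll("[^A-Za-z0-9._-]", "_") + ".txt";
    }

    // Streams the input and returns the number of records written.
    static int convert(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        int count = 0;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    if (out != null) {
                        out.newLine();
                        out.close(); // finish the previous record
                    }
                    out = Files.newBufferedWriter(outDir.resolve(fileNameFor(line)), StandardCharsets.UTF_8);
                    count++;
                } else if (out != null) {
                    // keep letters only; lines before the first header are ignored
                    out.write(line.replaceAll("[^A-Za-z]", ""));
                }
            }
            if (out != null) {
                out.newLine(); // terminate the last record
            }
        } finally {
            if (out != null) {
                out.close(); // closing an already closed writer has no effect
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path outDir = Paths.get(args.length > 1 ? args[1] : "raw");
        int n = convert(input, outDir);
        System.out.println("wrote " + n + " records to " + outDir.toAbsolutePath());
    }
}
```

Run it as "java FastaToRaw input.txt raw". Records that share an identifier overwrite each other's file, because Files.newBufferedWriter truncates an existing file by default.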

© Stack Overflow or respective owner
