0

I have one xml file like this:

<dbReference type="PM" id="17224074"/>
<dbReference type="DOI" id="10.1186/bcr1637"/>
</citation>
<scope>VARIANTS ILE-282 AND ASN-777</scope>
</reference>
<comment type="function">
<text evidence="24">Calcium.</text>
</comment>
<comment type="function">
<text evidence="24">Has a strong inhibitory effect on APP C99 and C83 production.</text>
</comment>
<comment type="subunit">
<text evidence="5 13">Homodimer; disulfide-linked.</text>
</comment>
<comment type="interaction">
<interactant intactId="EBI-727477"/>
<interactant intactId="EBI-7644904">
<id>Q9JIY2</id>
<label>Cbll1</label>
</interactant>
<organismsDiffer>true</organismsDiffer>
<experiments>21</experiments>
</comment>

I want to extract only the information in

<comment type="function">...</comment>

In this example being: 'Calcium.' AND 'as a strong inhibitory effect on APP C99 and C83 production.'

I have this table where I want to save the data:

CREATE TABLE IF NOT EXISTS INFORMATION (id varchar(255) NOT NULL, name varchar(255), entry varchar(255), comment longtext, PRIMARY KEY (id)); 

where I would save 'Calcium.' AND 'as a strong inhibitory effect on APP C99 and C83 production.' in column named 'comment'. I thought that I could insert this info directly from the xml to the table with LOAD XML but my xml file has too many different fields. How do I do this? Do I have to first extract the data from the xml and then insert into the table?

2 Answers 2

2

One option that can be useful:

File: '/path/to/xml/file/xmlfile.xml'

<dbReference type="PM" id="17224074"/>
<dbReference type="DOI" id="10.1186/bcr1637"/>
</citation>
<scope>VARIANTS ILE-282 AND ASN-777</scope>
</reference>
<comment type="function">
    <text evidence="24">Calcium.</text>
</comment>
<comment type="function">
    <text evidence="24">Has a strong inhibitory effect on APP C99 and C83 production.</text>
</comment>
<comment type="subunit">
    <text evidence="5 13">Homodimer; disulfide-linked.</text>
</comment>
<comment type="interaction">
    <interactant intactId="EBI-727477"/>
    <interactant intactId="EBI-7644904">
        <id>Q9JIY2</id>
        <label>Cbll1</label>
    </interactant>
    <organismsDiffer>true</organismsDiffer>
    <experiments>21</experiments>
</comment>

MySQL Command-line:

mysql> DROP TEMPORARY TABLE IF EXISTS `temp_information`;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> DROP TABLE IF EXISTS `information`;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> CREATE TABLE IF NOT EXISTS `information` (
    -> `id` VARCHAR(255) NOT NULL,
    -> `name` VARCHAR(255),
    -> `entry` VARCHAR(255),
    -> `comment` LONGTEXT,
    -> PRIMARY KEY (`id`)
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TEMPORARY TABLE IF NOT EXISTS `temp_information` (
    -> `id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    -> `type` VARCHAR(20),
    -> `text` TEXT,
    -> `evidence` VARCHAR(20)
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD XML INFILE '/path/to/xml/file/xmlfile.xml'
    -> INTO TABLE `temp_information`
    -> ROWS IDENTIFIED BY '<comment>';
Query OK, 4 rows affected (0.00 sec)
Records: 4  Deleted: 0  Skipped: 0  Warnings: 0

mysql> INSERT INTO `information` (`id`, `comment`)
    -> SELECT UUID(), GROUP_CONCAT(`text` ORDER BY `id` SEPARATOR ' ')
    -> FROM
    -> `temp_information`
    -> WHERE
    -> `type` = 'function'
    -> GROUP BY `evidence`;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> SELECT
    -> `id`,
    -> `name`,
    -> `entry`,
    -> `comment`
    -> FROM
    -> `information`;
+--------------------------------------+------+-------+------------------------------------------------------------------------+
| id                                   | name | entry | comment                                                                |
+--------------------------------------+------+-------+------------------------------------------------------------------------+
| e720d259-fcde-11e5-be3f-a4badbf9ce21 | NULL | NULL  | Calcium. Has a strong inhibitory effect on APP C99 and C83 production. |
+--------------------------------------+------+-------+------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> DROP TEMPORARY TABLE IF EXISTS `temp_information`;
Query OK, 0 rows affected (0.00 sec)

Check:

Sign up to request clarification or add additional context in comments.

Comments

1

use an XML parser (such as SAXParser) to parse the file first, then iterate over nodes, looking for comment nodes. For each one that has a type of "function", put the node text into the DB.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.