
I have a 2D array (similar to a table in MySQL), for instance:

+------------------+------------------+-------------------+------------------+
| trip_id          | service_id       | route_id          | shape_id         |
+------------------+------------------+-------------------+------------------+
| 4503599630773892 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773894 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773896 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773898 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773900 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773902 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630810392 | 4503599630773892 | 11821949021891678 | 4503599630810392 |
| 4503599630810394 | 4503599630810394 | 11821949021891678 | 4503599630810392 |
| 4503599630810396 | 4503599630773892 | 11821949021891678 | 4503599630810392 |
| 4503599630810398 | 4503599630773892 | 11821949021891678 | 4503599630810392 |
+------------------+------------------+-------------------+------------------+

How can I store a 2D array in Python, like a table in MySQL?

The first solution that came to my mind is to use a dict. The key is trip_id (the first column) and the value is a list ([service_id, route_id, shape_id]).
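
For illustration, a minimal sketch of that idea (values taken from the table above):

table = {}
# key: trip_id; value: [service_id, route_id, shape_id]
table[4503599630773892] = [4503599630773892, 11821949021891677, 4503599630773892]
table[4503599630773894] = [4503599630773892, 11821949021891677, 4503599630773892]
service_id, route_id, shape_id = table[4503599630773892]

Keying by trip_id would also make the entries unique for free.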

Another solution is to use SQLite.
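
For example, with the built-in sqlite3 module (the trips table name and the INTEGER column types are my own choice here):

import sqlite3

conn = sqlite3.connect("trips.db")
conn.execute("CREATE TABLE IF NOT EXISTS trips ("
             "trip_id INTEGER PRIMARY KEY, service_id INTEGER, "
             "route_id INTEGER, shape_id INTEGER)")
# INSERT OR IGNORE skips rows whose trip_id already exists, keeping entries unique
conn.execute("INSERT OR IGNORE INTO trips VALUES (?, ?, ?, ?)",
             (4503599630773892, 4503599630773892, 11821949021891677, 4503599630773892))
conn.commit()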

Which one is recommended, or are there other solutions?

PS: I want to store rows (say, [trip_id, service_id, route_id, shape_id]) that are crawled from webpages. This requires dozens of insert or append operations. The order of entries does not matter, but entries should be unique.

  • Depending on what you want to do with it, I would look at pandas (but for simple things, it can be overkill); a minimal sketch follows these comments. Commented Jul 16, 2015 at 15:43
  • I would say that if you know you will only want to access one column (e.g. trip_id), then a dict will be faster; however, if you need to search by any of the columns, you are better off using SQLite / pandas. Commented Jul 16, 2015 at 15:45
  • @joris I want to store rows (say,[trip_id, service_id, route_id, shape_id]) that are crawled from webpages. Commented Jul 16, 2015 at 15:46
  • Do you require massive speed? Is speed mostly an issue at construction or query time? How are you going to query the data (pick row, pick column, pick single element, ...)? Some solutions will be better than others depending on your requirements. Commented Jul 16, 2015 at 15:52
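
Picking up the pandas suggestion from the first comment, a minimal sketch (column names from the table above; drop_duplicates handles the uniqueness requirement):

import pandas as pd

rows = [
    (4503599630773892, 4503599630773892, 11821949021891677, 4503599630773892),
    (4503599630773894, 4503599630773892, 11821949021891677, 4503599630773892),
]
df = pd.DataFrame(rows, columns=["trip_id", "service_id", "route_id", "shape_id"])
df = df.drop_duplicates(subset="trip_id")  # keep trip_ids unique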

2 Answers


Depending on your use cases, the usual solution of using a list of lists (or list of tuples) may be the most efficient and readable alternative:

my_table = [
    ("trip_id", "service_id", "route_id", "shape_id"),
    (4503599630773892, 4503599630773892, 11821949021891677, 4503599630773892),
    ...
]
first_trip_id = my_table[1][0]

You can then simply add new rows using my_table.append((1, 2, 3, 4)), which is as efficient as it gets (in Python).
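
Since the question asks for unique entries, a hypothetical helper (the names are mine, not part of the original answer) can pair the list with a set of already-seen trip_ids:

seen_trip_ids = set()

def add_row(row):
    # row is a (trip_id, service_id, route_id, shape_id) tuple;
    # only append it if its trip_id has not been seen before
    if row[0] not in seen_trip_ids:
        seen_trip_ids.add(row[0])
        my_table.append(row)

add_row((4503599630810394, 4503599630810394, 11821949021891678, 4503599630810392))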


There are a couple of tricks you can use to make accesses to this kind of structure efficient and readable.

You can opt to exclude the header from it, which may help you make sense of the indexes. If you want both versions, just store the header in it, make a copy, and pop the header:

from copy import deepcopy

my_table_no_headers = deepcopy(my_table)
my_table_no_headers.pop(0)  # drop the header row
first_trip_id = my_table_no_headers[0][0]

Something that may also help writing readable code is declaring constants for the column names:

trip_id, service_id, route_id, shape_id = range(4)
first_trip_id = my_table_no_headers[0][trip_id]

If you want to obtain, for example, a list of all trip_ids, you can simply invert the indexing order:

my_table_no_headers = list(zip(*my_table_no_headers))  # list() is needed in Python 3, where zip returns an iterator
first_trip_id = my_table_no_headers[trip_id][0]
all_trip_ids = my_table_no_headers[trip_id]  # note this is not a copy!

This is the same as transposing a matrix. Note this last indexing order is not suitable for adding rows.
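
A related option, not in the original answer, is collections.namedtuple, which gives named field access without separate index constants:

from collections import namedtuple

Trip = namedtuple("Trip", ["trip_id", "service_id", "route_id", "shape_id"])
row = Trip(4503599630773892, 4503599630773892, 11821949021891677, 4503599630773892)
first_trip_id = row.trip_id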




You would have to be more specific about your requirements to say objectively whether or not you should use an SQLite db (although I would tend to lean towards yes, if you will be storing more than one instance of this kind of data).

You should be aware, however, that unless you are using an OrderedDict, the order of your items will be arbitrary (and not accessible by index). A plain dict did not guarantee insertion order when this was written; Python only added that guarantee in 3.7.

I would actually suggest you make your table a list of dicts (one per row) rather than a dict of columns, where you would need to look up matching values across lists.

trips = [
  {
    "trip_id": "4503599630773892",
    "service_id": "4503599630773892",
    "route_id": "11821949021891677",
    "shape_id": "4503599630773892"
  },
  {
    "trip_id": "4503599630773894",
    "service_id": "4503599630773892",
    "route_id": "11821949021891677",
    "shape_id": "4503599630773892"
  }
]

etc.

The reason is that lookups would be much easier, using filter() or just a for loop. The equivalent process with the structure you have now would involve filtering a single column, finding matching values by index, and then basically compiling this data structure yourself every time anyway (and you would have to worry about maintaining order very strictly and avoiding mismatched column lengths).
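
For instance, a minimal sketch of looking rows up in this layout (values taken from the question's table):

# first trip with a given trip_id, or None if absent
match = next((t for t in trips if t["trip_id"] == "4503599630773894"), None)
# every trip on a given route
same_route = [t for t in trips if t["route_id"] == "11821949021891677"]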

