
I have a 2D array (similar to a table in MySQL), for instance:

+------------------+------------------+-------------------+------------------+
| trip_id          | service_id       | route_id          | shape_id         |
+------------------+------------------+-------------------+------------------+
| 4503599630773892 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773894 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773896 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773898 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773900 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630773902 | 4503599630773892 | 11821949021891677 | 4503599630773892 |
| 4503599630810392 | 4503599630773892 | 11821949021891678 | 4503599630810392 |
| 4503599630810394 | 4503599630810394 | 11821949021891678 | 4503599630810392 |
| 4503599630810396 | 4503599630773892 | 11821949021891678 | 4503599630810392 |
| 4503599630810398 | 4503599630773892 | 11821949021891678 | 4503599630810392 |
+------------------+------------------+-------------------+------------------+

How can I store a 2D array in Python, like a table in MySQL?

The first solution that came to my mind is to use a dict. The key is trip_id (the first column) and the value is a list ([service_id, route_id, shape_id]).
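
For illustration, a minimal sketch of that idea (values taken from the table above):

table = {}
# key: trip_id; value: [service_id, route_id, shape_id]
table[4503599630773892] = [4503599630773892, 11821949021891677, 4503599630773892]
table[4503599630773894] = [4503599630773892, 11821949021891677, 4503599630773892]
service_id, route_id, shape_id = table[4503599630773892]

Keying by trip_id would also make the entries unique for free.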

Another solution is to use SQLite.
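
For example, with the built-in sqlite3 module (the trips table name and the INTEGER column types are my own choice here):

import sqlite3

conn = sqlite3.connect("trips.db")
conn.execute("CREATE TABLE IF NOT EXISTS trips ("
             "trip_id INTEGER PRIMARY KEY, service_id INTEGER, "
             "route_id INTEGER, shape_id INTEGER)")
# INSERT OR IGNORE skips rows whose trip_id already exists, keeping entries unique
conn.execute("INSERT OR IGNORE INTO trips VALUES (?, ?, ?, ?)",
             (4503599630773892, 4503599630773892, 11821949021891677, 4503599630773892))
conn.commit()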

Which one is recommended, or are there other solutions?

PS: I want to store rows (say, [trip_id, service_id, route_id, shape_id]) that are crawled from webpages. This requires dozens of insert or append operations. The order of entries does not matter, but entries should be unique.

  • Depending on what you want to do with it, I would look at pandas (but for simple things, it can be overkill); a minimal sketch follows these comments. Commented Jul 16, 2015 at 15:43
  • I would say that if you know you will only want to access one column (e.g. trip_id), then a dict will be faster; however, if you need to search by any of the columns, you are better off using SQLite / pandas. Commented Jul 16, 2015 at 15:45
  • @joris I want to store rows (say,[trip_id, service_id, route_id, shape_id]) that are crawled from webpages. Commented Jul 16, 2015 at 15:46
  • Do you require massive speed? Is speed mostly an issue at construction or query time? How are you going to query the data (pick row, pick column, pick single element, ...)? Some solutions will be better than others depending on your requirements. Commented Jul 16, 2015 at 15:52
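
Picking up the pandas suggestion from the first comment, a minimal sketch (column names from the table above; drop_duplicates handles the uniqueness requirement):

import pandas as pd

rows = [
    (4503599630773892, 4503599630773892, 11821949021891677, 4503599630773892),
    (4503599630773894, 4503599630773892, 11821949021891677, 4503599630773892),
]
df = pd.DataFrame(rows, columns=["trip_id", "service_id", "route_id", "shape_id"])
df = df.drop_duplicates(subset="trip_id")  # keep trip_ids unique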

2 Answers


Depending on your use cases, the usual solution of using a list of lists (or list of tuples) may be the most efficient and readable alternative:

my_table = [
    ("trip_id", "service_id", "route_id", "shape_id"),
    (4503599630773892, 4503599630773892, 11821949021891677, 4503599630773892),
    ...
]
first_trip_id = my_table[1][0]

You can then simply add new rows using my_table.append((1, 2, 3, 4)), which is as efficient as it gets (in Python).
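
Since the question asks for unique entries, a hypothetical helper (the names are mine, not part of the original answer) can pair the list with a set of already-seen trip_ids:

seen_trip_ids = set()

def add_row(row):
    # row is a (trip_id, service_id, route_id, shape_id) tuple;
    # only append it if its trip_id has not been seen before
    if row[0] not in seen_trip_ids:
        seen_trip_ids.add(row[0])
        my_table.append(row)

add_row((4503599630810394, 4503599630810394, 11821949021891678, 4503599630810392))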


There are a couple of tricks you can use to make accesses to this kind of structure efficient and readable.

You can opt to exclude the header from it, which may help you make sense of the indexes. If you want both versions, just store the header in it, make a copy, and pop the header:

from copy import deepcopy

my_table_no_headers = deepcopy(my_table)
my_table_no_headers.pop(0)  # drop the header row
first_trip_id = my_table_no_headers[0][0]

Something that may also help writing readable code is declaring constants for the column names:

trip_id, service_id, route_id, shape_id = range(4)
first_trip_id = my_table_no_headers[0][trip_id]

If you want to obtain, for example, a list of all trip_ids, you can simply invert the indexing order:

my_table_no_headers = list(zip(*my_table_no_headers))  # list() is needed in Python 3, where zip returns an iterator
first_trip_id = my_table_no_headers[trip_id][0]
all_trip_ids = my_table_no_headers[trip_id]  # note this is not a copy!

This is the same as transposing a matrix. Note this last indexing order is not suitable for adding rows.
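
A related option, not in the original answer, is collections.namedtuple, which gives named field access without separate index constants:

from collections import namedtuple

Trip = namedtuple("Trip", ["trip_id", "service_id", "route_id", "shape_id"])
row = Trip(4503599630773892, 4503599630773892, 11821949021891677, 4503599630773892)
first_trip_id = row.trip_id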




You would have to be more specific about your requirements to say objectively whether or not you should use an SQLite db (although I would tend to lean towards yes, if you will be storing more than one instance of this kind of data).

You should be aware, however, that unless you are using an OrderedDict, the order of your items will be arbitrary (and not accessible by index). A plain dict did not guarantee insertion order when this was written; Python only added that guarantee in 3.7.

I would actually suggest you make your table a list of dicts (one per row) rather than a dict of columns, where you would need to look up matching values across lists.

trips = [
  {
    "trip_id": "4503599630773892",
    "service_id": "4503599630773892",
    "route_id": "11821949021891677",
    "shape_id": "4503599630773892"
  },
  {
    "trip_id": "4503599630773894",
    "service_id": "4503599630773892",
    "route_id": "11821949021891677",
    "shape_id": "4503599630773892"
  }
]

etc.

The reason is that lookups would be much easier, using filter() or just a for loop. The equivalent process with the structure you have now would involve filtering a single column, finding matching values by index, and then basically compiling this data structure yourself every time anyway (and you would have to worry about maintaining order very strictly and avoiding mismatched column lengths).
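
For instance, a minimal sketch of looking rows up in this layout (values taken from the question's table):

# first trip with a given trip_id, or None if absent
match = next((t for t in trips if t["trip_id"] == "4503599630773894"), None)
# every trip on a given route
same_route = [t for t in trips if t["route_id"] == "11821949021891677"]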

