How to Upload a File to MongoDB from the Command Line

No matter what you're building with MongoDB, at some point you'll want to import some data. Whether it's the majority of your data, or just some reference data that you want to integrate with your main data set, you'll find yourself with a bunch of JSON or CSV files that you need to import into a collection. Fortunately, MongoDB provides a tool called mongoimport which is designed for this task. This guide will explain how to effectively use mongoimport to get your data into your MongoDB database.

We also provide mongoimport reference documentation, if you're looking for something comprehensive or you just need to look up a command-line option.

#Prerequisites

This guide assumes that you're reasonably comfortable with the command-line. Most of the guide will just be running commands, but towards the end I'll show how to pipe data through some command-line tools, such as jq.

If you haven't had much experience on the command-line (also sometimes called the terminal, or shell, or bash), why not follow along with some of the examples? It's a great way to get started.

The examples shown were all written on macOS, but should run on any unix-type system. If you're running on Windows, I recommend running the example commands inside the Windows Subsystem for Linux.

You'll need a temporary MongoDB database to test out these commands. If you're just getting started, I recommend you sign up for a free MongoDB Atlas account, and then we'll take care of the cluster for you!

And of course, you'll need a copy of mongoimport. If you have MongoDB installed on your workstation then you may already have mongoimport installed. If not, follow these instructions on the MongoDB website to install it.

I've created a GitHub repo of sample data, containing an extract from the New York Citibike dataset in different formats that should be useful for trying out the commands in this guide.

#Getting Started with mongoimport

mongoimport is a powerful command-line tool for importing data from JSON, CSV, and TSV files into MongoDB collections. It's super-fast and multi-threaded, so in many cases it will be faster than any custom script you might write to do the same thing. mongoimport can be combined with other command-line tools, such as jq for JSON manipulation, csvkit for CSV manipulation, or even curl for dynamically downloading data files from servers on the internet. As with many command-line tools, the options are endless!
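For instance, here's a minimal sketch of that last combination (the URL is just a placeholder): data is streamed straight from a web server into a collection without ever touching disk. Connection options are covered a little further down.

```bash
# Hypothetical example: download a JSON array of documents and import it in
# one pipeline. Replace the URL and collection name with your own values.
curl -sS 'https://example.com/rides.json' \
    | mongoimport --collection='rides' --jsonArray
```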

#Choosing a Source Data Format

In many ways, having your source data in JSON files is better than CSV (and TSV). JSON is both a hierarchical data format, like MongoDB documents, and is also explicit about the types of data it encodes. On the other hand, source JSON data can be difficult to deal with - in many cases it is not in the structure you'd like, or it has numeric data encoded as strings, or perhaps the date formats are not in a form that mongoimport accepts.

CSV (and TSV) data is tabular, and each row will be imported into MongoDB as a separate document. This means that these formats cannot support hierarchical data in the same way as a MongoDB document can. When importing CSV data into MongoDB, mongoimport will attempt to make sensible choices when identifying the type of a specific field, such as int32 or string. This behaviour can be overridden with the use of some flags, and you can specify types if you want to. On top of that, mongoimport supplies some facilities for parsing dates and other types in different formats.

In many cases, the choice of source data format won't be up to you - it'll be up to the organisation generating the data and providing it to you. My recommendation is that if the source data is in CSV form then you shouldn't attempt to convert it to JSON first unless you plan to restructure it.

#Connect mongoimport to Your Database

This section assumes that you're connecting to a relatively straightforward setup - with a default authentication database and some authentication set up. (You should always create some users for authentication!)

If you don't provide any connection details to mongoimport, it will attempt to connect to MongoDB on your local machine, on port 27017 (which is MongoDB's default). This is the same as providing --host=localhost:27017.
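As a quick sketch (assuming a local mongod, and using a sample file name from the GitHub repo), these two commands do exactly the same thing:

```bash
# No connection options: mongoimport talks to localhost:27017 by default.
mongoimport --collection='mycollectionname' --file='file_per_document/ride_00001.json'

# The same thing, with the default host spelled out explicitly.
mongoimport --host=localhost:27017 --collection='mycollectionname' --file='file_per_document/ride_00001.json'
```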

#One URI to Rule Them All

There are several options that allow you to provide separate connection information to mongoimport, but I recommend you use the --uri option. If you're using Atlas you can get the appropriate connection URI from the Atlas interface, by clicking on your cluster's "Connect" button and selecting "Connect your Application". (Atlas is being continuously developed, so these instructions may be slightly out of date.) Set the URI as the value of your --uri option, and replace the username and password with the appropriate values:

```bash
mongoimport --uri 'mongodb+srv://MYUSERNAME:SECRETPASSWORD@mycluster-ABCDE.azure.mongodb.net/test?retryWrites=true&w=majority'
```

Be aware that in this form the username and password must be URL-encoded. If you don't want to worry about this, then provide the username and password using the --username and --password options instead:

```bash
mongoimport --uri 'mongodb+srv://mycluster-ABCDE.azure.mongodb.net/test?retryWrites=true&w=majority' \
    --username='MYUSERNAME' \
    --password='SECRETPASSWORD'
```

If you omit a password from the URI and do not provide a --password option, then mongoimport will prompt you for a password on the command-line. In all these cases, using single-quotes around values, as I've done, will save you issues in the long-run!

If you're not connecting to an Atlas database, then you'll have to generate your own URI. If you're connecting to a single server (i.e. you don't have a replicaset), then your URI will look like this: mongodb://your.server.host.name:port/. If you're running a replicaset (and you should!) then you have more than one hostname to connect to, and you don't know in advance which is the primary. In this case, your URI will consist of a series of servers in your cluster (you don't need to provide all of your cluster's servers, provided one of them is available), and mongoimport will discover and connect to the primary automatically. A replicaset URI looks like this: mongodb://username:password@host1:port,host2:port/?replicaSet=replicasetname.
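As a rough sketch, a call against a self-hosted replica set might look like this (the hostnames, credentials, and replica set name here are purely illustrative):

```bash
# Only some of the members need to be listed; mongoimport will discover the
# primary from the replica set configuration.
mongoimport \
    --uri='mongodb://myuser:mypassword@db1.example.com:27017,db2.example.com:27017/test?replicaSet=rs0' \
    --collection='mycollectionname' \
    --file='file_per_document/ride_00001.json'
```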

Full details of the supported URI formats can be found in our reference documentation.

There are also many other options available, and these are documented in the mongoimport reference documentation.

Once you've determined the URI, the fun begins. In the rest of this guide, I'll leave those connection flags out. You'll need to add them in when trying out the various other options.

#Import One JSON Document

The simplest way to import a single file into MongoDB is to use the --file option to specify a file. In my opinion, the very best situation is that you have a directory full of JSON files which need to be imported. Ideally each JSON file contains one document you wish to import into MongoDB, it's in the correct structure, and each of the values is of the correct type. Use this option when you wish to import a single file as a single document into a MongoDB collection.

You'll find data in this format in the 'file_per_document' directory in the sample data GitHub repo. Each document will look like this:

```json
{
  "tripduration": 602,
  "starttime": "2019-12-01 00:00:05.5640",
  "stoptime": "2019-12-01 00:10:07.8180",
  "start station id": 3382,
  "start station name": "Carroll St & Smith St",
  "start station latitude": 40.680611,
  "start station longitude": -73.99475825,
  "end station id": 3304,
  "end station name": "6 Ave & 9 St",
  "end station latitude": 40.668127,
  "end station longitude": -73.98377641,
  "bikeid": 41932,
  "usertype": "Subscriber",
  "birth year": 1970,
  "gender": "male"
}
```

```bash
mongoimport --collection='mycollectionname' --file='file_per_document/ride_00001.json'
```

The command above will import the JSON file into a collection called mycollectionname. You don't have to create the collection in advance.

The imported document, viewed in MongoDB Compass.

If you use MongoDB Compass or another tool to connect to the collection you just created, you'll see that MongoDB also generated an _id value in each document for you. This is because MongoDB requires every document to have a unique _id, but you didn't provide one. I'll cover more on this shortly.

#Import Many JSON Documents

mongoimport will only import one file at a time with the --file option, but you can get around this by piping multiple JSON documents into mongoimport from another tool, such as cat. This is faster than importing one file at a time, running mongoimport from a loop, as mongoimport itself is multithreaded for faster uploads of multiple documents. A directory full of JSON files, where each JSON file should be imported as a separate MongoDB document, can be imported by cd-ing to the directory that contains the JSON files and running:

```bash
cat *.json | mongoimport --collection='mycollectionname'
```

As before, MongoDB creates a new _id for each document inserted into the MongoDB collection, because they're not contained in the source data.

#Import One Large JSON Array

Sometimes you will have multiple documents contained in a JSON array in a single file, a little like the following:

```json
[
  { "title": "Document 1", "data": "document 1 value" },
  { "title": "Document 2", "data": "document 2 value" }
]
```

You can import data in this format using the --file option, combined with the --jsonArray option:

```bash
mongoimport --collection='from_array_file' --file='one_big_list.json' --jsonArray
```

If you forget to add the --jsonArray option, mongoimport will fail with the error "cannot decode array into a Document." This is because documents are equivalent to JSON objects, not arrays. You can store an array as a _value_ on a document, but a document cannot be an array.

#Import MongoDB-specific Types with JSON

If you import some of the JSON data from the sample data GitHub repo and then view the collection's schema in Compass, you may notice a couple of issues:

  • The values of starttime and stoptime should be "date" types, not "string".

  • MongoDB supports geographical points, but doesn't recognize the start and end stations' latitudes and longitudes as such.

This stems from a fundamental difference between MongoDB documents and JSON documents. Although MongoDB documents often look like JSON data, they're not. MongoDB stores data as BSON. BSON has multiple advantages over JSON. It's more compact, it's faster to traverse, and it supports more types than JSON. Among those types are Dates, GeoJSON types, binary data, and decimal numbers. All the types are listed in the MongoDB documentation.

If you want MongoDB to recognise fields being imported from JSON as specific BSON types, those fields must be manipulated so that they follow a structure we call Extended JSON. This means that the following field:

```json
"starttime": "2019-12-01 00:00:05.5640"
```

must be provided to MongoDB as:

```json
"starttime": {
  "$date": "2019-12-01T00:00:05.5640Z"
}
```

for it to be recognized as a Date type. Note that the format of the date string has changed slightly, with the 'T' separating the date and time, and the Z at the end, indicating UTC timezone.

Similarly, the latitude and longitude must be converted to a GeoJSON Point type if you wish to take advantage of MongoDB's ability to search location data. The two values:

```json
"start station latitude": 40.680611,
"start station longitude": -73.99475825,
```

must be provided to mongoimport in the following GeoJSON Point form:

```json
"start station location": {
  "type": "Point",
  "coordinates": [ -73.99475825, 40.680611 ]
}
```

Note: the pair of values are longitude then latitude, as this sometimes catches people out!

Once you have geospatial data in your collection, you can use MongoDB's geospatial queries to search for data by location.
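As a brief, illustrative sketch of what that enables (the collection and field names follow the examples in this guide, the coordinates are arbitrary, and I'm assuming your connection string is in the MONGODB_URI environment variable), you could create a 2dsphere index and then look for rides that started near a given point, using mongosh:

```bash
# Illustrative only: index the converted GeoJSON field, then find documents
# whose start point lies within 500 metres of a location.
# Remember: GeoJSON coordinates are [longitude, latitude].
mongosh "$MONGODB_URI" --eval '
    db.mycollectionname.createIndex({ "start station location": "2dsphere" });
    db.mycollectionname.find({
        "start station location": {
            $near: {
                $geometry: { type: "Point", coordinates: [-73.9947, 40.6806] },
                $maxDistance: 500
            }
        }
    }).limit(5).forEach(doc => printjson(doc));
'
```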

If you need to transform your JSON data in this kind of way, see the section on jq below.

#Importing Data Into Non-Empty Collections

When importing data into a collection which already contains documents, your _id value is important. If your incoming documents don't contain _id values, then new values will be created and assigned to the new documents as they are added to the collection. If your incoming documents do contain _id values, then they will be checked against existing documents in the collection. The _id value must be unique within a collection. By default, if the incoming document has an _id value that already exists in the collection, then the document will be rejected and an error will be logged. This mode (the default) is called "insert mode". There are other modes, however, that behave differently when a matching document is imported using mongoimport.
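For what it's worth, the default is the same as passing --mode=insert explicitly. A minimal sketch, with illustrative file and collection names:

```bash
# Default behaviour spelled out: incoming documents whose _id already exists
# in the collection are logged as duplicate-key errors and skipped; the rest
# are inserted.
mongoimport --collection='mycollectionname' --file='new_rides.json' --mode=insert
```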

#Update Existing Records

If you are periodically supplied with new data files, you can use mongoimport to efficiently update the data in your collection. If your input data is supplied with a stable identifier, use that field as the _id field, and supply the option --mode=upsert. This mode will insert a new document if the _id value is not currently present in the collection. If the _id value already exists in a document, then that document will be overwritten by the new document data.

If you're upserting records that don't have stable IDs, you can specify some fields to use to match against documents in the collection, with the --upsertFields option. If you're using more than one field name, separate these values with a comma:

```bash
--upsertFields=name,address,height
```

Remember to index these fields if you're using --upsertFields, otherwise it'll be slow!
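Putting those pieces together, an upsert import might look something like this sketch (the file name and field names are illustrative):

```bash
# Match on business fields instead of _id: incoming documents that match an
# existing document on all three fields update it; everything else is inserted.
mongoimport \
    --collection='members' \
    --file='members_update.json' \
    --mode=upsert \
    --upsertFields=name,address,height
```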

#Merge Data into Existing Records

If you are supplied with data files which extend your existing documents by adding new fields, or which update certain fields, you can use mongoimport with "merge mode". If your input data is supplied with a stable identifier, use that field as the _id field, and supply the option --mode=merge. This mode will insert a new document if the _id value is not currently present in the collection. If the _id value already exists in a document, then the incoming data will be merged into that document.

You can also use the --upsertFields option here, just as when you're doing upserts, to match the documents you want to update.
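A sketch of a merge import (again, the names are illustrative) - the incoming documents only need to contain the _id and whichever fields you want to add or change:

```bash
# Fields from the incoming documents are merged into the matching stored
# documents; fields that only exist on the stored documents are left alone.
mongoimport --collection='mycollectionname' --file='extra_fields.json' --mode=merge
```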

#Import CSV (or TSV) into a Collection

If you have CSV files (or TSV files - they're conceptually the same) to import, use the --type=csv or --type=tsv option to tell mongoimport what format to expect. It's also important to know whether your CSV file has a header row - where the first line doesn't contain data, but instead contains the name for each column. If you do have a header row, you should use the --headerline option to tell mongoimport that the first line should not be imported as a document.
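As a quick sketch (the file name here is illustrative), that looks like this:

```bash
# The first line of the CSV file supplies the field names, so it is not
# imported as a document itself.
mongoimport --collection='csv_import' --file='with_header_row.csv' --type=csv --headerline
```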

With CSV data, you may have to do some extra work to annotate the data to get it to import correctly. The main issues are:

  • CSV data is "flat" - in that location is no practiced style to embed sub-documents in a row of a CSV file, then you may want to restructure the information to match the structure you lot wish to take in your MongoDB documents.

  • CSV data does not include type information.

The first problem is probably the bigger issue. You have two options. One is to write a script to restructure the data before using mongoimport to import it. Another approach could be to import the data into MongoDB and then run an aggregation pipeline to transform the data into your required structure.

Both of these approaches are out of the scope of this blog post. If it's something you'd like to see more explanation of, head over to the MongoDB Community Forums.

The fact that CSV files don't specify the type of data in each field can be solved by specifying the field types when calling mongoimport.

#Specify Field Types

If you don't have a header row, then you must tell mongoimport the name of each of your columns, so that mongoimport knows what to call each of the fields in each of the documents to be imported. There are two methods to do this: You can list the field names on the command-line with the --fields option, or you can put the field names in a file, and point to it with the --fieldFile option.

```bash
mongoimport \
    --collection='fields_option' \
    --file=without_header_row.csv \
    --type=csv \
    --fields="tripduration","starttime","stoptime","start station id","start station name","start station latitude","start station longitude","end station id","end station name","end station latitude","end station longitude","bikeid","usertype","birth year","gender"
```

That's quite a long line! In cases where there are lots of columns, it's a good idea to manage the field names in a field file.

#Use a Field File

A field file is a list of column names, with one name per line. So the equivalent of the --fields value from the call above looks like this:

```
tripduration
starttime
stoptime
start station id
start station name
start station latitude
start station longitude
end station id
end station name
end station latitude
end station longitude
bikeid
usertype
birth year
gender
```

If you put that content in a file called 'field_file.txt' and then run the following command, it will use these column names as field names in MongoDB:

```bash
mongoimport \
    --collection='fieldfile_option' \
    --file=without_header_row.csv \
    --type=csv \
    --fieldFile=field_file.txt
```

The imported document, viewed in MongoDB Compass. Note that the date fields have been imported as strings.

If you open Compass and look at the schema for either 'fields_option' or 'fieldfile_option', you should see that mongoimport has automatically converted integer types to int32 and kept the latitude and longitude values as double, which is a real type, or floating-point number. In some cases, though, MongoDB may make an incorrect decision. In the screenshot above, you can see that the 'starttime' and 'stoptime' fields have been imported as strings. Ideally they would have been imported as a BSON date type, which is more efficient for storage and filtering.

In this case, you'll want to specify the type of some or all of your columns.

#Specify Types for CSV Columns

All of the types you can specify are listed in our reference documentation.

To tell mongoimport you wish to specify the type of some or all of your fields, you should use the --columnsHaveTypes option. As well as using the --columnsHaveTypes option, you will need to specify the types of your fields. If you're using the --fields option, you can add type information to that value, but I highly recommend adding type information to the field file instead. That way it should be more readable and maintainable, and that's what I'll demonstrate here.

I've created a file called field_file_with_types.txt, and entered the following:

```
tripduration.auto()
starttime.date(2006-01-02 15:04:05)
stoptime.date(2006-01-02 15:04:05)
start station id.auto()
start station name.auto()
start station latitude.auto()
start station longitude.auto()
end station id.auto()
end station name.auto()
end station latitude.auto()
end station longitude.auto()
bikeid.auto()
usertype.auto()
birth year.auto()
gender.auto()
```

Because mongoimport already did the right thing with most of the fields, I've set them to auto() - the type information comes after a period (.). The two time fields, starttime and stoptime, were being incorrectly imported as strings, so in these cases I've specified that they should be treated as a date type. Many of the types take arguments inside the parentheses. In the case of the date type, it expects the argument to be a date formatted in the same way you expect the column's values to be formatted. See the reference documentation for more details.

Now, the data can be imported with the following call to mongoimport:

```bash
mongoimport --collection='with_types' \
    --file=without_header_row.csv \
    --type=csv \
    --columnsHaveTypes \
    --fieldFile=field_file_with_types.txt
```

#And The Rest

Hopefully you now have a good idea of how to use mongoimport and of how flexible it is! I haven't covered nearly all of the options that can be provided to mongoimport, however, just the most important ones. Others I find useful often are:

| Option | Description |
| --- | --- |
| --ignoreBlanks | Ignore fields or columns with empty values. |
| --drop | Drop the collection before importing the new documents. This is particularly useful during development, but will lose data if you use it accidentally. |
| --stopOnError | Another option that is useful during development, this causes mongoimport to stop immediately when an error occurs. |

There are many more! Check out the mongoimport reference documentation for all the details.

One of the major benefits of command-line programs is that they are designed to work with other command-line programs to provide more power. There are a couple of command-line programs that I particularly recommend you look at: jq, a JSON manipulation tool, and csvkit, a similar tool for working with CSV files.

#JQ

jq is a processor for JSON data. It incorporates a powerful filtering and scripting language for filtering, manipulating, and even generating JSON data. A full tutorial on how to use jq is out of scope for this guide, but to give you a brief taster:

If you create a jq script called fix_dates.jq containing the following:

```
.starttime |= { "$date": (. | sub(" "; "T") + "Z") }
| .stoptime |= { "$date": (. | sub(" "; "T") + "Z") }
```

You can now pipe the sample JSON data through this script to change the starttime and stoptime fields so that they will be imported into MongoDB as Date types:

```bash
echo '{ "tripduration": 602, "starttime": "2019-12-01 00:00:05.5640", "stoptime": "2019-12-01 00:10:07.8180" }' \
    | jq -f fix_dates.jq
{
  "tripduration": 602,
  "starttime": {
    "$date": "2019-12-01T00:00:05.5640Z"
  },
  "stoptime": {
    "$date": "2019-12-01T00:10:07.8180Z"
  }
}
```

This can be used in a multi-stage pipeline, where data is piped into mongoimport via jq.
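Such a pipeline might look like this sketch, reusing the fix_dates.jq script above (the collection name is illustrative):

```bash
# Fix the date fields in every JSON file with jq, then stream the results
# into mongoimport. The -c flag emits one compact document per line.
cat *.json | jq -c -f fix_dates.jq | mongoimport --collection='fixed_dates'
```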

The jq tool can be a little fiddly to understand at first, but once you start to understand how the language works, it is very powerful, and very fast. I've provided a more complex jq script example in the sample data GitHub repo, called json_fixes.jq. Check it out for more ideas, and see the full documentation on the jq website.

#CSVKit

In the same way that jq is a tool for filtering and manipulating JSON data, csvkit is a small collection of tools for filtering and manipulating CSV data. Some of the tools, while useful in their own right, are unlikely to be useful when combined with mongoimport. Tools like csvgrep, which filters csv file rows based on expressions, and csvcut, which can remove whole columns from CSV input, are useful for slicing and dicing your data before providing it to mongoimport.
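Here's a rough sketch of that kind of pre-processing (the file, column, and collection names are illustrative): keep a few columns, filter the rows, and pipe the result straight into mongoimport:

```bash
# Keep three columns, keep only rides by subscribers, then import the result.
# csvcut and csvgrep preserve the header row, so --headerline still applies.
csvcut -c 'tripduration,starttime,usertype' rides.csv \
    | csvgrep -c 'usertype' -m 'Subscriber' \
    | mongoimport --collection='subscriber_rides' --type=csv --headerline
```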

Check out the csvkit docs for more information on how to use this collection of tools.

#Other Tools

Are there other tools you know of which would work well with mongoimport? Do you have a great example of using awk to handle tabular data before importing into MongoDB? Let us know on the community forums!

#Conclusion

It's a common mistake to write custom code to import data into MongoDB. I hope I've demonstrated how powerful mongoimport is as a tool for importing data into MongoDB quickly and efficiently. Combined with other simple command-line tools, it's both a fast and flexible way to import your data into MongoDB.


Source: https://www.mongodb.com/developer/how-to/mongoimport-guide/
