To get files from your file system into a SharePoint document library you have a couple of options available. The following guide covers how to synchronise files using an excel spreadsheet into a SharePoint Document Library.
We take the idea that you have a spreadsheet or CSV file containing the file names, paths to each file and any additional metadata. This could be user generated or created as an export from another internal system. In brief you just connect to your spreadsheet, connect to SharePoint, map the columns, compare and sync.
We do have other options available to sync files to SharePoint with some of these available as blog posts, mentioned at the end of the page, and the rest in our documentation.
The example shown below uploads only a small sample of documents, however you can use this method to bulk upload thousands of documents.
Before getting started you will need to ensure that you have the following:
In your Excel spreadsheet you need to specify the name of the document and any metadata you want to be uploaded alongside it. In our example we have a range of ten documents with different modified dates and different approval status’.
If your documents are contained within folders in your directory, you want to include the sub-directory folder name with the file name for example Folder1\doc1.docx
. This will ensure that the document can be found, and your directory structure is replicated in the Document Library. An example spreadsheet can be seen here:
To get started open Data Sync and click onto Connect Datasource link in the source (data source A) window. This will open the connection window where you can connect to your spreadsheet.
To do this find Excel and expand the tree so that you can select the OpenXML Excel Spreadsheet provider from the list.
Under the FileName field click onto the ellipsis to open the file explorer and locate your spreadsheet containing your metadata. Then just click onto the Connect button to connect.
You will now need to define a couple of connection properties to find the files mentioned in the spreadsheet. In the connection properties window, below the source columns, go to the section Settings.Writer and complete the following fields:
C:\Users\Rebecca\Documents\Demo Documents
.Now you need to connect to your SharePoint Document Library as the Target. Click onto the Connect Datasource link in the Target (datasource B) window and go to the SharePoint section.
Then select SharePoint Online for connecting to a SharePoint Online site or SharePoint ClientAPI to connect to an On-Premise SharePoint site. More details on connecting to SharePoint On-Prem can be found here.
As we will need to write to the Modified column in SharePoint we need to use the Custom Azure App OAuth method to connect. Alternatively if legacy authentication is still enabled on your site and you are an admin in SharePoint, you can connect using the legacy authentication method. Another alternative is to create the application registration in SharePoint, we have our full documentation on how to connect using ACS here.
You need to enter in the URL to the SharePoint site that the Document Library is located. In this example the Document Library is found in a subsite called Test, so our URL will look similar to: https://company.sharepoint.com/Test
.
https://company.sharepoint.com
https://company.sharepoint.com/subsite/subsite2
Follow the details from the documentation to connect to SharePoint using your preferred method, and click Authorize Connection to connect.
Now you can select your Document Library from the dropdown list and set the delete behaviour.
We recommend saving the connection to the connection library so that future projects can be quickly set up, to do this click onto Connect and Create Library Connection and enter in a name for the connection. This will add a connection to your SharePoint site/sub-site in the connection library so that you can quickly access all the lists and libraries in the future. You only need to add the connection to each site/subsite once, as you can connect to any of the lists or libraries within that site from the Connection Library window.
The next step is to map your source columns to your target columns. The mapping in your project may look slightly different if you use different column names.
At a minimum you need to map a source column to the URLPath column in SharePoint. In this example we map the FileName to the URLPath and set this to be the key column.
The key column must be unique and be able to distinguish each document from each other. As there can not be duplicate filenames this works for the key column.
We're also adding a Title, an approved status and a modified date to each document. You can see the full schema mapping for this example below:
If you have specified the folder structure that the file is found in, in the FileName column this will be handled by Data Sync and the relevant folder will be created in SharePoint if it does not already exist.
As of Data Sync Version 3.0.1276 you can map a different filename to the one you specified in the BlobFileName field earlier. Additionally the column specified in the BlobFileName property on the source does not need to be included in the schema map.
This means that if you want your files to go to different folders or have a different name, you just need to map another filename column (such as TargetFileName in the example spreadsheet) to the URL Path in the schema map.
This would make the schema map look similar to this:
Once the schema map has been configured and a key column set you can run the comparison and preview the results. To do this click onto the Compare A -> B button in the toolbar.
This will return the differences between the source and target data. Data Sync will present the additions, updates and deletes which you can preview by clicking onto each option.
Please note that deletes are disabled by default. To enable them set EnableDelete on the target to True.
For this example we have 10 records to add to the document library, and we can make sure that the data is appearing as we would expect.
Once the comparison results look as expected, we can run the sync to synchronise the changes to SharePoint.
To do this click Synchronise and then Start to begin the sync.
To make sure everything was successful you can run the compare again, you should have 0 results, or you can browse to the Document Library in SharePoint.
Make sure to save your Data Sync project so that you can use it again in the future.
From this example we can see all the documents added with their corresponding metadata.
If you defined folders in your paths and your folders didn't already exist then they will be created in SharePoint and the relevant files added. If the folders already existed then the files will simply be added to the matching folder.
From the example data where the files were listed inside folders, the SharePoint Document Library now reflects this:
And clicking inside a folder will show the documents defined to be in that folder.
If you haven't already, make sure to save your Data Sync project so that you can schedule it to run on a regular basis or use it again when you need to.
The Run Tool is an additional program that comes linked to Data Sync and enables you to build out your data integration jobs.
You can use this to group Data Sync projects that need to run in order and add additional step types such as adding an status report to email you when the project has run and if there were any failures. Each step is run in the order it appears in the list and you can apply conditional rules so that the next step can run if the previous one succeeded or had data changes for example. To find out more please see our Run Tool Documentation.
To open your project in the Run Tool, in your Data Sync project go to Tools >Open in Run Tool.
You can then click onto the green Run button to check it runs as expected. If there are data changes you will be provided with a count of how many items were added updated or deleted
Another option is to Run the Run Tool project or single Data Sync project from the command line. To do this pass the path to the project file like this:
%DATASYNCINSTALLPATH%\Simego.DataSync.Run.exe -execute "D:\DSProjects\myproject.dsrun"
%DATASYNCINSTALLPATH%\Simego.DataSync.Run.exe -execute "D:\DSProjects\myproject.dsprj"
You can then schedule your Run Tool project to run when you need it to using Windows Task Scheduler. For the full details on how to do this see our Task Scheduler Documentation.
An alternative option, with more scheduling capabilities is to use Ouvvi Automation Server. Ouvvi enables you to fully schedule and manage all of your Data Integration Projects, from Data Sync projects to SQL Statements to Powershell Scripts.
It provides full logging and documentation capabilities to fully manage your integration operations.
To find out more see our Ouvvi Documentation or send us an email.