A Discrete-Event Network Simulator
csv-reader.h
1/*
2 * Copyright (c) 2019 Lawrence Livermore National Laboratory
3 *
4 * SPDX-License-Identifier: GPL-2.0-only
5 *
6 * Author: Mathew Bielejeski <bielejeski1@llnl.gov>
7 */
8
9#ifndef NS3_CSV_READER_H_
10#define NS3_CSV_READER_H_
11
12#include <cstddef>
13#include <cstdint>
14#include <fstream>
15#include <istream>
16#include <string>
17#include <vector>
18
19/**
20 * \file
21 * \ingroup csvreader
22 *
23 * ns3::CsvReader declaration
24 *
25 */
26namespace ns3
27{
28
29/**
30 * \ingroup core
31 * \defgroup csvreader CSV File Reader
32 *
33 * A way to extract data from simple csv files.
34 */
35
36/**
37 * \ingroup csvreader
38 *
39 * Provides functions for parsing and extracting data from
40 * Comma Separated Value (CSV) formatted text files.
41 * This parser is somewhat more relaxed than \RFC{4180};
42 * see below for a list of the differences.
43 * In particular it is possible to set the delimiting character at construction,
44 * enabling parsing of tab-delimited streams or other formats with delimiters.
45 *
46 * \note Excel may generate "CSV" files with either ',' or ';' delimiter
47 * depending on the locale: if ',' is the decimal mark then ';' is the list
48 * separator and used to read/write "CSV" files.
49 *
50 * To use this facility, construct a CsvReader from either a file path
51 * or \c std::istream, then FetchNextRow(), and finally GetValue()
52 * to extract specific values from the row.
53 *
54 * For example:
55 * \code
56 * CsvReader csv (filePath);
57 * while (csv.FetchNextRow ())
58 * {
59 * // Ignore blank lines
60 * if (csv.IsBlankRow ())
61 * {
62 * continue;
63 * }
64 *
65 * // Expecting three values
66 * double x, y, z;
67 * bool ok = csv.GetValue (0, x);
68 * ok &= csv.GetValue (1, y);
69 * ok &= csv.GetValue (2, z);
70 * if (!ok)
71 * {
72 * // Handle error, then
73 * continue;
74 * }
75 *
76 * // Do something with values
77 *
78 * } // while FetchNextRow
79 * \endcode
80 *
81 * As another example, supposing we need a vector from each row,
82 * the middle of the previous example would become:
83 * \code
84 * std::vector<double> v (n);
85 * bool ok = true;
86 * for (std::size_t i = 0; i < v.size (); ++i)
87 * {
88 * ok &= csv.GetValue (i, v[i]);
89 * }
90 * if (!ok) ...
91 * \endcode
92 *
93 *
94 * File Format
95 * ===========
96 *
97 * This parser implements \RFC{4180}, but with several restrictions removed;
98 * see below for differences. All the formatting features described next
99 * are illustrated in the examples which follow.
100 *
101 * Comments
102 * --------
103 *
104 * The hash character (#) is used to indicate the start of a comment. Comments
105 * are not parsed by the reader. Comments are treated as either an empty column
106 * or part of an existing column depending on where the comment is located.
107 * Comments that are found at the end of a line containing data are ignored.
108 *
109 * 1,2 # This comment ignored, leaving two data columns
110 *
111 * Lines that contain a comment and no data are treated as rows with a single
112 * empty column, meaning that ColumnCount() will return 1 and
113 * GetValue() will return an empty string.
114 *
115 * # This row treated as a single empty column, returning an empty string.
116 * "" # So is this
117 *
118 * IsBlankRow() will return \c true in either of these cases.
119 *
120 * Quoted Columns
121 * --------------
122 *
123 * Columns with string data which contain the delimiter character or
124 * the hash character can be wrapped in double quotes to prevent CsvReader
125 * from treating them as special characters.
126 *
127 * 3,string without delimiter,"String with comma ',' delimiter"
128 *
129 * Double quotes can be escaped
130 * by doubling up the quotes inside a quoted field. See example 6 below for
131 * a demonstration.
132 *
133 * Whitespace
134 * ----------
135 *
136 * Leading and trailing whitespace are ignored by the reader and are not
137 * stored in the column data.
138 *
139 * 4,5 , 6 # Columns contain '4', '5', '6'
140 *
141 * If leading or trailing whitespace are important
142 * for a column, wrap the column in double quotes as discussed above.
143 *
144 * 7,"8 "," 9" # Columns contain '7', '8 ', ' 9'
145 *
146 * Trailing Delimiter
147 * ------------------
148 *
149 * Trailing delimiters are ignored; they do _not_ result in an empty column.
150 *
151 *
152 * Differences from RFC 4180
153 * -------------------------
154 * Section 2.1
155 * - Line break can be LF or CRLF
156 *
157 * Section 2.3
158 * - Non-parsed lines are allowed anywhere, not just as a header.
159 * - Lines do not all have to contain the same number of fields.
160 *
161 * Section 2.4
162 * - Characters other than comma can be used to separate fields.
163 * - Lines do not all have to contain the same number of fields.
164 * - Leading/trailing spaces are stripped from the field
165 * unless the whitespace is wrapped in double quotes.
166 * - A trailing delimiter on a line is not an error.
167 *
168 * Section 2.6
169 * - Quoted fields cannot contain line breaks
170 *
171 * Examples
172 * --------
173 * \par Example 1: Basic
174 * \code
175 * # Column 1: Product
176 * # Column 2: Price
177 * widget, 12.5
178 * \endcode
179 *
180 * \par Example 2: Comment at end of line
181 * \code
182 * # Column 1: Product
183 * # Column 2: Price
184 * broken widget, 12.5 # this widget is broken
185 * \endcode
186 *
187 * \par Example 3: Delimiter in double quotes
188 * \code
189 * # Column 1: Product
190 * # Column 2: Price
191 * # Column 3: Count
192 * # Column 4: Date
193 * widget, 12.5, 100, "November 6, 2018"
194 * \endcode
195 *
196 * \par Example 4: Hash character in double quotes
197 * \code
198 * # Column 1: Key
199 * # Column 2: Value
200 * # Column 3: Description
201 * count, 5, "# of widgets currently in stock"
202 * \endcode
203 *
204 * \par Example 5: Extra whitespace
205 * \code
206 * # Column 1: Key
207 * # Column 2: Value
208 * # Column 3: Description
209 * count , 5 ,"# of widgets in stock"
210 * \endcode
211 *
212 * \par Example 6: Escaped quotes
213 * \code
214 * # Column 1: Key
215 * # Column 2: Description
216 * # The value returned for Column 2 will be: String with "embedded" quotes
217 * foo, "String with ""embedded"" quotes"
218 * \endcode
219 */
220class CsvReader
221{
222 public:
223 /**
224 * Constructor
225 *
226 * Opens the file specified in the filepath argument and
227 * reads data from it.
228 *
229 * \param filepath Path to a file containing CSV data.
230 * \param delimiter Character used to separate fields in the data file.
231 */
232 CsvReader(const std::string& filepath, char delimiter = ',');
233
234 /**
235 * Constructor
236 *
237 * Reads csv data from the supplied input stream.
238 *
239 * \param stream Input stream containing csv data.
240 * \param delimiter Character used to separate fields in the data stream.
241 */
242 CsvReader(std::istream& stream, char delimiter = ',');
243
244 /**
245 * Destructor
246 */
247 virtual ~CsvReader();
248
249 /**
250 * Returns the number of columns in the csv data.
251 *
252 * \return Number of columns
253 */
254 std::size_t ColumnCount() const;
255
256 /**
257 * The number of lines that have been read.
258 *
259 * \return The number of lines that have been read.
260 */
261 std::size_t RowNumber() const;
262
263 /**
264 * Returns the delimiter character specified during object construction.
265 *
266 * \return Character used as the column separator.
267 */
268 char Delimiter() const;
269
270 /**
271 * Reads one line from the input until a new line is encountered.
272 * The read data is stored in a cache which is accessed by the
273 * GetValue functions to extract fields from the data.
274 *
275 * \return \c true if a line was read successfully or \c false if the
276 * read failed or reached the end of the file.
277 */
278 bool FetchNextRow();
279
280 /**
281 * Attempt to convert from the string data in the specified column
282 * to the specified data type.
283 *
284 * \tparam T The data type of the output variable.
285 *
286 * \param [in] columnIndex Index of the column to fetch.
287 * \param [out] value Location where the converted data will be stored.
288 *
289 * \return \c true if the specified column has data and the data
290 * was converted to the specified data type.
291 */
292 template <class T>
293 bool GetValue(std::size_t columnIndex, T& value) const;
294
295 /**
296 * Check if the current row is blank.
297 * A blank row can consist of any combination of
298 *
299 * - Whitespace
300 * - Comment
301 * - Quoted empty string `""`
302 *
303 * \returns \c true if the input row is a blank line.
304 */
305 bool IsBlankRow() const;
306
307 private:
308 /**
309 * Attempt to convert the string data taken from a column
310 * into the specified type.
311 *
312 * \param [in] input String value to be converted.
313 * \param [out] value Location where the converted value will be stored.
314 *
315 * \return \c true if the conversion succeeded,
316 * \c false otherwise.
317 */
318 /** @{ */
319 bool GetValueAs(std::string input, double& value) const;
320
321 bool GetValueAs(std::string input, float& value) const;
322
323 bool GetValueAs(std::string input, signed char& value) const;
324
325 bool GetValueAs(std::string input, short& value) const;
326
327 bool GetValueAs(std::string input, int& value) const;
328
329 bool GetValueAs(std::string input, long& value) const;
330
331 bool GetValueAs(std::string input, long long& value) const;
332
333 bool GetValueAs(std::string input, std::string& value) const;
334
335 bool GetValueAs(std::string input, unsigned char& value) const;
336
337 bool GetValueAs(std::string input, unsigned short& value) const;
338
339 bool GetValueAs(std::string input, unsigned int& value) const;
340
341 bool GetValueAs(std::string input, unsigned long& value) const;
342
343 bool GetValueAs(std::string input, unsigned long long& value) const;
344 /** @} */
345
346 /**
347 * Returns \c true if the supplied character matches the delimiter.
348 *
349 * \param c Character to check.
350 * \return \c true if \pname{c} is the delimiter character,
351 * \c false otherwise.
352 */
353 bool IsDelimiter(char c) const;
354
355 /**
356 * Scans the string and splits it into individual columns based on the delimiter.
357 *
358 * \param [in] line String containing delimiter separated data.
359 */
360 void ParseLine(const std::string& line);
361
362 /**
363 * Extracts the data for one column in a csv row.
364 *
365 * \param begin Iterator to the first character in the row.
366 * \param end Iterator to the last character in the row.
367 * \return A tuple containing the content of the column and an iterator
368 * pointing to the position in the row where the column ended.
369 */
370 std::tuple<std::string, std::string::const_iterator> ParseColumn(
371 std::string::const_iterator begin,
372 std::string::const_iterator end);
373
374 /**
375 * Container of CSV data. Each entry represents one field in a row
376 * of data. The fields are stored in the same order that they are
377 * encountered in the CSV data.
378 */
379 typedef std::vector<std::string> Columns;
380
381 char m_delimiter; //!< Character used to separate fields.
382 std::size_t m_rowsRead; //!< Number of lines processed.
383 Columns m_columns; //!< Fields extracted from the current line.
384 bool m_blankRow; //!< Line contains no data (blank line or comment only).
385 std::ifstream m_fileStream; //!< File stream containing the data.
386
387 /**
388 * Pointer to the input stream containing the data.
389 */
390 std::istream* m_stream;
391
392}; // class CsvReader
393
394/****************************************************
395 * Template implementations.
396 ***************************************************/
397
398template <class T>
399bool
400CsvReader::GetValue(std::size_t columnIndex, T& value) const
401{
402 if (columnIndex >= ColumnCount())
403 {
404 return false;
405 }
406
407 std::string cell = m_columns[columnIndex];
408
409 return GetValueAs(std::move(cell), value);
410}
411
412} // namespace ns3
413
414#endif // NS3_CSV_READER_H_
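
The file format rules documented in the class comment (comments, quoted columns, whitespace trimming, trailing delimiters, escaped quotes) can be illustrated with a small standalone sketch. This is not the ns-3 implementation: `SplitCsvLine` is a hypothetical helper written only to mirror the documented behaviour for a single line, and it assumes that "whitespace" means spaces and tabs and that any text between a closing quote and the next delimiter is dropped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of ns-3): split one line into columns
// following the rules described in the CsvReader class documentation.
std::vector<std::string>
SplitCsvLine(const std::string& line, char delimiter = ',')
{
    // Assumption: whitespace is space or tab, unless it is the delimiter itself.
    auto isBlank = [delimiter](char c) {
        return (c == ' ' || c == '\t') && c != delimiter;
    };

    std::vector<std::string> columns;
    const std::size_t n = line.size();
    std::size_t i = 0;

    while (true)
    {
        // Leading whitespace is never stored.
        while (i < n && isBlank(line[i]))
        {
            ++i;
        }
        // Nothing but a comment or end of line after a delimiter:
        // a trailing delimiter does not create an empty column.
        if ((i >= n || line[i] == '#') && !columns.empty())
        {
            break;
        }

        std::string field;
        if (i < n && line[i] == '"')
        {
            // Quoted column: keep delimiter, hash and whitespace verbatim.
            ++i;
            while (i < n)
            {
                if (line[i] == '"')
                {
                    if (i + 1 < n && line[i + 1] == '"')
                    {
                        field += '"'; // doubled quote is an escaped quote
                        i += 2;
                        continue;
                    }
                    ++i; // closing quote
                    break;
                }
                field += line[i++];
            }
            // Assumption: drop anything between the closing quote
            // and the next delimiter or comment.
            while (i < n && line[i] != delimiter && line[i] != '#')
            {
                ++i;
            }
        }
        else
        {
            // Unquoted column: read up to the delimiter or a comment,
            // then strip trailing whitespace.
            const std::size_t start = i;
            while (i < n && line[i] != delimiter && line[i] != '#')
            {
                ++i;
            }
            std::size_t end = i;
            while (end > start && isBlank(line[end - 1]))
            {
                --end;
            }
            field = line.substr(start, end - start);
        }
        columns.push_back(field);

        if (i >= n || line[i] == '#')
        {
            break; // end of line or start of a comment
        }
        ++i; // consume the delimiter
    }
    return columns;
}
```

Under these assumptions, a blank or comment-only line yields a single empty column, which matches the documented ColumnCount() of 1 for such rows. Passing ';' or '\t' as the second argument mimics the delimiter chosen at CsvReader construction.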